Causal Discovery - Understanding Others in a Chaotic World

Rhys Howard (DPhil Student), Cognitive Robotics Group 

As humans we make many decisions every day, and whether these decisions are insignificant movements of the body or life-altering choices, we typically aim to consider how others might react to, or interpret, our decisions. This is an important aspect of interacting with people in a social context, yet it is a skill which AI and robots largely lack.

A perfect microcosm for observing these types of interactions is the driving domain. When in control of a vehicle you must constantly be aware of both the behaviour of others around you and the effect of your actions on them. The interdependencies between the actions of various road agents (e.g. cars, bikes, pedestrians) can therefore be considered a type of causal system, in which the actions of one agent can alter the actions or perceived utility of other agents.

The discovery of causal systems is a well-studied field of statistics and computer science, with a multitude of approaches available to tackle various types of challenges from several angles. However, as we will discuss here, the chaotic nature of interactions between agents in the real world presents a new set of obstacles that existing methods are ill-suited to tackle.

Simple Scenario, Difficult Problem

When considering any new engineering problem, it makes sense to tackle a simple scenario first, before increasing the difficulty to determine the limits of the approach being applied. However, for this particular problem, even the most basic scenario we could envisage proved a formidable opponent for all of the state-of-the-art methods considered.

To illustrate this scenario, envisage two vehicles moving forwards, one in front of the other; we will refer to the lead vehicle as "c0" and the following vehicle as "c1". In this scenario, if c0 brakes, c1 is also forced to brake, or else it will crash into the rear of c0. Meanwhile, if c0 accelerates, c1 will likely do so too, assuming it can accelerate without breaching the speed limit. Thus the scenario describes a causal system in which the behaviour of c0 affects the behaviour of c1.

The scenario described above may capture a causal system we are interested in discovering, but a potential flaw is that only two agents are present, and we know from the scenario definition that a causal relationship exists in one direction between them. We could therefore hypothetically create an approach that assigns a causal relationship with its direction determined by coin flip, and this approach would be correct 50% of the time, despite clearly being a poor way of determining whether two road agents influence one another. The solution is to add a third agent, "i0", whose behaviour is entirely independent of the other two agents. In doing so, any proposed approach must not only determine the presence of a causal relationship from c0 to c1, but also avoid discovering spurious causal relationships between i0 and the other agents.
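As an illustration only, synthetic traces for this three-agent scenario might be generated along the following lines (the speeds, reaction lag, and noise scales here are invented for the sketch, not taken from our datasets):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200    # timesteps in the scene
lag = 5    # hypothetical reaction delay of c1, in timesteps

# c0: leading vehicle, speed follows a random walk around 20 m/s
c0 = np.cumsum(rng.normal(0.0, 0.1, T)) + 20.0

# c1: trailing vehicle, tracks c0's speed with a delay plus small noise
c1 = np.empty(T)
c1[:lag] = 20.0
c1[lag:] = c0[:-lag] + rng.normal(0.0, 0.05, T - lag)

# i0: independent vehicle, its own unrelated random walk
i0 = np.cumsum(rng.normal(0.0, 0.1, T)) + 20.0
```

A sound discovery method applied to such traces should recover the c0 to c1 link and nothing involving i0.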

Introducing Our Roster

In order to evaluate the state of the art on the previously described scenario, we assembled a collection of 10 different causal discovery techniques. Each of these techniques is more specifically a temporal causal discovery technique, or in other words a causal discovery technique that is explicitly aware of time as a parameter. This is typically important when the data for variables is given as a time series and the causal relationship between two variables is not instantaneous, as is usually the case when considering human behaviour.

While each of the approaches we evaluated describes a complex algorithm for tackling the problem of causal discovery, a brief overview of each category of methods is given here, just to give a rough idea of how they operate:

  • Granger Causality: Assumes that if one variable gives information that helps in predicting another, then the former variable is likely involved in causing the latter in some fashion. This is the premise of the PWGC [4], MVGC [3], TCDF [9] and NAVAR [1] approaches we considered.
  • Constraint-Based: Relies upon a predefined set of rules being applied based upon conditional independence between variables. Conditional independence here refers to determining if there is any correlation between two variables while accounting for another set of variables. For example, there could be a correlation between ice-cream sales and wildfires, but if one accounts for the temperature / weather this spurious correlation would likely disappear. The PCMCI [12] and tsFCI [2] methods we consider follow this approach.
  • Score-Based: Provides a score for how well the data fits a causal model and makes small changes while assessing whether these changes lead to a better-fitting causal model. The DYNOTEARS [10] method utilises this concept.
  • Noise-Based: Relies upon the fact that noise — or in other words the random fluctuations in data — propagates from causing variables to affected variables. Using the previous example, random fluctuations in daily temperature will affect ice-cream sales, but random fluctuations in ice-cream sales do not affect the daily temperature. The LiNGAM [7] and TiMINo [11] methods apply this idea.
  • Non-Stationarity-Based: Within temporal causal discovery there is frequently the problem of temporal non-stationarity, or in other words the idea that the causal relationships we are interested in actually vary with time. A few methods such as CD-NOD [6] actively aim to exploit variations in these causal relations with time in order to discover said causal relations.
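To make the first category concrete, the Granger premise can be sketched in a few lines: fit one model predicting a variable from its own past, fit a second that also sees the other variable's past, and score how much the extra history reduces the prediction error. This is a minimal numpy illustration of the idea, not any of the cited implementations:

```python
import numpy as np

def granger_score(x, y, lags=5):
    """Fraction by which x's past reduces the error of predicting y,
    beyond what y's own past achieves (a Granger-style score)."""
    T = len(y)
    Y = y[lags:]
    # Restricted model: y's own lagged values; full model adds x's lags
    own = np.column_stack([y[lags - k:T - k] for k in range(1, lags + 1)])
    full = np.column_stack([own] + [x[lags - k:T - k] for k in range(1, lags + 1)])
    rss_r = np.sum((Y - own @ np.linalg.lstsq(own, Y, rcond=None)[0]) ** 2)
    rss_f = np.sum((Y - full @ np.linalg.lstsq(full, Y, rcond=None)[0]) ** 2)
    return (rss_r - rss_f) / rss_r

# Toy check: x drives y with a one-step lag, so the score should be
# high for x -> y and near zero for y -> x.
rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.9 * x[t - 1] + 0.1 * rng.normal()
```

In practice the cited methods wrap this premise in significance tests, multivariate conditioning, or neural predictors, but the asymmetry being tested is the same.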

As for the data these approaches are evaluated upon, we utilised the Lyft Level 5 / Woven Planet Prediction [5] (left) and High-D [8] (right) datasets. Following a mixture of manual and automatic pre-processing we produced 50 Lyft scenes and 3396 High-D scenes matching the previously described scenario. We additionally generated 100 synthetic scenes in order to evaluate the extent to which performance was affected by real-world conditions.

A Cause for Concern

Having run all the methods across all 3396 scenes, we produced the statistical data shown in the figure above. To quickly cover how these graphs differ: the top left shows the average performance of each method for each dataset, while the top right shows the average performance depending upon whether acceleration or velocity is used as the variable of interest for each vehicle. The bottom two graphs are used to determine optimal parameters for each method; generally the performance varies little across the parameter values.

Arguably the most interesting graph is the top left, since it shows a significant change between the real-world data and the synthetically generated data. With our performance metric (i.e. F1 score) falling for the most part under 0.5, we would argue that the tested methods provide satisfactory results less than 50% of the time. Given that we are considering a safety-critical domain like driving, this level of performance is not adequate if we want to utilise causal discovery in real systems.
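For reference, the F1 score over discovered causal links can be computed from the sets of predicted and ground-truth directed edges. A minimal sketch, using hypothetical edge sets from the convoy scenario:

```python
def edge_f1(predicted, truth):
    """F1 score over directed causal edges, each a (cause, effect) pair."""
    tp = len(predicted & truth)                        # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# E.g. discovering the true c0 -> c1 link plus a spurious i0 -> c1 link
# yields perfect recall but 0.5 precision, for an F1 of 2/3.
truth = {("c0", "c1")}
predicted = {("c0", "c1"), ("i0", "c1")}
```

This is why spurious links involving the independent agent i0 drag the score down even when the true link is found.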

So why are these tried-and-tested methods failing at this task where they have succeeded in others?

  1. Generally these methods are applied to data gathered across a protracted length of time, in which the interactions between variables happen continuously. In our case, we are working with short scenes, moment to moment, with vehicles interacting causally only occasionally. This means that the sheer number of “examples” of causal behaviour for these methods to work with is limited, hampering their performance.
  2. The other immediate issue is causal stationarity, which we brought up earlier: whether causal relationships change over time. Humans are anything but predictable, so there is no guarantee that if the lead vehicle in a convoy brakes, the tail vehicle will always respond with the exact same level of braking and the exact same reaction time. While methods such as CD-NOD that exploit non-stationarity are considered, they only consider variations in causal relationships at established time-lags. Because of variations in reaction time, not only can CD-NOD not exploit these variations, they actually hamper its ability to exploit non-stationarity in general.

The Good, the Bad, and the Counterfactual

What are our options if the state of the art struggles to tackle even a relatively simple problem when agent behaviour is involved? Well, for starters, the methods we considered previously consisted only of those which occupy the first, and simplest, level of causal complexity.

If we consider the left figure, we can see the three levels of increasing complexity within causality. Since all of the methods we considered previously rely upon working with observed data, they all belong to the first rung on this metaphorical ladder: “Association”.

So what about the second rung then: “Intervention”? Well, this is occasionally applied, and can be an incredibly powerful form of causal discovery. It corresponds to experiments in which we force specific variables to take either a predefined or an entirely random value. We do in fact utilise this type of causal discovery, to great effect, both within science and in our day-to-day lives.
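The power of intervention comes from randomisation: if we set a variable to random values ourselves, any remaining dependence of another variable on it cannot be explained by a confounder. A toy sketch, with all coefficients invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 500

# Intervene: force x to purely random values, as in a randomised experiment
x = rng.normal(size=T)

# Suppose the system responds to x with a one-step delay plus noise
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.8 * x[t - 1] + 0.2 * rng.normal()

# Because x was set at random, this dependence must be causal:
# no hidden common cause can have produced it.
effect = np.corrcoef(x[:-1], y[1:])[0, 1]
```

Observational methods must instead argue their way around possible confounders, which is precisely what makes them weaker but safer to apply.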

If intervention is so great, why do we not use it? Why even consider the observational methods in the first place? In many cases, actively intervening in order to gather data takes more effort, and is thus more costly, than simply observing. However, in the case of interacting with people, and particularly when driving, the primary concern is the risk of harm to said people. If we have control of the lead vehicle in a two-vehicle convoy, we can hardly randomly alter our speed in an effort to see if the rear vehicle is forced to respond.

This leaves us with only the highest level of causal complexity: “Counterfactuals”. This relies upon utilising an established causal model in order to envision outcomes which have never actually come to pass. However, how can we discover causal relations using a method that relies upon us already having a causal model? This requires us to reframe the problem in light of what we are actually trying to discover, and what we already know.



Thinking About Thinking

Ultimately what we are aiming to learn is how the behaviour of one agent influences that of those around it. Up until now we have mainly been considering variables of interest to describe the behaviour of agents. Yet, when we consider our own process of thinking, it seems more intuitive to think of agent behaviour in terms of conscious decisions. Switching our mode of thinking from continuous variables of interest to discrete decisions immediately removes the problem of non-stationarity, because causal relationships can now be considered between agent decisions rather than between agent variables of interest. However, this formulation exacerbates the problem of limited data, which brings us onto our next component.

The above figure illustrates a model of how we consider agents to act, or in other words describes a “Theory of Mind”. That is, agents observe certain variables in the world (e.g. actual vehicular speed), plan a series of decisions defined in terms of desired variable values (e.g. desired vehicular speed), and attempt to bring about those desired values by controlling a set of actuation variables (e.g. vehicular acceleration). By taking this view of an agent’s thought processes, if we can simulate the controller and world components, we can consider a counterfactual series of events from the planner’s perspective. The controller in our driving case can be based upon a simple proportional error controller, while the world can be simulated using some form of physics engine. It is worth pointing out that by putting all these components together we are indeed constructing a causal model rather than discovering one. However, given that such a causal model only tells us that agents influence each other through interacting in the world, it provides little insight compared with establishing a series of abstract causal links between decisions.
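To make the controller and world components concrete, here is a minimal sketch of a proportional error controller driving a simple kinematic "world"; the gain, timestep and speeds are made-up values for illustration, not parameters from our system:

```python
def simulate(desired_speed, v0=0.0, kp=0.5, dt=0.1, steps=100):
    """Roll the world forward: the controller converts the planner's
    desired speed into an acceleration command proportional to the
    speed error, and the world integrates it into an actual speed."""
    v = v0
    trace = []
    for _ in range(steps):
        accel = kp * (desired_speed - v)  # controller: proportional error
        v = v + accel * dt                # world: simple kinematics
        trace.append(v)
    return trace
```

Given a planner decision such as "hold 10 m/s", the simulated speed converges towards the desired value, letting us roll out counterfactual decision sequences without touching the real world.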

The final concept required to carry out causal discovery on agent behaviour is “Conditional Optimality”: the idea that a decision by agent c1 caused by an earlier decision by agent c0 is only optimal for c1 given that the earlier decision occurred, and not otherwise. It is straightforward to suggest that an improvement in outcome from taking a decision, given that an earlier decision occurred, is indicative of a potential causal relationship. However, it is equally important to consider that if taking a decision leads to a better outcome regardless of whether another decision occurred, this reduces the likelihood of a causal relationship, since in such a circumstance the decision would be beneficial to take regardless of what agent behaviour had come beforehand.

To illustrate this point, consider the figure above, in which a two-vehicle convoy consists of a red and a green vehicle. In this scene we want to determine whether red braking causes green to brake. In the case where they both brake, all is well; however, if red brakes while green does not, green collides with the rear of red. Therefore we can say that, provided red brakes, it is optimal for green to brake. Meanwhile, if we consider whether it is optimal for green to brake when red does not, both circumstances result in red colliding with a vehicle, but not green. As such, from green’s perspective, both outcomes can be considered equal, and there is no particular reason green braking would be optimal had red not braked. Thus the four alternate series of events considered indicate that green braking only occurred as a result of red braking, giving us reason to believe in the existence of a causal relationship.
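The four alternate series of events can be summarised as a simple check. The sketch below uses a hypothetical 0/1 agency-style outcome function for the green vehicle, chosen to mirror the scene just described:

```python
def conditionally_optimal(outcome):
    """outcome(cause_occurred, decision_taken) -> score for the agent.
    A decision is conditionally optimal if it improves the agent's
    outcome when the candidate cause occurs, but not otherwise."""
    better_given_cause = outcome(True, True) > outcome(True, False)
    better_without_cause = outcome(False, True) > outcome(False, False)
    return better_given_cause and not better_without_cause

def green_outcome(red_brakes, green_brakes):
    # Agency-style score: 1.0 unless green loses control by crashing,
    # which only happens if red brakes while green does not.
    return 0.0 if (red_brakes and not green_brakes) else 1.0
```

Here `conditionally_optimal(green_outcome)` holds, supporting a causal link from red's braking to green's, whereas a decision that is beneficial regardless of red's behaviour would fail the test.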


An Alternate Outcome?

The above figure illustrates the results of applying our method to the 3396 scenes previously extracted from the High-D dataset. In fact we evaluated three variants of our approach: a reward-based variant, an agency-based variant, and a hybrid variant. The reward-based variant measures how beneficial an outcome is to an agent as a number between 0 and 1. Meanwhile, the agency-based variant determines whether an agent retains control over its own actions, in this case determined by whether a crash occurs or not. Finally, the hybrid variant attempts to combine the other two. For the purposes of comparison we also included the 3 best-performing observation-based methods from our earlier results. Most of these performance metrics remain constant across the x-axis, since the x-axis represents a hyperparameter that only affects the reward-based and hybrid variants.

So with all that said and done, what is the verdict? If we take the best of the three variants (i.e. the agency-based approach), it is clear that the counterfactual approach offers a significant increase in performance. While the agency-based approach offers marginally lower recall than the other variants and some observation-based methods, it effectively doubles the precision of the best-performing observation-based approach. An F1 score of ~0.65 is still far from where we want to be, but it represents a big step in the right direction.

From here we need to consider how we can further develop the idea of a counterfactual approach and apply such methods to increasingly complex scenarios. In doing so, we hope to create a stepping stone from which systems can be developed to discover behavioural causal relationships in real-time, and then utilise these in order to produce autonomous agents which interact with humans in a more considerate manner. The scope of this research is by no means limited to autonomous driving either: while driving is indeed an important area of study, autonomous systems must be aware of how their actions affect others in all sorts of domains. From elderly care to delivery robots, and from the service industry to manufacturing co-bots, we need to focus on developing causally aware robot systems as part of the transition to an increasingly automated world.


Corresponding Papers

Rhys Howard and Lars Kunze, "Evaluating temporal observation-based causal discovery techniques applied to road driver behaviour," in Proceedings of the 2nd Conference on Causal Learning and Reasoning, M. van der Schaar, D. Janzing, and C. Zhang, Eds. Journal of Machine Learning Research, 2023.

Rhys Howard and Lars Kunze, "Simulation-Based Counterfactual Causal Discovery on Real World Driver Behaviour," 2023 IEEE Intelligent Vehicles Symposium (IV), Anchorage, USA, 2023.



[1] Bart Bussmann, Jannes Nys, and Steven Latré. Neural additive vector autoregression models for causal discovery in time series. In Carlos Soares and Luis Torgo, editors, Discovery Science, pages 446–460, Cham, 2021. Springer International Publishing.

[2] Doris Entner and Patrik O Hoyer. On causal discovery from time series data using fci. In 5th European Workshop on Probabilistic Graphical Models, pages 121–128. Helsinki Institute for Information Technology HIIT, 2010.

[3] John Geweke. Measurement of linear dependence and feedback between multiple time series. Journal of the American Statistical Association, 77(378):304–313, 1982.

[4] C. W. J. Granger. Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37(3), 1969.

[5] J. Houston, G. Zuidhof, L. Bergamini, Y. Ye, A. Jain, S. Omari, V. Iglovikov, and P. Ondruska. One thousand and one hours: Self-driving motion prediction dataset, 2020.

[6] Biwei Huang, Kun Zhang, Jiji Zhang, Joseph Ramsey, Ruben Sanchez-Romero, Clark Glymour, and Bernhard Schölkopf. Causal discovery from heterogeneous/nonstationary data. Journal of Machine Learning Research, 21(89):1–53, 2020.

[7] Aapo Hyvärinen, Kun Zhang, Shohei Shimizu, and Patrik O. Hoyer. Estimation of a structural vector autoregression model using non-gaussianity. Journal of Machine Learning Research, 11(56):1709–1731, 2010.

[8] Robert Krajewski, Julian Bock, Laurent Kloeker, and Lutz Eckstein. The highd dataset: A drone dataset of naturalistic vehicle trajectories on german highways for validation of highly automated driving systems. In 2018 21st International Conference on Intelligent Transportation Systems (ITSC), pages 2118–2125, 2018.

[9] Meike Nauta, Doina Bucur, and Christin Seifert. Causal discovery with attention-based convolutional neural networks. Machine Learning and Knowledge Extraction, 1(1):312–340, 2019.

[10] Roxana Pamfil, Nisara Sriwattanaworachai, Shaan Desai, Philip Pilgerstorfer, Konstantinos Georgatzis, Paul Beaumont, and Bryon Aragam. Dynotears: Structure learning from time-series data. In Silvia Chiappa and Roberto Calandra, editors, Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, pages 1595–1605. PMLR, 26–28 Aug 2020.

[11] Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Causal inference on time series using restricted structural equation models. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 26. Curran Associates, Inc., 2013.

[12] Jakob Runge, Peer Nowack, Marlene Kretschmer, Seth Flaxman, and Dino Sejdinovic. Detecting and quantifying causal associations in large nonlinear time series datasets. Science Advances, 5(11), 2019.

Other Credit

Flat car collection in top view: Freepik

Ladder of Causality: Maayan Harel