No fuss about causality

Throughout history and up to the present day, a big fuss has been made among philosophers about defining and dealing with causality. For a nice overview, see these lecture slides, which illustrate the troubled history of the concept. In recent times, formal approaches have been developed to connect causality to probabilistic/statistical reasoning (Rubin) or to do just the opposite, treating causality as an extension supposedly completely out of scope of probability theory (Pearl). It seems that the causality debate still rages on, apparently now on the battlefield of notations. For example, listen to Pearl's recent lecture in which he quips that "mere mortals" not trained by Rubin cannot verify certain expressions required within Rubin's framework. Pearl himself advocates a graphical representation of causality (little wonder in light of his past work). Even so, when asked about modeling just slightly complicated scenarios (A causes B, but only given C), he grudgingly admits that graphs do not directly express such constraints. Instead, the constraint can be hidden within the probability distribution associated with a graph.

Hearing all this, I wonder whether the award-winning philosopher is not now in the business of shooting sparrows with cannons. I agree with Pearl's assessment that given a set of structural equations or a graphical model (like his electric circuit example), all causal and counterfactual questions can be readily answered by simply running the model (simulation). I'm puzzled why Pearl does not go one step further and point out that we have had, for 50+ years now, very elaborate and wildly popular tools for expressing causal models and the equipment for running them. They are imperative programming languages and computers, of course. Every program written in an imperative language is an intricate causal model, in which expressing constraints of the sort mentioned above comes effortlessly and the notion of time (so central to all causal reasoning) is given by the execution semantics.

For example:

if (c == C)
{
        if (a == A)
        {
                b = B;
        }
}

which is of course equivalent to

if (c == C && a == A) { b = B; }

which is of course equivalent to stating "A and C (combined) cause B". Given such a model, we may call A and C separately "necessary causes" if we so prefer. We may call "A and C" the "sufficient cause". Finally, given a particular run and a different expression of the sort "A or C", we may speak of the "actual" cause having been either "A" or "C" or both. What I wish to say is that there are no doubts about causality given a model in the form of a computer program. Such a model also makes clear how pointing to a single variable as "the" cause of something could be incorrect. Finally, modeling runs of computer programs has been a topic in computer science for decades, even if the researchers have never bothered to use the word "causality" in this context.
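
To make the "necessary cause" and "actual cause" talk concrete, here is a minimal sketch of answering a counterfactual question by simply re-running the model. The numeric stand-ins for A, B and C, the function names and the `b_prior` parameter (the value of b before the model runs) are my own assumptions for illustration, not part of any established framework.

```c
#include <stdbool.h>

/* Hypothetical numeric stand-ins for the symbolic values A, B and C. */
enum { A = 1, B = 1, C = 1 };

/* A causal model here is just a function from inputs to the resulting b. */
typedef int (*model_fn)(int a, int c, int b_prior);

/* "A and C (combined) cause B": b is set only if both conditions hold. */
int run_and_model(int a, int c, int b_prior)
{
    int b = b_prior;
    if (c == C && a == A) {
        b = B;
    }
    return b;
}

/* The alternative "A or C" model: either condition alone sets b. */
int run_or_model(int a, int c, int b_prior)
{
    int b = b_prior;
    if (c == C || a == A) {
        b = B;
    }
    return b;
}

/* Counterfactual test: re-run the model with a replaced by a_alt, holding
   everything else fixed. If the outcome changes, the actual value of a was
   necessary for the outcome in this particular run. */
bool a_was_necessary(model_fn model, int a, int c, int b_prior, int a_alt)
{
    return model(a, c, b_prior) != model(a_alt, c, b_prior);
}
```

With a = A, c = C and b starting at 0, `a_was_necessary(run_and_model, A, C, 0, 0)` is true, while `a_was_necessary(run_or_model, A, C, 0, 0)` is false: in the "or" model the outcome is overdetermined, which is exactly the situation where naming a single variable as "the" cause goes wrong.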

Of course, computer programs are entirely deterministic and hardly "statistical" beasts. However, who says that "real-world" causality is not deterministic, or at least may not be treated as such? If you view probability, as I do, as a means for modeling epistemic (that is, the modeler's own) uncertainty rather than some ontological "stochastic randomness" of nature, then you can apply it without hesitation to deterministic computer programs, in circumstances where parts of the state or code are unknown. For example, you could model an unknown variable value as a probability distribution over possible values, or you could model an unknown segment of code as a probability distribution over possible segments. (If you can't even enumerate the possibilities or if they appear "infinite", you are in trouble; ask yourself whether and why you know so little and how you could find out more.)
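
As a rough sketch of the first option, suppose a is known but c is not, and my uncertainty about c is a small table of possible values with probabilities. Running the deterministic program once per possible value and summing the weights gives the probability of the outcome. The struct, the function names and the numeric stand-ins for A, B and C are assumptions made up for this illustration.

```c
#include <stddef.h>

/* Hypothetical numeric stand-ins for the symbolic values A, B and C. */
enum { A = 1, B = 1, C = 1 };

/* The deterministic model: b becomes B only if c == C and a == A. */
static int run_model(int a, int c, int b_prior)
{
    int b = b_prior;
    if (c == C && a == A) {
        b = B;
    }
    return b;
}

/* One possible value of the unknown variable, with the modeler's probability. */
struct weighted_value {
    int value;
    double prob;
};

/* Probability that the run ends with b == B when a is known and c is only
   known as a distribution. The caller is assumed to pass probabilities that
   sum to 1; this function does not check that. */
double prob_b_equals_B(int a, int b_prior,
                       const struct weighted_value *c_dist, size_t n)
{
    double p = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (run_model(a, c_dist[i].value, b_prior) == B) {
            p += c_dist[i].prob;
        }
    }
    return p;
}
```

If I believe c is 0 with probability 0.7 and equals C with probability 0.3, then for a = A and b starting at 0 the function returns 0.3, and for a = 0 it returns 0. Nothing in the program became random; only my knowledge of its state is incomplete.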

The challenge of science is, as Pearl rightly points out, that we seldom know the causal model. That is, we either don't know what program has (or may have) generated our observations, or the same set of observations might equally have been generated by many different programs. In this latter case, absent further information, we have a uniform probability distribution over those programs. Our task then is to somehow infer the program from the observations and from "causal assumptions", that is, from the data and the prior. The "somehow" should be plausible reasoning according to the rules of probability theory, and so we have a connection (not of the sort contemplated by Pearl/Rubin).

The causal assumptions correspond to our estimate about which models (programs) are possible at all, and which are consistent with other models (programs) that we already deem as accurate and useful representations of reality. Interventions before observation help enormously by lowering probabilities for sets of programs not compatible with the intervention+observation data.

For example, given the following set of observations:

a = 1, b = 0
a = 0, b = 0
a = 0, b = 1
a = 1, b = 0
a = 0, b = 1
a = 1, b = 0
a = 1, b = 0

we could just as well fit the following two causal models (and many others):

if (a == 1) { b = 0; }

or

if (b == 1) { a = 0; }

However, if we perform a set of interventions of setting b = 1 and observing a != 0, and another set of interventions of setting a = 1 and observing b == 0, the first model will stand the test while the second one will become very implausible. Still, we should be careful not to proclaim it impossible, as there could be hidden variables not accounted for within the model contributing to the observed outcomes. One day, we might find these factors and control for them, and setting b = 1 might then indeed begin causing a == 0. And so we see that:

  • causality, much like probability, is in the eye of the beholder
  • (incomplete) causal models may be treated as if they generated data according to some probability distributions
  • causal models may be assigned probabilities
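
The intervention argument can be sketched as a small Bayesian update over the two candidate programs. Everything quantitative here is an assumption of mine for illustration: a and b are taken to be binary, the two programs start with equal prior weight, `eps` is the probability that hidden factors make a program's prediction fail, and a program that says nothing about an intervention treats both outcomes as equally likely.

```c
#include <stddef.h>

/* Which variable the experimenter forced in a trial. */
enum target { SET_A, SET_B };

struct trial {
    enum target set; /* variable that was forced */
    int forced;      /* value it was forced to */
    int observed;    /* value then seen on the other variable (0 or 1) */
};

/* Likelihood of one trial under model 1: if (a == 1) { b = 0; } */
static double lik_model1(struct trial t, double eps)
{
    if (t.set == SET_A && t.forced == 1) {
        return t.observed == 0 ? 1.0 - eps : eps;
    }
    return 0.5; /* model is silent here: assume both outcomes equally likely */
}

/* Likelihood of one trial under model 2: if (b == 1) { a = 0; } */
static double lik_model2(struct trial t, double eps)
{
    if (t.set == SET_B && t.forced == 1) {
        return t.observed == 0 ? 1.0 - eps : eps;
    }
    return 0.5; /* model is silent here: assume both outcomes equally likely */
}

/* Posterior probability of model 1 after the given trials, starting from
   equal priors. Use eps > 0 so that no model is ever ruled out entirely. */
double posterior_model1(const struct trial *trials, size_t n, double eps)
{
    double w1 = 0.5, w2 = 0.5;
    for (size_t i = 0; i < n; i++) {
        w1 *= lik_model1(trials[i], eps);
        w2 *= lik_model2(trials[i], eps);
    }
    return w1 / (w1 + w2);
}
```

With three trials of setting b = 1 and seeing a = 1, three trials of setting a = 1 and seeing b = 0, and eps = 0.05, the odds in favor of the first program are 19^3 to 1, so its posterior is above 0.999 yet still short of 1. The second program is very implausible, not impossible.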

That said, there is little reason to make a big fuss about finding the "one true definition" of causality, the "one true notation" for representing causal arguments, or "measurement methods" for determining the strength of "causal connections". We have no need for a grand philosophy of causal reasoning, but a great need for good, sufficiently granular and computationally cheap causal models that reliably deliver predictions about the effects of actions to their users.
