From antiquity to the present day, philosophers have made a big fuss
about defining and dealing with causality. For a nice
overview, see these lecture slides, which illustrate the
troubled history of the concept. In recent times, formal approaches have
been developed to connect causality to probabilistic/statistical
reasoning (Rubin) or to do just the opposite, treating
causality as an extension that supposedly lies entirely outside the scope
of probability theory (Pearl). The causality debate evidently
still rages on, now apparently on the battlefield of notations.
For example, listen to Pearl's recent lecture in which he
quips that "mere mortals" not trained by Rubin cannot verify certain
expressions required within Rubin's framework. Pearl himself advocates a
graphical representation of causality (little wonder in light of his
past work). Even so, when asked about modeling even slightly
complicated scenarios (A causes B, but only given C), he grudgingly
admits that graphs do not directly express such constraints. Instead,
the constraint can be hidden within the probability distribution
associated with a graph.
Hearing all this, I wonder whether the award-winning philosopher
is not now in the business of shooting sparrows with cannons. I agree
with Pearl's assessment that given a set of structural equations
or a graphical model (like his electric circuit example), all causal
and counterfactual questions can be readily answered by simply running
the model (simulation). I'm puzzled why Pearl does not go one step
further and point out that nowadays (and for 50+ years already) we have
very elaborate and wildly popular tools for expressing causal models,
as well as the equipment for running them: imperative programming
languages and computers, of course. Every program written in
an imperative language is an intricate causal model, in which
expressing constraints of the sort mentioned above comes effortlessly
and the notion of time (so central to all causal reasoning) is given
by the execution semantics.
For example:
if (c == C)
{
    if (a == A)
    {
        b = B;
    }
}
which is of course equivalent to
if (c == C && a == A) { b = B; }
which is of course equivalent to stating "A and C (combined) cause B".
Given such a model, we may call A and C separately "necessary causes" if we
so prefer. We may call "A and C" the "sufficient cause". Finally, given
a particular run and a different expression of the sort "A or C", we may
speak of the "actual" cause having been either "A" or "C" or both.
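The distinction between necessary, sufficient and actual causes can be made concrete by running the model and re-running it with one input flipped. The following is only a minimal sketch under assumptions of my own: the boolean flags, the function names and the "but-for" test are illustrative and hypothetical, not a definition taken from Pearl or Rubin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding: each flag is true when the variable holds the
   value named in the text (a == A, c == C); the result is true when the
   program ends with b == B. */

/* The model from the text: b becomes B only if both conditions hold. */
static bool model_and(bool a_is_A, bool c_is_C) {
    return a_is_A && c_is_C;
}

/* A variant for the "A or C" expression mentioned in the text. */
static bool model_or(bool a_is_A, bool c_is_C) {
    return a_is_A || c_is_C;
}

typedef bool (*model_fn)(bool, bool);

/* But-for test on a particular run: does flipping a alone change the outcome? */
static bool a_made_a_difference(model_fn m, bool a, bool c) {
    assert(m != NULL);
    return m(a, c) != m(!a, c);
}

/* The same test for c. */
static bool c_made_a_difference(model_fn m, bool a, bool c) {
    assert(m != NULL);
    return m(a, c) != m(a, !c);
}
```

Under model_and with both flags true, each input passes the test, matching the reading of A and C as separately necessary. Under model_or with both flags true, neither input passes it, because the other one alone would have been enough. That is one reason why this simple test is only one of several possible readings of "actual cause".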
What I wish to say is that there are no doubts about causality given
a model in the form of a computer program. Such a model also makes it
obvious how pointing to a single variable as "the" cause of something
could be incorrect. Finally, modeling runs of computer programs has been
a topic in computer science for decades, even if the researchers have
never bothered to use the word "causality" in this context.
Of course, computer programs are entirely deterministic and hardly
"statistical" beasts. However, who says that "real-world" causality
is not deterministic as well, or at least may not be treated as such?
If you view probability, as I do, as a means for modeling epistemic
(that is, the modeler's own) uncertainty rather than some ontological
"stochastic randomness" of nature, then you can apply it without
hesitation to
deterministic computer programs, in circumstances where parts of the
state or code are unknown. For example, you could model an unknown
variable value as a probability distribution over possible values,
or you could model an unknown segment of code as a probability
distribution over possible segments. (If you can't even enumerate the
possibilities or if they appear "infinite", you are in trouble;
ask yourself whether and why you know so little and how you could
find out more.)
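As a rough illustration of the first case (an unknown variable value), the deterministic program can be run once per possible value and the probabilities of the runs that produce the effect added up. The constants A_VAL and C_VAL, the integer coding and the function names below are assumptions made for this sketch only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical integer stand-ins for the constants A and C from the text. */
enum { A_VAL = 1, C_VAL = 2 };

/* The deterministic program: returns 1 if it would execute b = B, else 0. */
static int b_is_set(int a, int c) {
    return (c == C_VAL && a == A_VAL) ? 1 : 0;
}

/* Probability that b is set when a is known, but c is known only through
   a distribution: values[i] is believed to occur with probability probs[i]. */
static double prob_b_set(int a, const int *values, const double *probs, size_t n) {
    assert(values != NULL && probs != NULL);
    double p = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (b_is_set(a, values[i])) {
            p += probs[i];   /* this possible world produces the effect */
        }
    }
    return p;
}
```

The program itself stays deterministic; the probability only summarizes what the modeler does not know about c. An unknown code segment could be handled the same way in principle, by summing over candidate segments instead of candidate values.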
The challenge of science is, as Pearl rightly points out, that we
seldom know the causal model. That is, we either don't know
what program has (or may have) generated our observations, or the same
set of observations might equally well have been generated by many
different programs. In this latter case, absent further information, we
have a uniform probability distribution over those programs. Our task
then is to somehow infer the program from the observations and from
"causal assumptions", that is, from the data and the prior. The "somehow"
should be plausible reasoning according to the rules of probability
theory, and so we have a connection between causality and probability
(though not of the sort contemplated by Pearl/Rubin).
The causal assumptions correspond to our estimate about which models
(programs) are possible at all, and which are consistent with other
models (programs) that we already deem as accurate and useful
representations of reality. Interventions before observation help
enormously by lowering probabilities for sets of programs not
compatible with the intervention+observation data.
For example, given the following set of observations:
a = 1, b = 0
a = 0, b = 0
a = 0, b = 1
a = 1, b = 0
a = 0, b = 1
a = 1, b = 0
a = 1, b = 0
we could just as well fit the following two causal models (and many
others):
if (a == 1) { b = 0; }
or
if (b == 1) { a = 0; }
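That both models fit can be checked mechanically. The sketch below reads each model as a constraint on which (a, b) records it can leave behind, which is an interpretation assumed here for illustration; the type and function names are hypothetical as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int a; int b; } observation;

/* The seven observations listed in the text. */
static const observation OBSERVED[] = {
    {1, 0}, {0, 0}, {0, 1}, {1, 0}, {0, 1}, {1, 0}, {1, 0}
};
static const size_t N_OBSERVED = sizeof OBSERVED / sizeof OBSERVED[0];

/* Model 1, if (a == 1) { b = 0; }: rules out records with a == 1 and b != 0. */
static bool fits_model1(observation o) {
    return o.a != 1 || o.b == 0;
}

/* Model 2, if (b == 1) { a = 0; }: rules out records with b == 1 and a != 0. */
static bool fits_model2(observation o) {
    return o.b != 1 || o.a == 0;
}

typedef bool (*fit_fn)(observation);

/* True if every record is compatible with the given model. */
static bool fits_all(fit_fn fits, const observation *obs, size_t n) {
    assert(fits != NULL && obs != NULL);
    for (size_t i = 0; i < n; ++i) {
        if (!fits(obs[i])) {
            return false;
        }
    }
    return true;
}
```

Both fits_all(fits_model1, OBSERVED, N_OBSERVED) and fits_all(fits_model2, OBSERVED, N_OBSERVED) hold for this data, so passive observation alone cannot separate the two programs.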
However, if we perform a set of interventions of setting b = 1 and
observing a != 0, and another set of interventions of setting a = 1
and observing b == 0, the first model will stand the test while the
second one will become very implausible. Still, we should be careful
not to proclaim it impossible, as there could still be hidden variables,
not accounted for within the model, contributing to the observed outcomes.
One day, we might find these factors and control for them, and
setting b = 1 might then indeed begin causing a == 0. And so we see
that:
- causality, much like probability, is in the eye of the beholder
- (incomplete) causal models may be treated as if they generated
  data according to some probability distributions
- causal models may be assigned probabilities
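The intervention argument above can be sketched as a Bayesian update over the two candidate programs. Everything numeric here is an assumption made for illustration: EPS stands for the chance that hidden factors make a program's rule appear violated, and BASE for the chance of seeing a given value of a variable that the program leaves unconstrained. Neither number comes from the text.

```c
#include <assert.h>

/* Hypothetical parameters, chosen only for this sketch. */
#define EPS  0.05   /* rule appears violated because of hidden factors */
#define BASE 0.5    /* value of a variable the program leaves free */

/* Which variable was set by hand before observing the other one. */
typedef enum { DO_A_1, DO_B_1 } intervention;

/* Likelihood of the observed value of the non-intervened variable
   under model 1: if (a == 1) { b = 0; } */
static double lik_model1(intervention iv, int observed) {
    if (iv == DO_A_1) {                       /* rule predicts b == 0 */
        return observed == 0 ? 1.0 - EPS : EPS;
    }
    return BASE;                              /* b does not constrain a */
}

/* Same under model 2: if (b == 1) { a = 0; } */
static double lik_model2(intervention iv, int observed) {
    if (iv == DO_B_1) {                       /* rule predicts a == 0 */
        return observed == 0 ? 1.0 - EPS : EPS;
    }
    return BASE;                              /* a does not constrain b */
}

/* One Bayes step: takes the current P(model 1) and returns the posterior,
   with model 2 holding the remaining probability mass. */
static double update_p1(double p1, intervention iv, int observed) {
    assert(p1 >= 0.0 && p1 <= 1.0);
    double w1 = p1 * lik_model1(iv, observed);
    double w2 = (1.0 - p1) * lik_model2(iv, observed);
    return w1 / (w1 + w2);
}
```

Starting from p1 = 0.5, a call such as update_p1(p1, DO_B_1, 1) (we set b = 1 and still saw a == 1) shifts most of the probability to model 1. Because EPS is greater than zero, model 2 never drops to exactly zero, which mirrors the caution about hidden variables.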
That said, there is little reason to make a big fuss about finding
the "one true definition" of causality, the "one true notation" for
representing causal arguments, or "measurement methods" for
determining strength of "causal connections". We have no need for
big philosophy of causal reasoning, but great need for good,
sufficiently granular and computationally cheap causal models
that reliably deliver predictions about effects of actions to
their users.