The intellectual dishonesty of "stochastics"

Describing inference problems in the language of stochastics does not necessarily yield poor results, but it seems inherently intellectually dishonest. To see what I mean, consider the language typically used in stochastics: "Given that we are dealing with a random process of the sort X, we can infer that Y is true ... [a valid argument follows]". The intellectual dishonesty is concealed in the "given that" introduction, as users of stochastic arguments hardly ever feel obliged to demonstrate that the premise is fulfilled. A particularly frequent example is the assumption of normally distributed errors.

A satisfactory demonstration of the assumptions' validity would usually require many empirical measurements, which might be outright impossible (e.g. to determine the error of an instrument you need an even more accurate instrument, which might be unavailable), too expensive, or simply out of reach of the person making the stochastic argument. When confronted with that inconvenient fact, the speaker can resort to several lame tactics:

  • refer to the literature (claim that the actual measurements have been made already... by someone else... sometime);
  • refer to others behaving the same way (if everybody does it, then it must be right);
  • vaguely proclaim that we are dealing with idealized models, so we're ok after all;
  • if the normal distribution is questioned, refer to its natural occurrence and the central limit theorem - that is, claim that it is very likely to be the right distribution, after all.

Why are these tactics lame? Because they are simply attempts to conceal, at all costs, the speaker's lack of information; socially conditioned grasps to retain authority over a subject. However, we can and ought to be smarter than that. Consider this:

  1. Knowing is generally preferred to not knowing.
  2. Knowing that you don't know is generally preferred to pretending (to yourself and others) that you do, even if pretending makes you feel good and shuts up critics.

It turns out that "stochastic" statements about a random process can easily be translated into statements about the speaker's (and perhaps everyone else's!) lack of information about the exact characteristics of a deterministic process. In other words, we assume a particular distribution because we don't know of a better one - the alternatives are even worse given what we do know. If you think about it for a second, this is quite a different (and better) approach than lying to yourself about what you know, for the very simple reason that the former way of thinking invites the possibility of learning more, while the latter has precisely the opposite effect.
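
One way to make "the alternatives are even worse given what we do know" concrete is the maximum entropy principle. The paragraph above does not name it, so treat this as one possible reading rather than the only one. Suppose all we know about an error is its mean and its variance. Among all distributions on the real line with that mean and variance, the normal distribution has the largest differential entropy, meaning it commits us to the least beyond what we actually know. The sketch below only compares three candidate families using their closed-form entropies (in nats); the function names are made up for illustration.

```python
import math

# Closed-form differential entropies (in nats) of three distributions
# that all share the same standard deviation sigma.

def entropy_normal(sigma):
    # Normal: 0.5 * ln(2 * pi * e * sigma^2)
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

def entropy_laplace(sigma):
    # Laplace with scale b has variance 2 * b^2, so b = sigma / sqrt(2).
    b = sigma / math.sqrt(2)
    return 1 + math.log(2 * b)

def entropy_uniform(sigma):
    # Uniform of width w has variance w^2 / 12, so w = sigma * sqrt(12).
    width = sigma * math.sqrt(12)
    return math.log(width)

def rank_by_entropy(sigma=1.0):
    """Return (name, entropy) pairs, largest entropy first."""
    candidates = {
        "normal": entropy_normal(sigma),
        "laplace": entropy_laplace(sigma),
        "uniform": entropy_uniform(sigma),
    }
    return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    for name, h in rank_by_entropy(1.0):
        print(f"{name:8s} {h:.4f}")
```

The normal comes out on top for any sigma, since changing sigma shifts all three entropies by the same amount. Note what this does and does not say: it justifies the normal as the most noncommittal choice given only a mean and a variance, not as a measured fact about the instrument. Learn something more (say, that the errors are bounded) and the best choice changes.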

It is quite possible that experienced users of stochastics realize all of the above, and that I am belaboring a trivial point. If so, it remains somewhat puzzling why their language does not mirror their thinking. A case of professional jargon abuse, maybe? Either way, this sort of language is misleading to the uninitiated student of probability theory or statistics. The sooner you see through it, the better.
