Introduction to Probability Theory, Part 4

Having discussed how different people might assign different probabilities to the same proposition, and having for the time being disposed of the notion of the "one true probability", let's turn to another intriguing question: how does one person know whether her assigned probability is correct - that is, that it reflects her own background information? Obviously, if what you know remains unchanged, so should the probability assignments you make on that basis. In other words, it would not be sensible to assign two different probabilities to the same proposition on a whim, unless you have in the meantime learned something new that is related to that proposition. However, even if you agree with me that probability assignments are stable given some state of knowledge, the question remains: which assignment among the infinitely many values between 0 and 1 is appropriate - and what precisely does "appropriate" even mean?

The answer that probability theory gives to the second question is intuitive and satisfactory. An "appropriate" or "correct" probability assignment is one that is consistent with all the other probability assignments you might make. That is, you cannot have your cake and eat it too: because every proposition under consideration is either true or false, and because the propositions are interwoven (their meanings are related to each other), there is some risk that you might come up with an internally contradictory probability assignment - one from which you'd have to conclude that some proposition is both true and false. The "correct" assignment, on the other hand, does not lead to any such absurd conclusions.

For example, you cannot and would not at the same time believe that I am both younger and older than any given age; if you felt 75% sure that I'm older, you'd also feel 25% sure that I'm younger and vice versa. However, if you were to assign probabilities to some related and indirect propositions instead, from which my age could be derived (say, propositions about myself having witnessed certain historical events during my lifetime, propositions about my friends' and parents' ages etc.), it could happen by accident that your combined probability assignment would imply that you do believe in that absurdity. You would then have to reject such a probability assignment and find out at which particular subproposition it went wrong (including the possibility of going wrong many times).
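To make the age example concrete, here is a minimal sketch in Python. The function names, the age threshold and the "witnessed an event" proposition are all made up for illustration, and the sketch ignores the edge case of being exactly the given age. It checks two things: that "older" and "younger" add up to 1, and that a proposition which logically implies "older" is not given a higher probability than "older" itself.

```python
# Hypothetical illustration: two simple coherence checks on probability assignments.

def complement_consistent(p_older, p_younger, tol=1e-9):
    """'Older' and 'younger' exclude each other and (ignoring 'exactly that age')
    exhaust the possibilities, so their probabilities must sum to 1."""
    return abs(p_older + p_younger - 1.0) <= tol

def implication_consistent(p_premise, p_conclusion):
    """If the premise logically implies the conclusion, the conclusion
    cannot be less probable than the premise."""
    return p_conclusion >= p_premise

# Direct assignment: 75% sure I'm older than some age, 25% sure I'm younger.
p_older, p_younger = 0.75, 0.25

# Indirect assignment (assumed for illustration): 90% sure I witnessed an event
# that only someone older than that age could have witnessed.
p_witnessed = 0.90

direct_ok = complement_consistent(p_older, p_younger)
indirect_ok = implication_consistent(p_witnessed, p_older)

print(direct_ok)    # the direct pair is fine on its own
print(indirect_ok)  # but it clashes with the indirect assignment
```

Each assignment looks reasonable in isolation; the contradiction only shows up when the related propositions are checked against each other.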

As an analogy, it helps to consider financial arithmetic. If you were an accountant tasked with summing percentages that represent parts of a whole amount and arrived at a sum either greater or less than 100%, you'd know that you must have made a mistake somewhere along the way. Note, however, that this is a rather weak criterion of correctness: not all sums that come to 100% contain the right components. Indeed, you can produce an infinite number of artificial sums that all come to 100% by adjusting the individual components relative to each other. So to become more certain that the calculation reflects reality, you'd need some additional means of checking consistency, such as partial sums. Depending on your level of paranoia, you could introduce more and more partial sums ad infinitum. The point is that they would all have to be consistent for you to be satisfied that the calculation - analogous to a probability assignment - was correct. Just one slight deviation from their expected relationship would mean that an error had slipped in.
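The accountant's situation can be sketched in a few lines of Python. The category names, the figures and the helper functions below are invented for illustration only. The sketch shows why the 100% total is a weak check and how a single independently known partial sum catches an error that the total misses.

```python
# Hypothetical illustration: a total of 100% is necessary but not sufficient.

def total_ok(shares, tol=1e-9):
    """Check that all percentage shares add up to 100."""
    return abs(sum(shares.values()) - 100.0) <= tol

def failed_partial_sums(shares, checks, tol=1e-9):
    """Return every partial-sum check that the shares violate.

    `checks` is a list of (names, expected) pairs, where `expected` is the
    independently known sum of the shares listed in `names`.
    """
    failed = []
    for names, expected in checks:
        actual = sum(shares[name] for name in names)
        if abs(actual - expected) > tol:
            failed.append((names, expected, actual))
    return failed

# Assumed "true" breakdown and a tampered one; both total 100%.
correct = {"salaries": 40.0, "rent": 25.0, "materials": 35.0}
tampered = {"salaries": 45.0, "rent": 25.0, "materials": 30.0}

# One independently known partial sum: salaries + rent should be 65%.
checks = [(("salaries", "rent"), 65.0)]

print(total_ok(correct), total_ok(tampered))   # both pass the weak check
print(failed_partial_sums(correct, checks))    # no violations
print(failed_partial_sums(tampered, checks))   # the partial sum exposes the error
```

Adding more entries to `checks` corresponds to the "more and more partial sums" above: each one rules out further artificial breakdowns that happen to total 100%.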

Now that we have introduced self-consistency as a means of checking whether a given probability assignment is the correct one, have we also answered the first question - how do we find this correct assignment? Yes and no. In principle, we could "simply" write down hundreds of thousands of probability assignments, go through each one noting the inconsistencies it contains, and in the end accept the assignment with the fewest inconsistencies. Obviously, given the infinite number of possible assignments, this would be a formidable task (for any human and for any machine), and also nothing like what we are used to in solving real problems. It would be comparable to an accountant randomly generating hundreds of thousands of balance statements and then going through the heap to check which of them reflects the company's finances. Fortunately, that's not how accountants work, and it's not a sensible use of probability theory either. What we need instead is a set of reliable, mechanical rules that allow us to construct internally consistent assignments as long as we stick to them, much like the rules of arithmetic never let you down. Such rules do indeed exist, and they form the very core of probability theory, or, as R. T. Cox called them, "an algebra of probable inference".

To be continued...


Michael said...

How do you recommend learning probability theory?

My goal is to work on AI. My math level is basic college calculus and linear algebra.

Should I just get a college textbook on probability theory and work through the exercises?

Anonymous said...

@Michael: Use multiple sources. For starters I would recommend the book "The Algebra of Probable Inference" by Richard T. Cox. Short, clear and to the point. Other than that, to pique your interest, read Eliezer Yudkowsky's introduction (though it is kind of biased, like the articles on this blog). Then get a standard college textbook, compare the approach (which will likely be slanted toward "frequentist" rather than "Bayesian"), and go through the exercises.

Michael said...


By Yudkowsky's introduction, do you mean "An Intuitive Explanation of Bayes' Theorem"?

I started reading Cox. He mentions a book by Venn, "The Logic of Chance"; have you read it?

Also, I see a lot of praise for "Probability Theory: The Logic of Science" by Jaynes. What do you think about it?