Understanding Bayesian approaches to estimating probabilities is important. People often miss its full import or fail to see the consequences of estimating probabilities the wrong way. Most books that discuss it use confusing terminology to explain something that is fairly simple. Bayesian probabilities for events are relatively easy to understand, while those involving hypotheses are less so, even though the math involved is exactly the same! In this blog, I'll try to explain the concept, but more importantly show a formula that any student can use so that they don't have to "think through" the Bayesian logic every time. For some problems, however, it might be better to enumerate the cases. To develop an intuition for this, you need to keep an eye out for new information that could be coming in and altering your belief in an existing hypothesis.

Here is the structure of the problem you will almost always run into. For simplicity, let us assume you have two hypotheses, H0 and H1, exactly one of which is true. There are probabilities associated with them, P(H0) and P(H1). Along comes a piece of evidence E, and you want to update your probability measures. The formula you could use is as simple as

$$P(H_{0}|E) = \frac{P(E|H_{0})\times P(H_{0})}{P(E|H_{0})\times P(H_{0}) + P(E|H_{1})\times P(H_{1})}$$

...and that's it! This works, always. You just need to be able to map your problem to this framework.
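For readers who prefer code to symbols, here is a minimal Python sketch of the two-hypothesis update. The function name `posterior_h0` and its parameter names are my own choices for illustration, and the numbers in the usage line are made up. It assumes H0 and H1 are the only two possibilities, so P(H1) = 1 - P(H0).

```python
from fractions import Fraction


def posterior_h0(prior_h0, like_e_h0, like_e_h1):
    """Return P(H0 | E) when H0 and H1 are the only two hypotheses.

    prior_h0  : P(H0), the belief in H0 before seeing the evidence
    like_e_h0 : P(E | H0), chance of the evidence if H0 is true
    like_e_h1 : P(E | H1), chance of the evidence if H1 is true
    """
    prior_h1 = 1 - prior_h0  # assumption: exactly one of H0, H1 is true
    numerator = like_e_h0 * prior_h0
    evidence = numerator + like_e_h1 * prior_h1  # total probability of E
    if evidence == 0:
        raise ValueError("the evidence is impossible under both hypotheses")
    return numerator / evidence


# Made-up numbers, purely to show the call: equal priors, and evidence
# that is three times as likely under H0 as under H1.
print(posterior_h0(Fraction(1, 2), Fraction(3, 4), Fraction(1, 4)))  # 3/4
```

Using `Fraction` keeps the arithmetic exact, which makes it easier to compare against hand calculations; plain floats work just as well.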

To demonstrate this, here is a worked example.

Q. There are two boxes. Box A has 2 white coins and 1 black coin. Box B has 1 white coin and 2 black coins. If a user picks a box at random, what is the probability that box A was chosen? If the user then draws a coin from the chosen box and reveals it to be a white one, what is the probability that box A was chosen?

A. The first half is simple. No extra information is revealed. The probability is simply 50%. The second half gets a little more interesting. As there are only two boxes to be chosen, knowing the probability of one implies you know the other.

So let's cast it in the framework indicated above. The hypothesis H0 is that box A was chosen, and H1 is that box B was chosen. P(H0) is the "prior", that is, the probability that box A was chosen prior to any new knowledge or evidence. This we know is 50%, which is the same as P(H1).

The next piece is to understand P(E | H0). This is the probability that you would see the "evidence" __given__ that the hypothesis H0 is true. In this case it's relatively easy to estimate this as 2/3, since box A holds 2 white coins out of 3. Similarly, P(E | H1) is 1/3, since box B holds 1 white coin out of 3. Plugging all these into the equation above gives

$$P(H_{0}|E) = \frac{\frac{2}{3}\times \frac{1}{2}}{\frac{2}{3}\times \frac{1}{2} + \frac{1}{3}\times \frac{1}{2}} = \frac{2}{3}$$

The intuition behind this is also easy to follow. As the person revealed a white coin, it is more likely that it came from box A than from box B. To further reinforce that intuition, think about what you would have concluded if box A had 100 white coins.
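The box example can also be checked by weighting each box by how likely it is to produce the observed coin. The sketch below is one way to do that in Python; the function `posterior_boxes` and the dictionary layout are my own, and the "100 white coins" variant assumes box A keeps its single black coin, which the question does not actually specify.

```python
from fractions import Fraction


def posterior_boxes(boxes, observed):
    """Return P(box | observed coin colour), each box being equally likely a priori."""
    prior = Fraction(1, len(boxes))
    # Joint weight for each box: P(box) * P(observed colour | box)
    joint = {
        name: prior * Fraction(coins.count(observed), len(coins))
        for name, coins in boxes.items()
    }
    total = sum(joint.values())  # P(observed colour)
    return {name: weight / total for name, weight in joint.items()}


original = {
    "A": ["white", "white", "black"],
    "B": ["white", "black", "black"],
}
print(posterior_boxes(original, "white")["A"])  # 2/3

# Variant for the intuition check: box A with 100 white coins.
# Assumption: it still contains its 1 black coin.
lopsided = {
    "A": ["white"] * 100 + ["black"],
    "B": ["white", "black", "black"],
}
print(posterior_boxes(lopsided, "white")["A"])  # 300/401
```

With 100 white coins in box A, a white coin is only slightly stronger evidence for box A (about 75%), because box B still produces a white coin a third of the time.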

The above is a relatively simple exercise in Bayesian inference. While most real-world problems can be mapped to this framework, the difficulty comes in

- Realizing there is a Bayesian "trap" hidden somewhere
- If there is, casting it to the above framework

Here is another example of a scenario where Bayesian thinking comes into play often, that of tests for diseases.

Q. Assume there is a disease D that has a test T. Overall, 2% of the population get the disease. If a person actually has the disease, the test is right 90% of the time. If the person does not have the disease, the test could still show as positive 20% of the time. If the test shows positive for a person, what is the probability that the person has the disease?

A. Here, the hypotheses are "No Disease" and "Disease", named H0 and H1 respectively. With no prior evidence we know that P(H0) is 98% and P(H1) = 100% - 98% = 2%. Now, there is new evidence (E) that the test is showing up as a positive. So let us see how each of the parts would fit in.

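We want to estimate P(H1|E), and we know P(H1) and P(H0). Additionally, we know P(E | H1) = 90%, and the question gives the false positive rate directly as P(E | H0) = 20%. Simply plug them all in again.

$$P(H_{1}|E) = \frac{0.90\times 0.02}{0.90\times 0.02 + 0.20\times 0.98} = \frac{0.018}{0.214} \approx 8.41\%$$

Notice that even though the test is right 90% of the time when the disease is present, the person actually has just an 8.41% chance of having it given that the test came back positive. The intuition here is that it is a rare disease, and it would take a very accurate test to confirm it.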
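To see how strongly the answer depends on the false positive rate, here is a small Python sketch. The function name `p_disease_given_positive` is mine; the 2% prevalence and 90% detection rate come from the question, and the false positive rates other than the 20% stated in the question are extra values I picked for comparison.

```python
def p_disease_given_positive(prevalence, sensitivity, false_positive_rate):
    """Return P(disease | positive test) using the two-hypothesis formula."""
    true_positives = sensitivity * prevalence                   # P(E | H1) * P(H1)
    false_positives = false_positive_rate * (1 - prevalence)    # P(E | H0) * P(H0)
    return true_positives / (true_positives + false_positives)


# 20% is the rate stated in the question; 10% and 1% are for comparison only.
for fpr in (0.20, 0.10, 0.01):
    p = p_disease_given_positive(0.02, 0.90, fpr)
    print(f"false positive rate {fpr:.0%}: P(disease | positive) = {p:.2%}")
```

Halving the false positive rate from 20% to 10% roughly doubles the posterior (from about 8.4% to about 15.5%), and it takes a rate near 1% before a positive result makes the disease more likely than not.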

All is fine in such scenarios where the numbers are nicely given to us. The Bayesian angle becomes elusive when it is not put forth cleanly. The next example demonstrates that.

Q. A man has two children. One of them is a boy. What is the probability that the other is a girl?

A. You might be tempted to say 50%. You would be wrong! Here is why. Your evidence (E) here is "one of the children is a boy", meaning at least one of them. The hypothesis you want the probability for is H0 = "Other child is a girl". This makes H1 = "Other child is a boy". The values for P(H0) = P(H1) = 1/2. We want to estimate P(H0 | E). To estimate this, notice that P(E | H0) = 1/2, as two of the four equally likely birth orders (boy-girl and girl-boy) have one boy and one girl, and P(E | H1) = 1/4, as there is exactly one way (boy-boy) this is possible. Now we are all set, simply plug it into the formula (again!)

$$P(H_{0}|E) = \frac{\frac{1}{2}\times \frac{1}{2}}{\frac{1}{2}\times \frac{1}{2} + \frac{1}{4}\times \frac{1}{2}} = \frac{2}{3}$$

This is one example where it is likely easier to visualize the problem. In the diagram below, the left-hand side shows the situation without any information, and the right-hand side shows the information provided and how it ends up encapsulating the relevant cases. It is easier to see why the probability is 2/3 (about 67%) from this figure.
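The same counting can be done in a few lines of Python by listing the four equally likely families and keeping only those consistent with the evidence. This is a sketch under the usual simplifying assumptions (boys and girls equally likely, the two births independent, and "one of them is a boy" read as "at least one"); the variable names are mine.

```python
from fractions import Fraction
from itertools import product

# All (older, younger) combinations: BB, BG, GB, GG, each equally likely.
families = list(product("BG", repeat=2))

# Evidence: at least one of the two children is a boy.
with_a_boy = [family for family in families if "B" in family]

# Among those, the families where the other child is a girl.
boy_and_girl = [family for family in with_a_boy if "G" in family]

p_other_is_girl = Fraction(len(boy_and_girl), len(with_a_boy))
print(len(with_a_boy), len(boy_and_girl), p_other_is_girl)  # 3 2 2/3
```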

Clearly not all problems fit the "formula" framework. Some of them can be solved more easily by using the conventional counting method. Perhaps the most startling of Bayesian puzzles to hit the web is the Tuesday Birthday problem. It is a very subtle variant of the boy/girl problem mentioned above, but with a startling result. The problem is "A man has two children. One of them is a boy born on a Tuesday. What is the probability that the other child is a boy?". The link I mention above (and other sources on the web) describe the solution, and I'll try to describe it in my own words here.

If the first child is a boy born on a Tuesday, then the second child can be either a boy or a girl and could be born on any of the 7 days. This yields 14 cases (7 x 2). If the second child is a boy born on a Tuesday, then, by the same argument, the first child can be a boy or a girl born on any of the 7 days, yielding 7 x 2 = 14 cases. However, both sets include the case where both children are boys born on a Tuesday, so the total 14 + 14 = 28 double counts this case. In reality we have 28 - 1 = 27 cases. Next, of these 27 cases, we need to know how many have two boys in them. We can apply the same logic. If the first child is a boy born on a Tuesday, the second child, also a boy, can be born on any of the 7 days, giving 7 cases. The same logic applies if the second child is a boy born on a Tuesday, but like before we need to subtract one because the case of two boys both born on a Tuesday is counted twice. This gives a total of 7 + 7 - 1 = 13 cases where there are two boys. So the required probability is 13/27.
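If the counting argument feels slippery, a brute-force enumeration is a useful sanity check. The Python sketch below lists all 14 x 14 = 196 equally likely two-child families and filters them. It assumes sex and weekday are independent and uniformly distributed; the names and the choice of `2` as the label for Tuesday are arbitrary choices of mine.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 2  # arbitrary label; any one of the 7 day indices would do

# Each child is a (sex, day) pair: 2 x 7 = 14 equally likely possibilities.
children = list(product("BG", range(7)))

# Each family is an (older, younger) pair of children: 14 x 14 = 196 cases.
families = list(product(children, repeat=2))

# Evidence: at least one child is a boy born on a Tuesday.
matching = [family for family in families if ("B", TUESDAY) in family]

# Among those, the families where both children are boys.
two_boys = [family for family in matching if all(sex == "B" for sex, _ in family)]

print(len(matching), len(two_boys), Fraction(len(two_boys), len(matching)))  # 27 13 13/27
```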

If you are creative you can extend this to make your own tricky problems. What happens if you change day of week to month of year? If you follow the train of thought above, you will arrive at 23/47, which is slightly greater than 13/27.
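To experiment with variants like this, the enumeration can be wrapped in a function that takes the number of equally likely categories (7 for weekdays, 12 for months). This is a sketch with names of my own choosing, and it assumes the categories are uniform and independent of sex, which is only roughly true for real birth months.

```python
from fractions import Fraction
from itertools import product


def p_two_boys(n_categories):
    """P(both boys | at least one boy in one specific category out of n)."""
    children = list(product("BG", range(n_categories)))
    target = ("B", 0)  # a boy in the named category; which category is irrelevant
    matching = [f for f in product(children, repeat=2) if target in f]
    two_boys = [f for f in matching if all(sex == "B" for sex, _ in f)]
    return Fraction(len(two_boys), len(matching))


for n in (1, 7, 12):
    print(n, p_two_boys(n))  # 1 1/3, then 7 13/27, then 12 23/47
```

The counts follow the pattern (2n - 1)/(4n - 1): with a single category the extra detail carries no information and the answer is the familiar 1/3, and the more specific the detail, the closer the answer creeps toward 1/2.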

If you are looking to buy some books on probability, here are some great ones to own.

Fifty Challenging Problems in Probability with Solutions (Dover Books on Mathematics)

This book is a great compilation that covers quite a few puzzles. What I like about these puzzles is that they are all tractable and don't require too much advanced mathematics to solve.

Introduction to Algorithms

This is a book on algorithms, some of which are probabilistic. The book is a must-have for students, job candidates, and even full-time engineers and data scientists.

Introduction to Probability Theory

An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd Edition

The Probability Tutoring Book: An Intuitive Course for Engineers and Scientists (and Everyone Else!)

Introduction to Probability, 2nd Edition

The Mathematics of Poker

Good read. Overall, Poker/Blackjack type card games are a good way to get introduced to probability theory.

Let There Be Range!: Crushing SSNL/MSNL No-Limit Hold'em Games

Easily the most expensive book out there. So if the item above piques your interest and you want to go pro, go for it.

Quantum Poker

Well written and easy to read mathematics. For the Poker beginner.

Bundle of Algorithms in Java, Third Edition, Parts 1-5: Fundamentals, Data Structures, Sorting, Searching, and Graph Algorithms (3rd Edition) (Pts. 1-5)

An excellent resource for students, engineers, and entrepreneurs looking for some code that they can take and implement directly on the job.

Understanding Probability: Chance Rules in Everyday Life

A bit pricey when compared to the first one, but I like the look and feel of the text used. It is simple to read and understand, which is vital especially if you are trying to get into the subject.

Data Mining: Practical Machine Learning Tools and Techniques, Third Edition (The Morgan Kaufmann Series in Data Management Systems)

This one is a must-have if you want to learn machine learning. The book is beautifully written and ideal for the engineer/student who doesn't want to get too much into the details of a machine learned approach but wants a working knowledge of it. There are some great examples and test data in the text book too.

Discovering Statistics Using R

This is a good book if you are new to statistics & probability while simultaneously getting started with a programming language. The book supports R and is written in a casual humorous way making it an easy read. Great for beginners. Some of the data on the companion website could be missing.

On second child problem - (1/4) / (1/4 + 1/8) = 2/3, not 1/3.

Same is obvious from the picture - 'second child is girl' happens on two boxes out of three.

Typo, corrected. Thanks for pointing out

Another typo... In the disease/test question, P(E | H0) should be 20% instead of 10% (of course, you can change the false positive percentage from 20% to 10% in the question definition).

In the second child problem, the answer would be 1/2 IF the initial condition was "The FIRST child is a boy" or indeed "The SECOND child is a boy".

In the disease example I find that the way to visualise the problem is to take an example with a sample: say, of 1000 people, 20 will have the disease, of which the test will find 90%, i.e. 18, and 980 will not have the disease, of which the test will falsely identify 20%, i.e. 196. So 18 of the (18 + 196 =) 214 who test positive actually have the disease, which is about 1/12, or roughly 8.4%.