Q: You are told that a certain area in a forest has lions, tigers and bears. You tour that area and observe 5 tigers, 2 lions and 1 bear. What is your estimate on the distribution of these animals?

Fifty Challenging Problems in Probability with Solutions (Dover Books on Mathematics)

A: This is a good example to demonstrate the multinomial distribution, its applications, and to introduce the concept of "conjugate priors". Let's deal with them one at a time and then revisit the problem.

The Conjugate Prior:

This concept is part of Bayesian analysis. Let's assume you have a prior which belongs to a "family" of functions, say \( y = f(\theta,x) \). You get some additional information and you update your estimate such that the posterior belongs to the same family of functions. So the prior and posterior differ only in the parameters that go into the function, not in its form and structure. An example of a conjugate prior is the Gaussian distribution, which is conjugate to itself: a Gaussian prior combined with a Gaussian likelihood (of known variance) yields a Gaussian posterior.

The Multinomial Distribution:

This is an extension of the binomial distribution. Assume you have a bag with black, blue & green balls. The probabilities of drawing each color are \(\{p_{black},p_{blue},p_{green}\}\) and you pull 10 balls. What is the probability of seeing 2 black balls, 6 blue balls and 2 green balls? This is given by the following formula (which can be derived by counting the possible orderings of the draws):

$$ P(2Black,6Blue,2Green) = \frac{10!}{2!6!2!} p_{black}^{2}\times p_{blue}^{6}\times p_{green}^{2}$$
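This probability is easy to compute directly. Here is a minimal Python sketch (the function name is illustrative, and equal draw probabilities of 1/3 are assumed just for the example):

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """Probability of observing exactly these counts under a multinomial model."""
    n = sum(counts)
    denom = 1
    for c in counts:
        denom *= factorial(c)
    coef = factorial(n) // denom  # multinomial coefficient n! / (c1! c2! ...)
    p = 1.0
    for c, q in zip(counts, probs):
        p *= q ** c
    return coef * p

# 2 black, 6 blue, 2 green out of 10 draws, assuming equal draw probabilities:
print(multinomial_pmf([2, 6, 2], [1/3, 1/3, 1/3]))  # 1260 / 3**10, about 0.0213
```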

The Dirichlet Distribution:

The probability density of this distribution has the following form and requires a set of parameters as input. These parameters are characterized by a vector \( \{\alpha_1,\alpha_2,\alpha_3,\ldots\}\) which we shall represent as \(\boldsymbol\alpha\). So, if we have a set of probability measures \(\{p_{black},p_{blue},p_{green}\}\), a Dirichlet distribution is

$$f(p_{black},p_{blue},p_{green};\alpha_1,\alpha_2,\alpha_3) = \frac{1}{Z(\boldsymbol \alpha)}p_{black}^{\alpha_1 - 1}p_{blue}^{\alpha_2 - 1}p_{green}^{\alpha_3 - 1}$$

where the normalizer \( Z(\boldsymbol \alpha)\) is given by a somewhat more daunting expression in terms of the gamma function, whose details we will not go into any further:

$$ Z(\boldsymbol \alpha) = \frac{\prod_{i=1}^{3}\Gamma({\alpha_i})}{\Gamma(\sum_{i=1}^{3}\alpha_i)}$$
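Both the normalizer and the density are straightforward to evaluate numerically. A minimal Python sketch, using the standard library's `math.gamma` (the probability vector and \(\boldsymbol\alpha\) values below are illustrative):

```python
from math import gamma, prod

def dirichlet_pdf(p, alpha):
    """Dirichlet density f(p; alpha); p must sum to 1."""
    z = prod(gamma(a) for a in alpha) / gamma(sum(alpha))  # normalizer Z(alpha)
    return prod(pi ** (a - 1) for pi, a in zip(p, alpha)) / z

# With alpha = (1, 1, 1) the density is flat over the probability simplex:
print(dirichlet_pdf([0.2, 0.6, 0.2], [1, 1, 1]))  # 2.0 everywhere on the simplex
```

The flat density at \(\boldsymbol\alpha = \{1,1,1\}\) is exactly the "no information beyond existence" prior used later in the answer.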

And now for the kicker: the Dirichlet distribution is the conjugate prior of the multinomial distribution. This can be proved algebraically (the Wikipedia article on conjugate priors works through it). Notice that the form and structure of the two equations are one and the same, and the parameters \(\boldsymbol \alpha\) can be seen as "prior" counts of the categories we have already seen. This is critical to understand. We are not concluding that we have figured out a good way to find priors; rather, if we knew some counts from earlier on, the way to include them in the model is simply to treat them as prior counts!

Now, coming back to the problem. We observed 5 Tigers, 2 Lions and 1 Bear. Note that this is our first observation and we have no prior counts whatsoever. So how do we cast this in a Bayesian framework? We exploit one little piece of information that could easily be overlooked: we implicitly know that there are Lions, Tigers & Bears in the forest, so at least one of each must have been observed by someone at some time. Can we leverage this information? Absolutely! We simply set our \(\boldsymbol \alpha\) parameter to \(\{1,1,1\}\). A frequentist approach would estimate the distribution of tigers, lions and bears as

$$P(Tiger) = \frac{5}{5 + 2 + 1}= \frac{5}{8}\\ P(Lion) = \frac{2}{5 + 2 + 1} = \frac{2}{8}\\ P(Bear) = \frac{1}{5 + 2 + 1}=\frac{1}{8}$$

and the Bayesian approach would estimate it in the following way

$$P(Tiger) = \frac{5 + \color{red}{1}}{5 + 2 + 1 + \color{red}{1+1+1}}= \frac{6}{11}\\ P(Lion) = \frac{2 + \color{red}{1}}{5 + 2 + 1 + \color{red}{1+1+1}} = \frac{3}{11}\\ P(Bear) = \frac{1 + \color{red}{1}}{5 + 2 + 1 + \color{red}{1+1+1}}=\frac{2}{11}$$
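The red additions are just the posterior mean of the Dirichlet, \((n_i + \alpha_i)/(\sum_j n_j + \sum_j \alpha_j)\). A minimal Python sketch (the function name is illustrative):

```python
def posterior_mean(counts, alpha):
    """Posterior mean of a Dirichlet-multinomial: (n_i + a_i) / (sum n + sum a)."""
    total = sum(counts) + sum(alpha)
    return [(n + a) / total for n, a in zip(counts, alpha)]

# The three observed counts from the problem, with a uniform prior of 1 each:
print(posterior_mean([5, 2, 1], [1, 1, 1]))  # 6/11, 3/11, 2/11 as decimals
```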

Notice what happened? Our estimates are now a bit smoother: if we were to repeat this experiment several times, the \(\boldsymbol \alpha\) would smooth out the estimates. How about our choice of values for \(\boldsymbol \alpha\)? We chose 1s here based on the wording of the problem, but if we wanted more dampening we could just as well choose bigger values. This choice reflects the amount of confidence we have in the prior: if we chose 100s instead of 1s, it would take a lot of observations to move the estimates.

Finally, let us try out a simulation of this very situation. The simulation code does the following:

- Pick the true number of Lions, Tigers and Bears. This number changes with every iteration.
- Pick a random subset of Lions, Tigers and Bears as the observed counts. These numbers are always smaller than the true numbers.
- Compute the distribution using the frequentist and Bayesian approaches described above, keeping \(\boldsymbol \alpha = \{1,1,1\}\).
- Compute an error statistic (the RMSE).

The RMSE (root mean squared error) is simple and well known; the lower it is, the better the estimate. The output of the simulation is a graph that charts how close each approach gets to the true distribution as the number of iterations increases. Initially the lines may be close to each other, but over a large number of iterations you can clearly see the Bayesian estimate winning hands down. This is a powerful method with wide-ranging applications, and it provides a good degree of robustness.
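As an illustration of those steps (the original used R; the trial count, population range and sample size here are arbitrary choices, not the original's), a Python sketch:

```python
import random
from math import sqrt

def simulate(trials=2000, sample_size=8, seed=0):
    """Compare frequentist and Bayesian (alpha = {1,1,1}) estimates by RMSE."""
    rng = random.Random(seed)
    se_freq = se_bayes = 0.0
    for _ in range(trials):
        # True population of the three species, redrawn every iteration.
        true_counts = [rng.randint(1, 100) for _ in range(3)]
        total = sum(true_counts)
        true_p = [c / total for c in true_counts]
        # Observe a small random sample drawn from the true distribution.
        obs = rng.choices(range(3), weights=true_p, k=sample_size)
        n = [obs.count(i) for i in range(3)]
        # Frequentist: raw proportions. Bayesian: add the prior counts of 1.
        freq = [c / sample_size for c in n]
        bayes = [(c + 1) / (sample_size + 3) for c in n]
        se_freq += sum((f - t) ** 2 for f, t in zip(freq, true_p))
        se_bayes += sum((b - t) ** 2 for b, t in zip(bayes, true_p))
    return sqrt(se_freq / (3 * trials)), sqrt(se_bayes / (3 * trials))

rmse_freq, rmse_bayes = simulate()
print(f"frequentist RMSE: {rmse_freq:.4f}, Bayesian RMSE: {rmse_bayes:.4f}")
```

With small samples the smoothed (Bayesian) estimate typically comes out ahead; increasing `sample_size` narrows the gap, which matches the intuition that the prior matters most when data is scarce.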

Some good books on probability worth buying

40 Puzzles and Problems in Probability and Mathematical Statistics (Problem Books in Mathematics)

A new entrant that seems promising.

Fifty Challenging Problems in Probability with Solutions (Dover Books on Mathematics)

This book is a great compilation covering quite a few puzzles. What I like about these puzzles is that they are all tractable and don't require much advanced mathematics to solve.

Introduction to Algorithms

This is a book on algorithms, some of which are probabilistic. The book is a must-have for students, job candidates, and even full-time engineers & data scientists.

Introduction to Probability Theory

An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd Edition

The Probability Tutoring Book: An Intuitive Course for Engineers and Scientists (and Everyone Else!)

Introduction to Probability, 2nd Edition

The Mathematics of Poker

Good read. Overall, Poker/Blackjack-type card games are a good way to get introduced to probability theory.

Let There Be Range!: Crushing SSNL/MSNL No-Limit Hold'em Games

Easily the most expensive book out there. So if the item above piques your interest and you want to go pro, go for it.

Quantum Poker

Well written and easy to read mathematics. For the Poker beginner.

Bundle of Algorithms in Java, Third Edition, Parts 1-5: Fundamentals, Data Structures, Sorting, Searching, and Graph Algorithms (3rd Edition) (Pts. 1-5)

An excellent resource (students/engineers/entrepreneurs) if you are looking for some code that you can take and implement directly on the job.

Understanding Probability: Chance Rules in Everyday Life A bit pricey when compared to the first one, but I like the look and feel of the text used. It is simple to read and understand, which is vital, especially if you are trying to get into the subject.

Data Mining: Practical Machine Learning Tools and Techniques, Third Edition (The Morgan Kaufmann Series in Data Management Systems) This one is a must have if you want to learn machine learning. The book is beautifully written and ideal for the engineer/student who doesn't want to get too much into the details of a machine learned approach but wants a working knowledge of it. There are some great examples and test data in the text book too.

Discovering Statistics Using R

This is a good book if you are new to statistics & probability while simultaneously getting started with a programming language. The book supports R and is written in a casual humorous way making it an easy read. Great for beginners. Some of the data on the companion website could be missing.

