Friday, November 28, 2014

Bayesian Cognitive Bias

I chanced on this excellent puzzle on the net that tends to reveal a cognitive bias in our heads against Bayesian reasoning. The puzzle statement is quite simple

You are given four cards. Each card has a letter on one side and number on the other side. You are told the statement "If there is a D on one side, there is a 5 on the other side". Which two cards would you flip over to validate the statement?

The original article is here, think hard before you click through for an answer :)

Tuesday, September 2, 2014

Maximizing Chances in an Unfair Game

submit to reddit

Q: You are about to play a game wherein you flip a biased coin. The coin falls heads with probability \(p\) and tails with \(1 - p\) where \(p \le \frac{1}{2}\). You are forced to play by selecting heads so the game is biased against you. For every toss you make, your opponent gets to toss too. The winner of this game is the one who wins the toss the most. You, however get to choose the number of rounds that get played. Can you ever hope to win?

Machine Learning: The Art and Science of Algorithms that Make Sense of Data

A: At a first look, it might appear that the odds are stacked against you as you are forced to play by choosing heads. You would think that your chances or winning decrease as you play more and more. But, surprisingly there is a way to choose the optimal number of tosses (remember, you get to choose the number of times this game is played). To see how, lets crank out some numbers. If you get to toss the coin \(n\) times, then the total number of coin tosses you and your opponent flips is \(2n\).  Out of the \(2n\) tosses if \(y\) turns out heads, the probability that you would win is
P(\text{y Wins}) = {2n  \choose y} p^{y}(1 - p)^{2n - y}
In order to win, the value of \(y\) should run from \(n + 1\) to \(2n\) and the overall probability works out to
P(\text{Win}) = \sum_{y = n + 1}^{2n}{2n  \choose y} p^{y}(1 - p)^{2n - y}
We can work out the probability of winning by choosing various values of \(p\) and \(n\) and chart them out. Here is the R code that does it.

The code runs pretty quickly and uses the data.table package. All the processed data is contained in variables z and z1. They are plotted using the ggplot package to generate the following charts for the strategy.

The first chart shows the variation of the probability of winning by the number of games played for various probability bias values.

The next chart shows the optimal number of games to play for a given bias probability value.

Some good books to own for learning probability is listed here
Yet another fascinating area of probability are Monte Carlo methods. Here are a list of good books to own to learn Monte Carlo methods.

Tuesday, August 12, 2014

The Best Books for Monte Carlo Methods

The following are some of the best books to own to learn Monte Carlo methods for sampling and estimation problems

Monte Carlo Methods in Statistics (Springer)
This is a good book which discusses both Bayesian methods from a practical point of view as well as theoretical point of view with integrations. The explanations given are also fairly comprehensive. There are also a fair amount of examples in this text. Overall, this is an excellent book to own if you want to understand Monte Carlo sampling methods and algorithms at an intermediate to graduate level.

Explorations in Monte Carlo Methods (Undergraduate Texts in Mathematics)
This is good book to own to get you started on Monte Carlo methods. It starts with fairly simple and basic examples and illustrations. The mathematics used is also fairly basic. Buy this if you are at an undergraduate level and want to get into using Monte Carlo methods but have only a basic knowledge of statistics and probability.

Monte Carlo Methods in Financial Engineering (Stochastic Modelling and Applied Probability) (v. 53)
Another excellent book to own if you are curious to learn about the methods of Monte Carlo in the finance industry. Some really nice areas that are covered in the book include variance reduction techniques, diffusion equations, change point detections, Option pricing methods etc. Ideal for students of financial engineering or ones wanting to break into it. The book tends to overtly rate MC methods (well its a book on MC!).

Monte Carlo Simulation and Resampling Methods for Social Science
This book gives a good introduction and goes over some basic probability theory, statistics and distributions before it hops on to the Monte Carlo methods. This makes it a good introductory book for sampling methods. Recommended for undergraduates with minimal statistical background.

Simulation and Monte Carlo Method

An excellent book to own at the intermediate to graduate level. The text provides a good course in simulation and Monte Carlo methods. Some interesting topics covered in the text include rare-event simulation. The book assumes you have a background in statistics and probability theory.

Sunday, July 6, 2014

Embarrassing Questions, German Tanks and Estimations

submit to reddit

Q: You are conducting a survey and want to ask an embarrassing yes/no question to subjects. The subjects wouldn't answer that embarrassing question honestly unless they are guaranteed complete anonymity. How would you conduct the survey?

Machine Learning: The Art and Science of Algorithms that Make Sense of Data

A: One way to do this is to assign a fair coin to the subject and ask them to toss it in private. If it came out heads then answer the question truthfully else toss the coin a second time and record the result (heads = yes, tails = no). With some simple algebra you can estimate the proportion of users who have answered the question with a yes.

Assume total population surveyed is \(X\). Let \(Y\) subjects have answered with a "yes". Let \(p\) be the sort after proportion. The tree diagram below shows the user flow.

The total expected number of "yes" responses can be estimated as
\frac{pX}{2} + \frac{X}{4} = Y
which on simplification yields
p = \big(\frac{4Y}{X} - 1\big)\frac{1}{2}

Best Books on Probability

Q: A bag contains unknown number of tiles numbered in serial order \(1,2,3,...,n\). You draw \(k\) tiles from the bag without replacement and find the maximum number etched on them to be \(m\). What is your estimate of the number of tiles in the bag?

A: This and problems like these are called German War Tank problems. During WW-II German tanks were numbered in sequential order when they were manufactured. Allied forces needed an estimate of how many tanks were deployed and they had a handful of captured tanks and their serial numbers painted on them. Using this, statisticians estimated the actual tanks to be far lower than what intelligence estimates had them believe. So how does it work?

Let us assume we draw a sample of size \(k\). The maximum in that sample is \(m\). If we estimate the maximum of the population to be \(m\) then probability of the sample maximum to be \(m\) is
P(\text{Sample Max} = m) = \frac{m-1 \choose k-1}{N \choose k}
The \(-1\) figures because the maximum is already taken out of the sample leaving behind \(m - 1\) to choose \(k -1 \) from. The expected value of the maximum using this strategy is thus
E(\text{Maximum}) = \sum_{m=k}^{m=N}m\frac{m-1 \choose k-1}{N \choose k}
Note, we run the above summation from \(k\) to \(N\) as for \(m < k\) the expectation is \(0\) because the sample maximum has to be at least \(k\). After a series of algebraic manipulations ( ref ) the above simplifies to
E(\text{Maximum}) = M\big( 1 + \frac{1}{k}\big) - 1
which is quite an ingenious and simple way to estimate population size given serial number ordering.

If you are looking to buy some books on probability theory here is a good list.

Sunday, June 1, 2014

The Chakravala Algorithm in R

submit to reddit

A class of analysis that has piqued the interest of mathematicians across millennia are Diophantine equations. Diophantine equations are polynomials with multiple variables and seek integer solutions. A special case of Diophantine equations is the Pell's equation. The name is a bit of a misnomer as Euler mistakenly attributed it to the mathematician John Pell. The problem seeks integer solutions to the polynomial
x^{2} - Dy^{2} = 1
Several ancient mathematicians have attempted to study and find generic solutions to Pell's equation. The best known algorithm is the Chakravala algorithm discovered by Bhaskara circa 1114 AD. Bhaskara implicitly credits Brahmagupta (circa 598 AD) for it initial discovery, though some credit it to Jayadeva too. Several Sanskrit words used to describe the algorithm appear to have changed in the 500 years between the two implying other contributors. The Chakravala technique is simple and implementing it in any programming language should be a breeze (credit citation)

Diophantine Equations (Pure & Applied Mathematics)

The method works as follows. Find a trivial solution to the equation. \(x=1,y=0\) can be used all the time. Next, initialize two parameters \([p_i,k_i]\) where \(i\) is an iteration count. \(p_i\) is updated to \(p_{i+1}\) if the following two criteria are satisfied.
  • \(p_i + p_{i+1} mod  k_{i} = 0\) i.e. \(p_i + p_{i+1}\) is divisible by \(k_i\)
  • \(| p_{i+1} - d |^{2}\) is minimized
After updating \(p_{i+1}\) \(k_{i+1}\) is found by evaluating
k_{i+1} = \frac{p_{i+1}^{2} - d}{k_{i}}
and the next pair of values for \([x,y]\) is computed as
x_{i+1} = \frac{p_{i+1}x_i + dy_{i}}{|k_{i}|}
y_{i+1} = \frac{p_{i+1}y_i + x_{i}}{|k_{i}|}
The algorithm also has an easy way to check if the found solution is a solution. It does so by only accepting values where \(k_{i} = 1\).

A screen grab of the entire algorithm done in R is shown below.

A Related Puzzle:
A drawer has \(x\) black socks and \(y\) white socks. You draw two socks consecutively and they are both black. You repeat this several times (by replacing the socks) and find that you get a pair of blacks with probability \(\frac{1}{2}\). You know that there are no more than 30 socks in the draw in total. How many black and white socks are there?

The probability that you would draw two black socks in a row is
P = \frac{x}{x+y}\times\frac{x - 1}{x+y - 1} = \frac{1}{2}
Simplifying and solving for \(x\) yields
x^{2} - (2y + 1)x + y - y^2 = 0
which on further simplification gives
x = \frac{2y + 1 \pm \sqrt{(2y+1)^2 +4(y^2 - y)}}{2}
We can ignore the root with the negative sign as it would yield a negative value for \(x\) which is impossible. The positive root of the quadratic equation yields
x = \frac{2y + 1 + \sqrt{8y^2 + 1}}{2}
For \(x\) to be an integer, the term \(\sqrt{8y^2 + 1}\) has to be an odd integer number \(z\) (say). We can now write it out as
z = \sqrt{8y^2 + 1}
z^{2} - 8y^2 = 1
This is Pell's equation (or Vargaprakriti in Sanskrit).
As we know that there are no more than 30 socks in the draw, we can quickly work our way to two admissible solutions to the problem \(\{3,1\}, \{15,6\}\).
If you are looking to buy books on probability theory here is a good list to own.
If you are looking to buy books on time series analysis here is an excellent list to own.

Wednesday, May 7, 2014

Hopping Robots and Reinforcement Learning

submit to reddit

All too often, when we deal with data the outcome needed is a strategy or an algorithm itself. To arrive at that strategy we may have historic data or some model on how entities in system respond to various situations. In this write up, I'll go over the method of reinforcement learning. The general idea behind reinforcement learning is to come up with a strategy to maximize some measurable goal. For example, if you are modelling a robot that learns to navigate around obstacles, you want the learning process to come back with a strategy that minimizes collisions (say) with other entities in the environment.

Pattern Recognition and Machine Learning (Information Science and Statistics)

For the sake of simplicity, lets assume the following scenario. A robot is placed (at random) on flat plank of wood which has some sticky glue in the center. To its left there is a hole which damages the robot a bit and to its right is a reward which is its destination, as shown in the figure below

The robot has the following capabilities
  • It can sense what state \({S_1,S_2,...,S_7}\) it is in
  • It can detect a reward or damage that happens to it while in a particular state
  • It can move one space left or right
  • It can hop a space on to the right
If the robot ends up in either of the terminal states of \(S_1\) or \(S_7\) it is taken and placed in another state at random. State \(S_1\) does damage to the robot while state \(S_7\) rewards it. \(S_4\) is not exactly a damage causing state but it is an avoidable state and we want to learn that over time. All other states are neither harmful nor rewarding. The robot's goal is to "learn" to get to state \(S_7\) in as few steps as possible.

In order to get reinforcement learning to work, you need to know what the reward values are for each of the states the robot can be in. In this particular example, we will assume the following reward structure for each of the states.
Note, the numbers are fairly arbitrary. In addition to this we need a function or a table, mapping out actions/states pairs leading to new states. Given the robot's movement description above, we can use a table as follows
Zero is being used to designate the scenario when the robot is reset to a random state. With the above two sets of information we are good to start the learning process.

Reinforcement learning works in the following way. You maintain a matrix of values for each state/action pair. This is the reward that can be attained by arriving at a particular state by taking a particular action. But the trick is to also account for what possible future state you can get to, given that you arrive at a given state. For example it may be beneficial to be in a certain state \(X\), but all downstream states from \(X\) may be bad to be in. You want to avoid such states. This way, the algorithm tries to ensure that future favourable and achievable states are taken into account. The algorithm does not immediately update itself based on whatever it learns, it preserves old values and learns gradually. The above methodology can be stated as follows.

If \(Q^{*}(s_t,a_t)\) represents the pay off received by being in state then

Q^{*}(s_t,a_t) = R(s_{t+1},s_t) + \alpha max_{a_t}Q^{*}(s_{t+1},a_{t+1})

To slow down learning a bit, we stick with whatever prior estimate of \(Q^{*}(s_t,a_t)\) we have by a fraction of \(\beta\) as shown below

Q^{*}_{new}(s_t,a_t) = \beta Q^{*}_{prior}(s_t,a_t) + (1 - \beta)Q^{*}(s_t,a_t)
That's it! We now let the system take on various initial states, and let the device play around moving over to different states while we constantly update our \(Q\) matrix. After several iterations, the \(Q\) matrix will end with some values which reflect what strategy to take.

To give a swirl, here is an R code that walks through the entire process for this particular example

A thing to note in the code is how the reward function is encoded. There is a penalty imposed on moving from a higher state to a lower state. This is the simple way to ensure that whatever strategy it comes up with does not involve  going backwards from rightward positional gains that have been made. If you try it without the penalty, you will see cases where the strategy does not care going left or right in some states. The above code when run, creates the following output (screen grab below)

The rows represent the states, the columns represent the three possible actions move-left, move-right and hop-right that the robot can take. Notice what it's trying to say:
  • State 1, do nothing 
  • State 2, move right, but don't hop
  • State 3, don't move right, hop
  • State 4, move right or hop
  • State 5, move right (small chance), hop (definitely)
  • State 6, move right, move left (small chance)
This is exactly what you would expect and want! More on this subject can be read from the following books
Reinforcement Learning: State-of-the-Art (Adaptation, Learning, and Optimization)
Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning)
If you are looking to buy some books on probability theory here is a good list to own.

Thursday, April 17, 2014

The Two Strategies

submit to reddit
Q: You are in a game where you get to toss a pair of coins once. There are two boxes (A & B) holding a pair each. Box A's coins are fair however B's coins are biased with probability of heads being \(0.6\) and \(0.4\) respectively. You are paid for the expected number of heads you will win. Which of the boxes should you pick?

Machine Learning: The Art and Science of Algorithms that Make Sense of Data

A: The expected number of heads if you chose box A is easy to calculate as
E(\text{heads}| A) = \frac{1}{2} + \frac{1}{2} = 1
However the expected number of heads if you chose box B is also the same
E(\text{heads}| B) = \frac{4}{10} + \frac{6}{10} = 1
The average yield being the same could make one think that both boxes yield the same. However there is one difference, its the variance. The variance of a distribution of a random variable \(X\) is defined as
Var(X) = \sum_{i=0}^{N} (x_i - \bar{x})^{2}p_i
where \(p_i\) is the probability of \(x_i\). Given this, lets compute the variance of each strategy
Var(X | \text{Box=A}) = (\frac{1}{2} - 1)^2 \frac{1}{2} + (\frac{1}{2} - 1)^2 \frac{1}{2} = \frac{1}{4} = 0.25 \\
Var(X | \text{Box=B}) = (\frac{2}{5} - 1)^2 \frac{2}{5} + (\frac{3}{5} - 1)^2 \frac{3}{5} = \frac{6}{25} = 0.24
The variance for box B is slightly tighter than box A which makes choosing the coins in box B a better strategy.

If you are looking to buy some books on probability theory here is a good list.