## Sunday, March 2, 2014

### The Lazy Apprentice

Q: A shopkeeper hires an apprentice for his store which gets one customer per minute on average uniformly randomly. The apprentice is expected to leave the shop open until at least 6 minutes have passed when no customer arrives. The shop keeper suspects that the apprentice is lazy and wants to close the shop at a shorter notice. The apprentice claims (and the shopkeeper verifies), that the shop is open for about 2.5hrs on average. How could the shopkeeper back his claim?

Statistics: A good book to learn statistics

A: Per the contract, at least 6 minutes should pass without a single customer showing up before the apprentice can close the shop. To solve this lets tackle a different problem first. Assume you have a biased coin with a probability $$p$$ of landing heads. What is the expected number of tosses before you get $$n$$ heads in a row. The expected number of tosses to get to the first head is simple enough to calculate, its $$\frac{1}{p}$$. How about two heads? We can formulate this recursively. We need to get to a head first. Following this, you need to toss the coin one more time for sure. With a probability $$p$$ you get the second heads or with a probability $$1 - p$$ you have to start over again. The number of tosses to two heads is thus $$\frac{1}{p} + 1 + \frac{1}{p}\times(1-p)$$.

Extending this out to get $$n$$ tosses, if you assume that $$y(n)$$ is the expected number of tosses to get to $$n$$ heads in a row then the following state transition diagram shows how the transitions happen.

From the state $$y_{n-1}$$, with probability $$1- p$$ you start over again. Stated recursively
$$y_{n} = y_{n-1} + 1 + (1-p)y_{n}\\ py_{n} = y_{n-1} + 1$$
Using the above expression, it is easy to derive the general expression for $$y_{n}$$ as follows, keeping in mind $$y_{0} = 0$$
$$y_{1} = \frac{y_{0}}{p} + \frac{1}{p} = \frac{1}{p}\\ y_{2} = \frac{1}{p}(y_{1} + 1) = \frac{1}{p^{2}} + \frac{1}{p} y_{3} = \frac{1}{p}(y_{2} + 1) = \frac{1}{p^{3}} + \frac{1}{p^{2}} + \frac{1}{p}$$
Being a sum of a geometric series, the $$y_{n}$$ can be evaluated to
$$y_{n} = \frac{1}{1-p}\big(\frac{1}{p^{n}} - 1\big)$$
Now, back to the original question. Assume the apprentice waits $$k$$ minutes before no customer shows up and he chooses to shut the shop. The situation is "similar" to the coin tossing and awaiting for a string of heads. (Note: Strictly speaking, it's similar only in the limiting case when the time interval is very small). In this case each "head" signifies the absence of a customer showing up in a minute. As the customers arrive uniformly at random we can assume they follow a Poisson process. The probability that $$m$$ customers arrive in a one minute window if the rate parameter is $$\lambda$$ (in this case $$\lambda = 1min^{-1}$$) is

$$P(m,\lambda) = \frac{(\lambda)^{m}e^{-\lambda}}{m!}$$

When $$m =0$$ we get $$p = e^{-\lambda}$$. Plugging this value back to our equation for $$y_{n}$$ we get
$$y_{n} = \frac{1}{1 - e^{-\lambda}}\big(e^{k\lambda} - 1\big)$$
for small $$\lambda$$ the denominator of the first part of the equation can be approximated as $$\frac{1}{\lambda}$$ yielding
$$y_{n} = \frac{1}{\lambda}\big(e^{k\lambda} - 1\big)$$

Note, if you plug $$k=6$$ into the above equation you get $$\approx 7$$ hours whereas for $$k=5$$ you get $$\approx 2.5$$ hours. Due to the exponential connection between $$y_{n}$$ and $$k$$ the values for $$y_{n}$$ are super sensitive to changes in $$k$$.

If you are looking to buy some books in probability here are some of the best books to learn the art of Probability

## Tuesday, February 11, 2014

### The Best Books for Time Series Analysis

If you are looking to learn time series analysis, the following are some of the best books in time series analysis.

Introductory Time Series with R (Use R!)
This is good book to get one started on time series. A nice aspect of this book is that it has examples in R and some of the data is part of standard R packages which makes good introductory material for learning the R language too. That said this is not exactly a graduate level book, and some of the data links in the book may not be valid.

Econometrics
A great book if you are in an economics stream or want to get into it. The nice thing in the book is it tries to bring out a oneness in all the methods used. Econ majors need to be up-to speed on the grounding mathematics for time series analysis to use this book. Outside of those prerequisites, this is one of the best books on econometrics and time series analysis.

Pattern Recognition and Machine Learning (Information Science and Statistics)
This is excellent book to own for scientists and engineers wanting to use time series methods in machine learning, forecasting and regression. Great charts and fairly readable text.

Time Series: Theory and Methods (Springer Series in Statistics)
A good book which covers a lot of theoretical aspects but with little practical applications covered. It comes with software so it doesn't really support open source alternatives like R/Python. This is all about a rigorous treatment to Time series analysis (as is the case with most Springer Series). Good for graduate students and academics.

Time Series Analysis
An ideal book for graduate students and it is fairly comprehensive. Lots of essential approaches are covered in this text. Typical ones include Bayesian approaches, Spectral Analysis and the newer vector auto regression. Strongly recommended for graduate students. The book does not cover real world data sets and applications in enough detail.

Time Series Analysis and Forecasting by Example (Wiley Series in Probability and Statistics)
A good book to get an introduction to time series analysis. It stresses more on the signal processing aspects too like auto regressive models. A drawback is the book requires software and does not use open source, likewise there aren't answers to the questions posted. All said a good book to own but do not forget the caveats.

## Sunday, February 2, 2014

### Two Quick Puzzles

The following are two puzzles which look tough at first but have quick and really elegant solutions.

Q1: Ants on a wire:
A large number of ants are on a wire of length $$L$$. All ants start moving randomly, either right or left with a fixed velocity $$V$$. If they collide they turn around and move in the opposite direction. Ants at the ends of the wire fall off. What is the time taken for all ants to fall off the wire?

Q2: The Unruly Passenger:
Several passengers are in a queue to board a plane. The first passenger in the queue is an unruly one and chooses a seat at random. Subsequent passengers take their allotted seat if it is unoccupied or pick a seat at random if it is occupied. What is the probability that the last passenger gets to sit on his allotted seat?
Statistics: A good book to learn statistics

A1: This seemingly complex problem has an elegantly simple solution. The fact that they collide and turn around is the same as if they walked through each other! See figure below

Once this is insight sinks in, the average time taken for all ants to fall off the wire can be easily calculated. It is the same as the time an ant takes to move from one end of the wire to the other end. This works out to $$\frac{L}{V}$$.

Effective Java (2nd Edition)

A2: You absolutely do not want to consider the various ways a large number of passengers can fill up an equally large number of seats. Bear in mind that the only unruly passenger is the first one, and what we want to know is the probability that the last passenger gets to sit on his seat. The last passenger will face exactly two scenarios, either he gets his seat or not. He will get his seat if the first passenger picks his allotted seat which happens with a probability $$\frac{1}{2}$$

If you are looking to buy some books in probability here are some of the best books to learn the art of Probability

## Friday, January 24, 2014

### Three Random Numbers

Q: You play a game with a friend where he chooses two random numbers between 0 and 1. Next you choose a random number between 0 and 1. If your number falls between the prior two numbers you win. What is the probability that you would win?

A: Consider the number line between 0 and 1 shown in figure below

Let $$x$$ and $$y$$ be the two numbers chosen. The probability of a win, i.e. choosing a number in the range between the two chosen numbers can be estimated as $$\frac{|y-x|}{1}$$. Note, we have chosen the modulus operation because $$x$$ could be greater than $$y$$ or vice versa. The feasible region of numbers to be chosen to win, is the ratio of the absolute difference between $$x$$ and $$y$$ divided by the total possible range, which is 1. In order to estimate the probability that a third chosen number will lie between the two we integrate out (a double integral) between the ranges of $$[0,1]$$. This is estimated as
$$P(\text{win}) = \int_{0}^{1}\int_{0}^{1}|y - x| dx dy$$
To evaluate the above integral, we split the inner integral as follows
$$P(\text{win}) = \int_{0}^{1}\Big[\int_{0}^{y}(y - x)dx + \int_{y}^{1}(x-y)dx\Big]dy$$
The inner integral evaluates to $$y^2 - y +\frac{1}{2}$$ which on further integration between the ranges of $$[0,1]$$ yields
$$P(\text{win}) = [\frac{y^3}{3} - \frac{y^2}{2} + \frac{y}{2}]_{0}^{1} = \frac{1}{3}$$
which is the sought probability.

If you are looking to buy some books in probability here are some of the best books to learn the art of Probability

Fifty Challenging Problems in Probability with Solutions (Dover Books on Mathematics)
This book is a great compilation that covers quite a bit of puzzles. What I like about these puzzles are that they are all tractable and don't require too much advanced mathematics to solve.

Introduction to Algorithms
This is a book on algorithms, some of them are probabilistic. But the book is a must have for students, job candidates even full time engineers & data scientists

Introduction to Probability Theory
Overall an excellent book to learn probability, well recommended for undergrads and graduate students

An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd Edition
This is a two volume book and the first volume is what will likely interest a beginner because it covers discrete probability. The book tends to treat probability as a theory on its own

The Probability Tutoring Book: An Intuitive Course for Engineers and Scientists (and Everyone Else!)
A good book for graduate level classes: has some practice problems in them which is a good thing. But that doesn't make this book any less of buy for the beginner.

Introduction to Probability, 2nd Edition
A good book to own. Does not require prior knowledge of other areas, but the book is a bit low on worked out examples.

Bundle of Algorithms in Java, Third Edition, Parts 1-5: Fundamentals, Data Structures, Sorting, Searching, and Graph Algorithms (3rd Edition) (Pts. 1-5)
An excellent resource (students, engineers and even entrepreneurs) if you are looking for some code that you can take and implement directly on the job

Understanding Probability: Chance Rules in Everyday Life
This is a great book to own. The second half of the book may require some knowledge of calculus. It appears to be the right mix for someone who wants to learn but doesn't want to be scared with the "lemmas"

Data Mining: Practical Machine Learning Tools and Techniques, Third Edition (The Morgan Kaufmann Series in Data Management Systems)
This one is a must have if you want to learn machine learning. The book is beautifully written and ideal for the engineer/student who doesn't want to get too much into the details of a machine learned approach but wants a working knowledge of it. There are some great examples and test data in the text book too.

This is a good book if you are new to statistics & probability while simultaneously getting started with a programming language. The book supports R and is written in a casual humorous way making it an easy read. Great for beginners. Some of the data on the companion website could be missing.

Covered in this book are the central limit theorem and other graduate topics in probability. You will need to brush up on some mathematics before you dive in but most of that can be done online

This book has been yellow-flagged with some issues: including sequencing of content that could be an issue. But otherwise its good

## Monday, January 6, 2014

### The Three Magical Boxes

Q: You are playing a game wherein you are presented 3 magical boxes. Each box has a set probability of delivering a gold coin when you open it. On a single attempt, you can take the gold coin and close the box. In the next attempt you are free to either open the same box again or pick another box. You have a 100 attempts to open the boxes. You do not know what the win probability is for each of the boxes. What would be a strategy to maximize your returns?

Machine Learning: A Probabilistic Perspective (Adaptive Computation and Machine Learning series)

A: Problems of this type fall into a category of algorithms called "multi armed bandits". The name has its origin in casino slot machines wherein a bandit is trying to maximize his returns by pulling different arms of a slot machine by using several "arms". The dilemma he faces is similar to the game described above. Notice, the problem is a bit different from a typical estimation exercise. You could simply split your 100 attempts into 3 blocks of 33,33 & 34 for each of the boxes. But this would not be optimal. Assume that one of the boxes had just a $$1\%$$ probability of yielding a golden coin. Even as you probe and explore that box you know intuitively that you have spent a fair amount of attempts to simply reinforce something you already knew. You need a strategy that adjusts according to new information that you gain from each attempt. Something that gradually transitions away from a box that yields less to a box that yields more.

Assume at the beginning of the game you do not know anything about the yield probabilities. Assign a prior set of values of $$\big[\frac{1}{2}, \frac{1}{2},\frac{1}{2}\big]$$. Simultaneously maintain a set of likelihoods using which you will decide which box to sample next. Initially all three values are set to 1s $$\{p_1 = 1,p_2 = 1,p_3 = 1\}$$. First open the boxes in succession and use up $$n$$ attempts per box. If you denote the number of successes for each box as $$\{s_1,s_2,s_3\}$$, then you could update the posterior distribution of your belief in what box yields as follows
$$p_1 = \frac{1 + s_1}{2 + n} \\ p_2 = \frac{1 + s_2}{2 + n} \\ p_3 = \frac{1 + s_3}{2 + n}$$
Think of this as your initializing phase. Once you initialize your estimates, subsequent choice of boxes should be based on a re-normalized probability vector derived from $$p_1,p_2,p_3$$. What this means is that the probability you would pick a box is computed as follows
$$P(\text{pick box 1}) = \frac{p_1}{p_1 + p_2 + p_3} \\ P(\text{pick box 2}) = \frac{p_2}{p_1 + p_2 + p_3} \\ P(\text{pick box 3}) = \frac{p_3}{p_1 + p_2 + p_3}$$
What ends up happening here is that you will pick the box which has the highest probability of winning based on information gleaned up to a certain point. Another benefit of this approach is you are learning in real time. If a certain box isn't yielding as much as another you don't discard opening that box all together, instead you progressively sample it less often.

If you are looking to buy some books in probability here are some of the best books to own

Fifty Challenging Problems in Probability with Solutions (Dover Books on Mathematics)
This book is a great compilation that covers quite a bit of puzzles. What I like about these puzzles are that they are all tractable and don't require too much advanced mathematics to solve.

Introduction to Algorithms
This is a book on algorithms, some of them are probabilistic. But the book is a must have for students, job candidates even full time engineers & data scientists

Introduction to Probability Theory
Overall an excellent book to learn probability, well recommended for undergrads and graduate students

An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd Edition
This is a two volume book and the first volume is what will likely interest a beginner because it covers discrete probability. The book tends to treat probability as a theory on its own

The Probability Tutoring Book: An Intuitive Course for Engineers and Scientists (and Everyone Else!)
A good book for graduate level classes: has some practice problems in them which is a good thing. But that doesn't make this book any less of buy for the beginner.

Introduction to Probability, 2nd Edition
A good book to own. Does not require prior knowledge of other areas, but the book is a bit low on worked out examples.

Bundle of Algorithms in Java, Third Edition, Parts 1-5: Fundamentals, Data Structures, Sorting, Searching, and Graph Algorithms (3rd Edition) (Pts. 1-5)
An excellent resource (students, engineers and even entrepreneurs) if you are looking for some code that you can take and implement directly on the job

Understanding Probability: Chance Rules in Everyday Life
This is a great book to own. The second half of the book may require some knowledge of calculus. It appears to be the right mix for someone who wants to learn but doesn't want to be scared with the "lemmas"

Data Mining: Practical Machine Learning Tools and Techniques, Third Edition (The Morgan Kaufmann Series in Data Management Systems)
This one is a must have if you want to learn machine learning. The book is beautifully written and ideal for the engineer/student who doesn't want to get too much into the details of a machine learned approach but wants a working knowledge of it. There are some great examples and test data in the text book too.

This is a good book if you are new to statistics & probability while simultaneously getting started with a programming language. The book supports R and is written in a casual humorous way making it an easy read. Great for beginners. Some of the data on the companion website could be missing.

Covered in this book are the central limit theorem and other graduate topics in probability. You will need to brush up on some mathematics before you dive in but most of that can be done online

This book has been yellow-flagged with some issues: including sequencing of content that could be an issue. But otherwise its good

## Sunday, December 22, 2013

### Santa's Dice Game

Q: Santa offers you to play a game of dice. You get to roll a dice six times. You can stop rolling whenever you wish and you get the dollar amount shown on that roll. What is an optimal strategy to maximize your payoff?

Probability and Statistics (4th Edition)

A: Let us take a moment and think through this. At each point in the sequence of rolls you make, you have a decision to make. Do you keep rolling or do you stop and walk away with what is being "offered" to you? You also need to bear in mind that if you keep pushing your luck you will reach a point (the 6th roll) where you would have to be content with whatever comes out for the last roll. So lets start with the simple case of what the expected payoff is for the last roll. Lets call this $$E_{6}$$. To compute it, simply take the payoff multiplied by the respective probability.
$$E_{6} = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = \frac{7}{2} = 3.5$$
The general strategy to be followed is to check what the expected pay off is for the next roll and accept nothing less than that. Working backwards, lets consider each of the rolls starting with the 5th roll.

Roll 5:
You know that on 6th roll you get an expected payoff of 3.5. You should accept anything greater than 3.5, which means you stop rolling if you get a 4,5 or 6, put in other words anything greater than 4. The expected payoff is
$$\frac{4 + 5 + 6}{6} + \frac{7}{2}\times\frac{1}{2} = \frac{17}{4}$$
Roll 4:
Looking ahead you find the expected payoff for the 5th roll is $$\frac{17}{4} = 4.25$$. You stop if you get a 5 or greater for this roll. The expected payoff would be
$$\frac{5+6}{6} + \frac{17}{4}\times\frac{4}{6} = \frac{14}{3}$$

Roll 3:
The expected payoff from roll 4 above is $$\frac{14}{3} = 4.66$$ which means you stop if you get a 5 or above. The expected payoff for this roll is
$$\frac{5 + 6}{6} + \frac{14}{3}\times\frac{4}{6} = \frac{89}{18}$$

Roll 2:
The expected payoff from roll 3 above is $$\frac{89}{18}=4.94$$ implying accept and stop if roll gives 5 or 6. The expected payoff for this roll is
$$\frac{5 + 6}{6} + \frac{89}{18}\times\frac{4}{6} = 5.12$$

Roll 1:
Being the first roll, your expected payoff from any subsequent rolls is 5.12. So you should accept nothing short of a 6 from the first roll.

The optimal strategy would then be to stop at each of the first five rolls (when you have the option to stop) if you get 6,5,5,5,4 respectively. An interesting takeaway here is the expected gains from following this strategy can be computed as
$$\frac{1}{6}\times 6 + \frac{5}{6}\times 5.12 = 5.26$$
which is quite high for a seemingly random game!

If you are looking to buy some books in probability here are some of the best books to own

Fifty Challenging Problems in Probability with Solutions (Dover Books on Mathematics)
This book is a great compilation that covers quite a bit of puzzles. What I like about these puzzles are that they are all tractable and don't require too much advanced mathematics to solve.

Introduction to Algorithms
This is a book on algorithms, some of them are probabilistic. But the book is a must have for students, job candidates even full time engineers & data scientists

Introduction to Probability Theory
Overall an excellent book to learn probability, well recommended for undergrads and graduate students

An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd Edition
This is a two volume book and the first volume is what will likely interest a beginner because it covers discrete probability. The book tends to treat probability as a theory on its own

The Probability Tutoring Book: An Intuitive Course for Engineers and Scientists (and Everyone Else!)
A good book for graduate level classes: has some practice problems in them which is a good thing. But that doesn't make this book any less of buy for the beginner.

Introduction to Probability, 2nd Edition
A good book to own. Does not require prior knowledge of other areas, but the book is a bit low on worked out examples.

Bundle of Algorithms in Java, Third Edition, Parts 1-5: Fundamentals, Data Structures, Sorting, Searching, and Graph Algorithms (3rd Edition) (Pts. 1-5)
An excellent resource (students, engineers and even entrepreneurs) if you are looking for some code that you can take and implement directly on the job

Understanding Probability: Chance Rules in Everyday Life
This is a great book to own. The second half of the book may require some knowledge of calculus. It appears to be the right mix for someone who wants to learn but doesn't want to be scared with the "lemmas"

Data Mining: Practical Machine Learning Tools and Techniques, Third Edition (The Morgan Kaufmann Series in Data Management Systems)
This one is a must have if you want to learn machine learning. The book is beautifully written and ideal for the engineer/student who doesn't want to get too much into the details of a machine learned approach but wants a working knowledge of it. There are some great examples and test data in the text book too.

This is a good book if you are new to statistics & probability while simultaneously getting started with a programming language. The book supports R and is written in a casual humorous way making it an easy read. Great for beginners. Some of the data on the companion website could be missing.

Covered in this book are the central limit theorem and other graduate topics in probability. You will need to brush up on some mathematics before you dive in but most of that can be done online

This book has been yellow-flagged with some issues: including sequencing of content that could be an issue. But otherwise its good

## Wednesday, December 11, 2013

### A Box of Apples and Oranges

Q: A box contains 6 apples and 5 oranges. You know the number of apples are 6. You pick them out one at a time. What is the probability that the box will be empty by the time you have all 6 apples out?

A: It is tempting to say that the answer is $$50\%$$, but its not, as explained further below. Let us start with the number of ways to pull all 11 out. Since you know that there are six apples, you would keep drawing until you see all six apples. The six apples can come out in any order. So there are $$\binom{11}{6}$$ ways to do this.The number of favorable cases can be computed by making the following observation. If the box has to be empty when the last apple is drawn, then the finishing of the draws must end with 1, 2, 3, 4, 5 or 6 apples. These are the only scenarios. It can never end in an orange being drawn. Lets first consider the case when the last fruit drawn is an apple and the one prior is an orange. That's two down from the lot with one being an apple and the other being an orange. The number of ways this can happen is $$\binom{9}{5}$$. A similar reasoning applies when the last three fruits drawn are one orange and two apples in a row. The number of ways this can happen is $$\binom{8}{4}$$ and so on. Thus, the probability is the required sum of scenarios divided by the total number of ways
$$P(\text{box is empty}) = \frac{\binom{9}{5} + \binom{8}{4} + \binom{7}{3} + \binom{6}{2} + \binom{5}{1} + 1 }{ \binom{11}{6}}$$
Note, the final one is added to the numerator to factor the case when all six apples are drawn at the very in end in a row. The above simplifies to
$$P(\text{box is empty}) = \frac{252}{462} = 54.54\%$$
If you are looking to buy some books in probability here are some of the best books to learn the art of Probability

Fifty Challenging Problems in Probability with Solutions (Dover Books on Mathematics)
This book is a great compilation that covers quite a bit of puzzles. What I like about these puzzles are that they are all tractable and don't require too much advanced mathematics to solve.

Introduction to Algorithms
This is a book on algorithms, some of them are probabilistic. But the book is a must have for students, job candidates even full time engineers & data scientists

Introduction to Probability Theory
Overall an excellent book to learn probability, well recommended for undergrads and graduate students

An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd Edition
This is a two volume book and the first volume is what will likely interest a beginner because it covers discrete probability. The book tends to treat probability as a theory on its own

The Probability Tutoring Book: An Intuitive Course for Engineers and Scientists (and Everyone Else!)
A good book for graduate level classes: has some practice problems in them which is a good thing. But that doesn't make this book any less of buy for the beginner.

Introduction to Probability, 2nd Edition
A good book to own. Does not require prior knowledge of other areas, but the book is a bit low on worked out examples.

Bundle of Algorithms in Java, Third Edition, Parts 1-5: Fundamentals, Data Structures, Sorting, Searching, and Graph Algorithms (3rd Edition) (Pts. 1-5)
An excellent resource (students, engineers and even entrepreneurs) if you are looking for some code that you can take and implement directly on the job

Understanding Probability: Chance Rules in Everyday Life
This is a great book to own. The second half of the book may require some knowledge of calculus. It appears to be the right mix for someone who wants to learn but doesn't want to be scared with the "lemmas"

Data Mining: Practical Machine Learning Tools and Techniques, Third Edition (The Morgan Kaufmann Series in Data Management Systems)
This one is a must have if you want to learn machine learning. The book is beautifully written and ideal for the engineer/student who doesn't want to get too much into the details of a machine learned approach but wants a working knowledge of it. There are some great examples and test data in the text book too.

This is a good book if you are new to statistics & probability while simultaneously getting started with a programming language. The book supports R and is written in a casual humorous way making it an easy read. Great for beginners. Some of the data on the companion website could be missing.

Covered in this book are the central limit theorem and other graduate topics in probability. You will need to brush up on some mathematics before you dive in but most of that can be done online

This book has been yellow-flagged with some issues: including sequencing of content that could be an issue. But otherwise its good