Thursday, April 17, 2014

The Two Strategies

Q: You are in a game where you get to toss a pair of coins once. There are two boxes (A & B) holding a pair each. Box A's coins are fair however B's coins are biased with probability of heads being $$0.6$$ and $$0.4$$ respectively. You are paid for the expected number of heads you will win. Which of the boxes should you pick?

Machine Learning: The Art and Science of Algorithms that Make Sense of Data

A: The expected number of heads if you chose box A is easy to calculate as
$$E(\text{heads}| A) = \frac{1}{2} + \frac{1}{2} = 1$$
However the expected number of heads if you chose box B is also the same
$$E(\text{heads}| B) = \frac{4}{10} + \frac{6}{10} = 1$$
The average yield being the same could make one think that both boxes yield the same. However there is one difference, its the variance. The variance of a distribution of a random variable $$X$$ is defined as
$$Var(X) = \sum_{i=0}^{N} (x_i - \bar{x})^{2}p_i$$
where $$p_i$$ is the probability of $$x_i$$. Given this, lets compute the variance of each strategy
$$Var(X | \text{Box=A}) = (\frac{1}{2} - 1)^2 \frac{1}{2} + (\frac{1}{2} - 1)^2 \frac{1}{2} = \frac{1}{4} = 0.25 \\ Var(X | \text{Box=B}) = (\frac{2}{5} - 1)^2 \frac{2}{5} + (\frac{3}{5} - 1)^2 \frac{3}{5} = \frac{6}{25} = 0.24$$
The variance for box B is slightly tighter than box A which makes choosing the coins in box B a better strategy.

If you are looking to buy some books on probability theory here is a good list.

Sunday, March 23, 2014

Linear Regression, Transforms and Regularization

This write up is about the simple linear regression and ways to make it robust to outliers and non linearity. The linear regression method is a simple and powerful method. It is powerful because it helps compress a lot of information through a simple straight line. The complexity of the problem is vastly simplified. However being so simple comes with its set of limitations. For example, the method assumes that after a fit is made, the differences between the predicted and actual values are normally distributed. In reality, we rarely run into such ideal conditions. Almost always there is non-normality and outliers in the data that makes fitting a straight line insufficient. However there are some tricks you could do to make it better.

Statistics: A good book to learn statistics

As an example data set consider some dummy data shown in the table/chart below. Notice, value 33 is an outlier. When charted. you can see there is some non-linearity in the data too, for higher values of $$x$$
First lets tackle the non-linearity. The non-linearity can be managed by doing a transformation onthe y-values using the box-cox transform which is a class of power transformation. It is a useful transform to bring about normality in time series values that looks "curvy". The transform looks like
$$\hat{y} = \frac{y^{\lambda} - 1}{\lambda}$$
The value of lambda needs to be chosen optimally that maximizes the log likelihood that would make the time series more like it came from a normal distribution. Most statistical tools out there do it for you. In R, the function "boxcox" in the package "MASS" does it for you. The following code snippet computes the optimal value of $$\lambda$$ as -0.30303

Let's apply this transformation to the data and see how it looks on a chart.

You can see that it looks a lot better and more like a straight line, except for the outlier at $$x = 4$$. The goodness of straight line fit is measured by the fit's $$R^2$$ value. The $$R^2$$ value tries to quantify the amount of variance that can be explained by the fit. If we fitted a straight line through the original data we get an $$R^2 = 0.06804$$, and the transformed data yields an $$R^2 = 0.4636$$ demonstrating an improvement. Next, lets try and manage the outlier. If you try fitting a straight line $$y = \alpha_0 + \alpha_{1} x$$ through a set of points that have a few outliers you will notice that the values of $$\alpha_{0}, \alpha_{1}$$ tend to be slightly large. They end up being slightly large because the fit is trying to "reach out" and accommodate the outlier. In order to minimize the effect of the outlier we get back into the guts of the linear regression. A linear regression typically tries to minimize the overall error $$e$$ as computed as
$$e = \sum_{i=1}^{N}\frac{(y_{actual} - \alpha_{0} - \alpha_{1}x)^2}{N}$$
where $$N$$ is the number of points. We can tweak the above equation to minimize as follows
$$e = \sum_{i=1}^{N}\frac{(y_{actual} - \alpha_{0} - \alpha_{1}x)^2}{N} + \lambda(\alpha_{0}^2 + \alpha_{1}^2)$$
The tweaked error equation forces towards choices of $$\alpha_{0}, \alpha_{1}$$ where they cannot take larger values. The optimal value of $$\lambda$$ needs to be ascertained by tuning. A formulaic solution does not exist, so we use another tool in R, the function "optim". This function lets you do basic optimization and minimizes any function you pass it, along with required parameters. It returns parameter values that minimize this function. The actual usage of this function isn't exactly intuitive. Most examples on the internet talk of minimizing a proper well formed function. Most real life applications involve minimizing functions having lots of parameters and additional data. The funct"optim" accepts the "..." argument which is a means to pass through arguments to the function you want to minimize. So here is how you would do it in R, all of it.

The above code walks through this example by calling optim. It finally outputs the fits in original domain using all three methods
1. The grey line represents the fit if you simply used "lm"
2. The red line represents the fit if you transformed the data and used "lm" in the transformed domain but without regularization. Note: you are clearly worse off.
3. The blue dotted line shows the fit if you used transformation and regularization, clearly a much better fit

The function "optim" has lots of methods that it uses for finding the minimum value of the function. Typically, you may also want to poke around with the best value of $$\lambda$$ in the minimization function to get better fits.
If you are looking to buy some books on time series analysis here is a good collection. Some good books to own for probability theory are referenced here

Sunday, March 2, 2014

The Lazy Apprentice

Q: A shopkeeper hires an apprentice for his store which gets one customer per minute on average uniformly randomly. The apprentice is expected to leave the shop open until at least 6 minutes have passed when no customer arrives. The shop keeper suspects that the apprentice is lazy and wants to close the shop at a shorter notice. The apprentice claims (and the shopkeeper verifies), that the shop is open for about 2.5hrs on average. How could the shopkeeper back his claim?

Statistics: A good book to learn statistics

A: Per the contract, at least 6 minutes should pass without a single customer showing up before the apprentice can close the shop. To solve this lets tackle a different problem first. Assume you have a biased coin with a probability $$p$$ of landing heads. What is the expected number of tosses before you get $$n$$ heads in a row. The expected number of tosses to get to the first head is simple enough to calculate, its $$\frac{1}{p}$$. How about two heads? We can formulate this recursively. We need to get to a head first. Following this, you need to toss the coin one more time for sure. With a probability $$p$$ you get the second heads or with a probability $$1 - p$$ you have to start over again. The number of tosses to two heads is thus $$\frac{1}{p} + 1 + \frac{1}{p}\times(1-p)$$.

Extending this out to get $$n$$ tosses, if you assume that $$y(n)$$ is the expected number of tosses to get to $$n$$ heads in a row then the following state transition diagram shows how the transitions happen.

From the state $$y_{n-1}$$, with probability $$1- p$$ you start over again. Stated recursively
$$y_{n} = y_{n-1} + 1 + (1-p)y_{n}\\ py_{n} = y_{n-1} + 1$$
Using the above expression, it is easy to derive the general expression for $$y_{n}$$ as follows, keeping in mind $$y_{0} = 0$$
$$y_{1} = \frac{y_{0}}{p} + \frac{1}{p} = \frac{1}{p}\\ y_{2} = \frac{1}{p}(y_{1} + 1) = \frac{1}{p^{2}} + \frac{1}{p} y_{3} = \frac{1}{p}(y_{2} + 1) = \frac{1}{p^{3}} + \frac{1}{p^{2}} + \frac{1}{p}$$
Being a sum of a geometric series, the $$y_{n}$$ can be evaluated to
$$y_{n} = \frac{1}{1-p}\big(\frac{1}{p^{n}} - 1\big)$$
Now, back to the original question. Assume the apprentice waits $$k$$ minutes before no customer shows up and he chooses to shut the shop. The situation is "similar" to the coin tossing and awaiting for a string of heads. (Note: Strictly speaking, it's similar only in the limiting case when the time interval is very small). In this case each "head" signifies the absence of a customer showing up in a minute. As the customers arrive uniformly at random we can assume they follow a Poisson process. The probability that $$m$$ customers arrive in a one minute window if the rate parameter is $$\lambda$$ (in this case $$\lambda = 1min^{-1}$$) is

$$P(m,\lambda) = \frac{(\lambda)^{m}e^{-\lambda}}{m!}$$

When $$m =0$$ we get $$p = e^{-\lambda}$$. Plugging this value back to our equation for $$y_{n}$$ we get
$$y_{n} = \frac{1}{1 - e^{-\lambda}}\big(e^{k\lambda} - 1\big)$$
for small $$\lambda$$ the denominator of the first part of the equation can be approximated as $$\frac{1}{\lambda}$$ yielding
$$y_{n} = \frac{1}{\lambda}\big(e^{k\lambda} - 1\big)$$

Note, if you plug $$k=6$$ into the above equation you get $$\approx 7$$ hours whereas for $$k=5$$ you get $$\approx 2.5$$ hours. Due to the exponential connection between $$y_{n}$$ and $$k$$ the values for $$y_{n}$$ are super sensitive to changes in $$k$$.

If you are looking to buy some books in probability here are some of the best books to learn the art of Probability

Tuesday, February 11, 2014

The Best Books for Time Series Analysis

If you are looking to learn time series analysis, the following are some of the best books in time series analysis.

Introductory Time Series with R (Use R!)
This is good book to get one started on time series. A nice aspect of this book is that it has examples in R and some of the data is part of standard R packages which makes good introductory material for learning the R language too. That said this is not exactly a graduate level book, and some of the data links in the book may not be valid.

Econometrics
A great book if you are in an economics stream or want to get into it. The nice thing in the book is it tries to bring out a oneness in all the methods used. Econ majors need to be up-to speed on the grounding mathematics for time series analysis to use this book. Outside of those prerequisites, this is one of the best books on econometrics and time series analysis.

Pattern Recognition and Machine Learning (Information Science and Statistics)
This is excellent book to own for scientists and engineers wanting to use time series methods in machine learning, forecasting and regression. Great charts and fairly readable text.

Time Series: Theory and Methods (Springer Series in Statistics)
A good book which covers a lot of theoretical aspects but with little practical applications covered. It comes with software so it doesn't really support open source alternatives like R/Python. This is all about a rigorous treatment to Time series analysis (as is the case with most Springer Series). Good for graduate students and academics.

Time Series Analysis
An ideal book for graduate students and it is fairly comprehensive. Lots of essential approaches are covered in this text. Typical ones include Bayesian approaches, Spectral Analysis and the newer vector auto regression. Strongly recommended for graduate students. The book does not cover real world data sets and applications in enough detail.

Time Series Analysis and Forecasting by Example (Wiley Series in Probability and Statistics)
A good book to get an introduction to time series analysis. It stresses more on the signal processing aspects too like auto regressive models. A drawback is the book requires software and does not use open source, likewise there aren't answers to the questions posted. All said a good book to own but do not forget the caveats.

Sunday, February 2, 2014

Two Quick Puzzles

The following are two puzzles which look tough at first but have quick and really elegant solutions.

Q1: Ants on a wire:
A large number of ants are on a wire of length $$L$$. All ants start moving randomly, either right or left with a fixed velocity $$V$$. If they collide they turn around and move in the opposite direction. Ants at the ends of the wire fall off. What is the time taken for all ants to fall off the wire?

Q2: The Unruly Passenger:
Several passengers are in a queue to board a plane. The first passenger in the queue is an unruly one and chooses a seat at random. Subsequent passengers take their allotted seat if it is unoccupied or pick a seat at random if it is occupied. What is the probability that the last passenger gets to sit on his allotted seat?
Statistics: A good book to learn statistics

A1: This seemingly complex problem has an elegantly simple solution. The fact that they collide and turn around is the same as if they walked through each other! See figure below

Once this is insight sinks in, the average time taken for all ants to fall off the wire can be easily calculated. It is the same as the time an ant takes to move from one end of the wire to the other end. This works out to $$\frac{L}{V}$$.

Effective Java (2nd Edition)

A2: You absolutely do not want to consider the various ways a large number of passengers can fill up an equally large number of seats. Bear in mind that the only unruly passenger is the first one, and what we want to know is the probability that the last passenger gets to sit on his seat. The last passenger will face exactly two scenarios, either he gets his seat or not. He will get his seat if the first passenger picks his allotted seat which happens with a probability $$\frac{1}{2}$$

If you are looking to buy some books in probability here are some of the best books to learn the art of Probability

Friday, January 24, 2014

Three Random Numbers

Q: You play a game with a friend where he chooses two random numbers between 0 and 1. Next you choose a random number between 0 and 1. If your number falls between the prior two numbers you win. What is the probability that you would win?

A: Consider the number line between 0 and 1 shown in figure below

Let $$x$$ and $$y$$ be the two numbers chosen. The probability of a win, i.e. choosing a number in the range between the two chosen numbers can be estimated as $$\frac{|y-x|}{1}$$. Note, we have chosen the modulus operation because $$x$$ could be greater than $$y$$ or vice versa. The feasible region of numbers to be chosen to win, is the ratio of the absolute difference between $$x$$ and $$y$$ divided by the total possible range, which is 1. In order to estimate the probability that a third chosen number will lie between the two we integrate out (a double integral) between the ranges of $$[0,1]$$. This is estimated as
$$P(\text{win}) = \int_{0}^{1}\int_{0}^{1}|y - x| dx dy$$
To evaluate the above integral, we split the inner integral as follows
$$P(\text{win}) = \int_{0}^{1}\Big[\int_{0}^{y}(y - x)dx + \int_{y}^{1}(x-y)dx\Big]dy$$
The inner integral evaluates to $$y^2 - y +\frac{1}{2}$$ which on further integration between the ranges of $$[0,1]$$ yields
$$P(\text{win}) = [\frac{y^3}{3} - \frac{y^2}{2} + \frac{y}{2}]_{0}^{1} = \frac{1}{3}$$
which is the sought probability.

If you are looking to buy some books in probability here are some of the best books to learn the art of Probability

Fifty Challenging Problems in Probability with Solutions (Dover Books on Mathematics)
This book is a great compilation that covers quite a bit of puzzles. What I like about these puzzles are that they are all tractable and don't require too much advanced mathematics to solve.

Introduction to Algorithms
This is a book on algorithms, some of them are probabilistic. But the book is a must have for students, job candidates even full time engineers & data scientists

Introduction to Probability Theory
Overall an excellent book to learn probability, well recommended for undergrads and graduate students

An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd Edition
This is a two volume book and the first volume is what will likely interest a beginner because it covers discrete probability. The book tends to treat probability as a theory on its own

The Probability Tutoring Book: An Intuitive Course for Engineers and Scientists (and Everyone Else!)
A good book for graduate level classes: has some practice problems in them which is a good thing. But that doesn't make this book any less of buy for the beginner.

Introduction to Probability, 2nd Edition
A good book to own. Does not require prior knowledge of other areas, but the book is a bit low on worked out examples.

Bundle of Algorithms in Java, Third Edition, Parts 1-5: Fundamentals, Data Structures, Sorting, Searching, and Graph Algorithms (3rd Edition) (Pts. 1-5)
An excellent resource (students, engineers and even entrepreneurs) if you are looking for some code that you can take and implement directly on the job

Understanding Probability: Chance Rules in Everyday Life
This is a great book to own. The second half of the book may require some knowledge of calculus. It appears to be the right mix for someone who wants to learn but doesn't want to be scared with the "lemmas"

Data Mining: Practical Machine Learning Tools and Techniques, Third Edition (The Morgan Kaufmann Series in Data Management Systems)
This one is a must have if you want to learn machine learning. The book is beautifully written and ideal for the engineer/student who doesn't want to get too much into the details of a machine learned approach but wants a working knowledge of it. There are some great examples and test data in the text book too.

This is a good book if you are new to statistics & probability while simultaneously getting started with a programming language. The book supports R and is written in a casual humorous way making it an easy read. Great for beginners. Some of the data on the companion website could be missing.

Covered in this book are the central limit theorem and other graduate topics in probability. You will need to brush up on some mathematics before you dive in but most of that can be done online

This book has been yellow-flagged with some issues: including sequencing of content that could be an issue. But otherwise its good

Monday, January 6, 2014

The Three Magical Boxes

Q: You are playing a game wherein you are presented 3 magical boxes. Each box has a set probability of delivering a gold coin when you open it. On a single attempt, you can take the gold coin and close the box. In the next attempt you are free to either open the same box again or pick another box. You have a 100 attempts to open the boxes. You do not know what the win probability is for each of the boxes. What would be a strategy to maximize your returns?

Machine Learning: A Probabilistic Perspective (Adaptive Computation and Machine Learning series)

A: Problems of this type fall into a category of algorithms called "multi armed bandits". The name has its origin in casino slot machines wherein a bandit is trying to maximize his returns by pulling different arms of a slot machine by using several "arms". The dilemma he faces is similar to the game described above. Notice, the problem is a bit different from a typical estimation exercise. You could simply split your 100 attempts into 3 blocks of 33,33 & 34 for each of the boxes. But this would not be optimal. Assume that one of the boxes had just a $$1\%$$ probability of yielding a golden coin. Even as you probe and explore that box you know intuitively that you have spent a fair amount of attempts to simply reinforce something you already knew. You need a strategy that adjusts according to new information that you gain from each attempt. Something that gradually transitions away from a box that yields less to a box that yields more.

Assume at the beginning of the game you do not know anything about the yield probabilities. Assign a prior set of values of $$\big[\frac{1}{2}, \frac{1}{2},\frac{1}{2}\big]$$. Simultaneously maintain a set of likelihoods using which you will decide which box to sample next. Initially all three values are set to 1s $$\{p_1 = 1,p_2 = 1,p_3 = 1\}$$. First open the boxes in succession and use up $$n$$ attempts per box. If you denote the number of successes for each box as $$\{s_1,s_2,s_3\}$$, then you could update the posterior distribution of your belief in what box yields as follows
$$p_1 = \frac{1 + s_1}{2 + n} \\ p_2 = \frac{1 + s_2}{2 + n} \\ p_3 = \frac{1 + s_3}{2 + n}$$
Think of this as your initializing phase. Once you initialize your estimates, subsequent choice of boxes should be based on a re-normalized probability vector derived from $$p_1,p_2,p_3$$. What this means is that the probability you would pick a box is computed as follows
$$P(\text{pick box 1}) = \frac{p_1}{p_1 + p_2 + p_3} \\ P(\text{pick box 2}) = \frac{p_2}{p_1 + p_2 + p_3} \\ P(\text{pick box 3}) = \frac{p_3}{p_1 + p_2 + p_3}$$
What ends up happening here is that you will pick the box which has the highest probability of winning based on information gleaned up to a certain point. Another benefit of this approach is you are learning in real time. If a certain box isn't yielding as much as another you don't discard opening that box all together, instead you progressively sample it less often.

If you are looking to buy some books in probability here are some of the best books to own

Fifty Challenging Problems in Probability with Solutions (Dover Books on Mathematics)
This book is a great compilation that covers quite a bit of puzzles. What I like about these puzzles are that they are all tractable and don't require too much advanced mathematics to solve.

Introduction to Algorithms
This is a book on algorithms, some of them are probabilistic. But the book is a must have for students, job candidates even full time engineers & data scientists

Introduction to Probability Theory
Overall an excellent book to learn probability, well recommended for undergrads and graduate students

An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd Edition
This is a two volume book and the first volume is what will likely interest a beginner because it covers discrete probability. The book tends to treat probability as a theory on its own

The Probability Tutoring Book: An Intuitive Course for Engineers and Scientists (and Everyone Else!)
A good book for graduate level classes: has some practice problems in them which is a good thing. But that doesn't make this book any less of buy for the beginner.

Introduction to Probability, 2nd Edition
A good book to own. Does not require prior knowledge of other areas, but the book is a bit low on worked out examples.

Bundle of Algorithms in Java, Third Edition, Parts 1-5: Fundamentals, Data Structures, Sorting, Searching, and Graph Algorithms (3rd Edition) (Pts. 1-5)
An excellent resource (students, engineers and even entrepreneurs) if you are looking for some code that you can take and implement directly on the job

Understanding Probability: Chance Rules in Everyday Life
This is a great book to own. The second half of the book may require some knowledge of calculus. It appears to be the right mix for someone who wants to learn but doesn't want to be scared with the "lemmas"

Data Mining: Practical Machine Learning Tools and Techniques, Third Edition (The Morgan Kaufmann Series in Data Management Systems)
This one is a must have if you want to learn machine learning. The book is beautifully written and ideal for the engineer/student who doesn't want to get too much into the details of a machine learned approach but wants a working knowledge of it. There are some great examples and test data in the text book too.

This is a good book if you are new to statistics & probability while simultaneously getting started with a programming language. The book supports R and is written in a casual humorous way making it an easy read. Great for beginners. Some of the data on the companion website could be missing.

Covered in this book are the central limit theorem and other graduate topics in probability. You will need to brush up on some mathematics before you dive in but most of that can be done online

This book has been yellow-flagged with some issues: including sequencing of content that could be an issue. But otherwise its good