
Posts

Embarrassing Questions, German Tanks and Estimations

Q: You are conducting a survey and want to ask an embarrassing yes/no question to subjects. The subjects wouldn't answer such an embarrassing question honestly unless they are guaranteed complete anonymity. How would you conduct the survey? A: One way to do this is to give each subject a fair coin and ask them to toss it in private. If it comes up heads, they answer the question truthfully; otherwise they toss the coin a second time and record that result (heads = yes, tails = no). With some simple algebra you can then estimate the true proportion of "yes" answers from the recorded responses. Assume the total population surveyed is \(X\). Let \(Y\) be the number of subjects who answered "yes". Let \(p\) be the sought-after proportion. The tree diagram below shows the user flow. The total expected number of "yes" responses can be estimated as $$ \frac{pX}{2} +...
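The estimate is easy to sanity-check with a quick simulation. The sketch below is a minimal illustration, assuming a hypothetical true proportion of 30% and using the relation \(P(\text{yes}) = p/2 + 1/4\) implied by the scheme, so that \(p\) can be recovered as \(2(Y/X) - 1/2\).

```r
# A minimal simulation sketch of the coin-flip survey scheme; the true
# proportion p_true = 0.3 below is a made-up value for illustration.
set.seed(11)
X <- 10000                      # number of subjects surveyed
p_true <- 0.3                   # hidden true proportion (assumption)
first  <- rbinom(X, 1, 0.5)     # 1 = first private toss came up heads
truth  <- rbinom(X, 1, p_true)  # the honest answer, used only if first == 1
second <- rbinom(X, 1, 0.5)     # second toss, recorded if first == 0
answer <- ifelse(first == 1, truth, second)
Y <- sum(answer)                # observed "yes" count
# From E[Y] = pX/2 + X/4, solve for p:
p_hat <- 2 * (Y / X) - 1/2
p_hat                           # should be close to 0.3
```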

The Chakravala Algorithm in R

A class of problems that has piqued the interest of mathematicians across millennia is Diophantine equations. Diophantine equations are polynomial equations in multiple variables for which integer solutions are sought. A special case of Diophantine equations is Pell's equation. The name is a bit of a misnomer, as Euler mistakenly attributed it to the mathematician John Pell. The problem seeks integer solutions to the polynomial $$ x^{2} - Dy^{2} = 1 $$ Several ancient mathematicians attempted to study and find generic solutions to Pell's equation. The best-known algorithm is the Chakravala algorithm discovered by Bhaskara circa 1114 AD. Bhaskara implicitly credits Brahmagupta (circa 598 AD) for its initial discovery, though some credit it to Jayadeva too. Several Sanskrit words used to describe the algorithm appear to have changed in the 500 years between the two, implying other contributors. The Chakravala technique is simple and implementing it ...
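For a concrete sense of the cycle, here is a minimal sketch in R of the textbook Chakravala step (not the post's own code), assuming \(D\) is a small positive non-square integer: given a triple \((a, b, k)\) with \(a^{2} - Db^{2} = k\), pick a multiplier \(m\) so that \(a + bm\) is divisible by \(|k|\) and \(|m^{2} - D|\) is as small as possible, then compose to get the next triple, repeating until \(k = 1\).

```r
# A minimal sketch of the Chakravala cycle for x^2 - D*y^2 = 1, assuming D is
# a small positive non-square integer; the search for m is deliberately naive.
chakravala <- function(D) {
  a <- floor(sqrt(D)); b <- 1
  k <- a^2 - D                              # starting triple: a^2 - D*b^2 = k
  while (k != 1) {
    # candidate multipliers m with (a + b*m) divisible by |k|
    ms <- which((a + b * (1:(2 * ceiling(sqrt(D)) + abs(k)))) %% abs(k) == 0)
    m  <- ms[which.min(abs(ms^2 - D))]      # pick m with |m^2 - D| smallest
    a_new <- (a * m + D * b) / abs(k)       # Brahmagupta composition, scaled
    b_new <- (a + b * m) / abs(k)
    k     <- (m^2 - D) / k
    a <- a_new; b <- b_new
  }
  c(x = a, y = b)
}
chakravala(13)   # returns x = 649, y = 180, since 649^2 - 13 * 180^2 = 1
```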

Hopping Robots and Reinforcement Learning

All too often when we deal with data, the outcome needed is a strategy or an algorithm itself. To arrive at that strategy we may have historic data or some model of how entities in the system respond to various situations. In this write-up, I'll go over the method of reinforcement learning. The general idea behind reinforcement learning is to come up with a strategy that maximizes some measurable goal. For example, if you are modelling a robot that learns to navigate around obstacles, you want the learning process to come back with a strategy that minimizes collisions (say) with other entities in the environment. For the sake of simplicity, let's assume the following scenario. A robot is placed (at random) on a flat plank of wood which has some sticky glue in the center. To its left there is a hole which damages the robot a bit, and to its right is a reward which is its dest...
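As a rough illustration of the idea (not the post's own setup), the sketch below runs tabular Q-learning on a toy five-cell plank with a damaging hole on the left, a reward on the right, and a sticky middle cell; all rewards, probabilities, and learning parameters here are assumptions chosen for demonstration.

```r
# A toy tabular Q-learning sketch for a five-cell "plank": cell 1 is the hole
# (reward -1), cell 5 is the goal (reward +1), and cell 3 is sticky glue where
# a move only succeeds half the time. All numbers are illustrative assumptions.
set.seed(1)
n_states <- 5
actions  <- c(-1, 1)                         # move left, move right
Q <- matrix(0, n_states, length(actions))
alpha <- 0.1; gamma <- 0.9; eps <- 0.1       # learning rate, discount, exploration
for (episode in 1:2000) {
  s <- sample(2:4, 1)                        # start somewhere in the middle
  while (s != 1 && s != n_states) {
    a <- if (runif(1) < eps) sample(2, 1) else which.max(Q[s, ])
    move <- actions[a]
    if (s == 3 && runif(1) < 0.5) move <- 0  # glue: the move fails
    s2 <- s + move
    r  <- if (s2 == 1) -1 else if (s2 == n_states) 1 else 0
    Q[s, a] <- Q[s, a] + alpha * (r + gamma * max(Q[s2, ]) - Q[s, a])
    s <- s2
  }
}
Q   # the greedy policy (column with the larger value) should point right
```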

The Two Strategies

Q: You are in a game where you get to toss a pair of coins once. There are two boxes (A & B), each holding a pair of coins. Box A's coins are fair, whereas B's coins are biased, with probabilities of heads being \(0.6\) and \(0.4\) respectively. You are paid in proportion to the number of heads you toss. Which of the boxes should you pick? A: The expected number of heads if you choose box A is easy to calculate as $$ E(\text{heads}| A) = \frac{1}{2} + \frac{1}{2} = 1 $$ However the expected number of heads if you choose box B is the same $$ E(\text{heads}| B) = \frac{4}{10} + \frac{6}{10} = 1 $$ The average yield being the same could make one think that both boxes are equivalent. However there is one difference: the variance. The variance of the distribution of a random variable \(X\) is defined as $$ Var(X) = \sum_{i=0}^{N} (x_i - \bar{x})^{2}p_i $$ where ...
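A quick simulation makes the difference visible; the sketch below is purely illustrative and simply estimates the mean and variance of the head counts for both boxes.

```r
# A quick simulation sketch comparing the two boxes; purely illustrative.
set.seed(42)
n <- 1e5
heads_A <- rbinom(n, 1, 0.5) + rbinom(n, 1, 0.5)   # two fair coins
heads_B <- rbinom(n, 1, 0.6) + rbinom(n, 1, 0.4)   # the two biased coins
c(mean_A = mean(heads_A), mean_B = mean(heads_B))  # both close to 1
c(var_A  = var(heads_A),  var_B  = var(heads_B))   # about 0.50 vs 0.48
```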

Linear Regression, Transforms and Regularization

This write-up is about simple linear regression and ways to make it robust to outliers and non-linearity. The linear regression method is a simple and powerful method. It is powerful because it compresses a lot of information into a simple straight line; the complexity of the problem is vastly simplified. However, being so simple comes with its own set of limitations. For example, the method assumes that after a fit is made, the differences between the predicted and actual values are normally distributed. In reality, we rarely run into such ideal conditions. Almost always there are non-normality and outliers in the data that make fitting a plain straight line insufficient. However, there are some tricks you can use to make it better. As an example data set, consider some dummy data shown in the table/chart below. Notice that the value 33 is an outlier. When charted, you can see there is some non-linearity in the...
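As a hedged illustration (using made-up data rather than the post's table), the sketch below contrasts a plain straight-line fit with a log-transformed fit and a robust M-estimator fit from the MASS package; the trend, noise level, and injected outlier are all assumptions for demonstration.

```r
# An illustrative sketch (made-up data, not the post's table): a plain fit,
# a log-transformed fit, and a robust fit that downweights the outlier.
set.seed(7)
x <- 1:10
y <- exp(0.3 * x + rnorm(10, sd = 0.1))   # gently curved trend
y[6] <- 33                                # inject an outlier
fit_plain  <- lm(y ~ x)                   # straight line on the raw data
fit_log    <- lm(log(y) ~ x)              # transform straightens the curvature
fit_robust <- MASS::rlm(y ~ x)            # M-estimator is less swayed by 33
coef(fit_plain); coef(fit_log); coef(fit_robust)
```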

The Lazy Apprentice

Q: A shopkeeper hires an apprentice for his store, which gets one customer per minute on average, uniformly at random. The apprentice is expected to keep the shop open until at least 6 minutes have passed with no customer arriving. The shopkeeper suspects that the apprentice is lazy and wants to close the shop at shorter notice. The apprentice claims (and the shopkeeper verifies) that the shop is open for about 2.5 hrs on average. How could the shopkeeper back his claim? A: Per the contract, at least 6 minutes should pass without a single customer showing up before the apprentice can close the shop. To solve this, let's tackle a different problem first. Assume you have a biased coin with a probability \(p\) of landing heads. What is the expected number of tosses before you get \(n\) heads in a row? The expected number of tosses to get the first head is simple enough to calculate: it's \(\frac{1}{p}\). ...
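The coin analogy can be checked with a short simulation. The sketch below treats each minute as a toss that comes up "heads" (a customer-free minute) with probability \(e^{-1}\), assuming Poisson arrivals at one customer per minute, counts minutes until six heads in a row, and compares the average against the standard closed form \(\frac{p^{-6} - 1}{1 - p}\) for runs of heads; the parameters mirror the puzzle statement and the figures are illustrative.

```r
# A simulation sketch of the coin analogy: a minute is customer-free ("heads")
# with probability exp(-1) under a Poisson-arrivals assumption of one customer
# per minute; the shop closes after 6 customer-free minutes in a row.
set.seed(3)
p <- exp(-1)
minutes_until_close <- function(n = 6) {
  run <- 0; t <- 0
  while (run < n) {
    t <- t + 1
    run <- if (runif(1) < p) run + 1 else 0
  }
  t
}
sims <- replicate(1e4, minutes_until_close())
mean(sims) / 60                        # average hours open, far above 2.5
((1 / p)^6 - 1) / (1 - p) / 60         # closed form, roughly 10.6 hours
```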

The Best Books for Time Series Analysis

If you are looking to learn time series analysis, the following are some of the best books on the subject. Introductory Time Series with R (Use R!) This is a good book to get one started on time series. A nice aspect of this book is that it has examples in R, and some of the data is part of standard R packages, which makes it good introductory material for learning the R language too. That said, this is not exactly a graduate-level book, and some of the data links in the book may no longer be valid. Econometrics A great book if you are in an economics stream or want to get into it. The nice thing about the book is that it tries to bring out a unity among all the methods used. Econ majors need to be up to speed on the grounding mathematics for time series analysis to use this book. Outside of those prerequisites, this is one of the best books on econometrics and time series analysis. Pattern Recognition and Machine Learning (Information Science and Statis...