Skip to main content

Embarrassing Questions, German Tanks and Estimations




submit to reddit

Q: You are conducting a survey and want to ask an embarrassing yes/no question to subjects. The subjects wouldn't answer that embarrassing question honestly unless they are guaranteed complete anonymity. How would you conduct the survey?

Machine Learning: The Art and Science of Algorithms that Make Sense of Data

A: One way to do this is to assign a fair coin to the subject and ask them to toss it in private. If it came out heads then answer the question truthfully else toss the coin a second time and record the result (heads = yes, tails = no). With some simple algebra you can estimate the proportion of users who have answered the question with a yes.

Assume total population surveyed is \(X\). Let \(Y\) subjects have answered with a "yes". Let \(p\) be the sort after proportion. The tree diagram below shows the user flow.


The total expected number of "yes" responses can be estimated as
$$
\frac{pX}{2} + \frac{X}{4} = Y
$$
which on simplification yields
$$
p = \big(\frac{4Y}{X} - 1\big)\frac{1}{2}
$$

Best Books on Probability

Q: A bag contains unknown number of tiles numbered in serial order \(1,2,3,...,n\). You draw \(k\) tiles from the bag without replacement and find the maximum number etched on them to be \(m\). What is your estimate of the number of tiles in the bag?

A: This and problems like these are called German War Tank problems. During WW-II German tanks were numbered in sequential order when they were manufactured. Allied forces needed an estimate of how many tanks were deployed and they had a handful of captured tanks and their serial numbers painted on them. Using this, statisticians estimated the actual tanks to be far lower than what intelligence estimates had them believe. So how does it work?

Let us assume we draw a sample of size \(k\). The maximum in that sample is \(m\). If we estimate the maximum of the population to be \(m\) then probability of the sample maximum to be \(m\) is
$$
P(\text{Sample Max} = m) = \frac{m-1 \choose k-1}{N \choose k}
$$
The \(-1\) figures because the maximum is already taken out of the sample leaving behind \(m - 1\) to choose \(k -1 \) from. The expected value of the maximum using this strategy is thus
$$
E(\text{Maximum}) = \sum_{m=k}^{m=N}m\frac{m-1 \choose k-1}{N \choose k}
$$
Note, we run the above summation from \(k\) to \(N\) as for \(m < k\) the expectation is \(0\) because the sample maximum has to be at least \(k\). After a series of algebraic manipulations ( ref ) the above simplifies to
$$
E(\text{Maximum}) = M\big( 1 + \frac{1}{k}\big) - 1
$$
which is quite an ingenious and simple way to estimate population size given serial number ordering.

If you are looking to buy some books on probability theory here is a good list.

Comments

Popular posts from this blog

The Best Books to Learn Probability

If you are looking to buy some books in probability here are some of the best books to learn the art of Probability The Probability Tutoring Book: An Intuitive Course for Engineers and Scientists (and Everyone Else!) A good book for graduate level classes: has some practice problems in them which is a good thing. But that doesn't make this book any less of buy for the beginner. An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd Edition This is a two volume book and the first volume is what will likely interest a beginner because it covers discrete probability. The book tends to treat probability as a theory on its own Discovering Statistics Using R This is a good book if you are new to statistics & probability while simultaneously getting started with a programming language. The book supports R and is written in a casual humorous way making it an easy read. Great for beginners. Some of the data on the companion website could be missing. Fifty Cha

The Best Books for Linear Algebra

The following are some good books to own in the area of Linear Algebra. Linear Algebra (2nd Edition) This is the gold standard for linear algebra at an undergraduate level. This book has been around for quite sometime a great book to own. Linear Algebra: A Modern Introduction Good book if you want to learn more on the subject of linear algebra however typos in the text could be a problem. Linear Algebra (Dover Books on Mathematics) An excellent book to own if you are looking to get into, or want to understand linear algebra. Please keep in mind that you need to have some basic mathematical background before you can use this book. Linear Algebra Done Right (Undergraduate Texts in Mathematics) A great book that exposes the method of proof as it used in Linear Algebra. This book is not for the beginner though. You do need some prior knowledge of the basics at least. It would be a good add-on to an existing course you are doing in Linear Algebra. Linear Algebra, 4th Ed

Fun with Uniform Random Numbers

Q: You have two uniformly random numbers x and y (meaning they can take any value between 0 and 1 with equal probability). What distribution does the sum of these two random numbers follow? What is the probability that their product is less than 0.5. The Probability Tutoring Book: An Intuitive Course for Engineers and Scientists A: Let z = x + y be the random variable whose distribution we want. Clearly z runs from 0 to 2. Let 'f' denote the uniform random distribution between [0,1]. An important point to understand is that f has a fixed value of 1 when x runs from 0 to 1 and its 0 otherwise. So the probability density for z, call it P(z) at any point is the product of f(y) and f(z-y), where y runs from 0 to 1. However in that range f(y) is equal to 1. So the above equation becomes From here on, it gets a bit tricky. Notice that the integral is a function of z. Let us take a look at how else we can simply the above integral. It is easy to see that f(z-y) = 1 when (