## Sunday, May 19, 2013

### Angelina Jolie and Bayesian Statistics

This is less of a puzzle and more of a write-up based on recent news surrounding Angelina Jolie and breast cancer. The write-up focuses on the statistical angle and is not meant to prove or disprove the conspiracy theories that float around :). An interesting take is mentioned here, and all statistics described below borrow from that article.


A key point floated around by popular media was that she had an 87% risk of breast cancer because she carried the BRCA mutation in her genes. But note, only about 1 in 600 women actually carry that mutation! Even if you did have the mutation, the chances of eventually getting breast cancer are said to be around $$56\%$$. It is also important to ask whether it is exactly that mutation causing breast cancer: you have roughly a $$13\%$$ chance of getting it anyway (source: BreastCancer.org). So even if you do test positive for the BRCA mutation, the incremental probability that you will actually develop breast cancer can be estimated as $$56\% - 13\% = 43\%$$. This implies that only about 1 in $$\frac{600}{0.43} \approx 1395$$ people screened will actually benefit from the screening. Note how the articles lead you to believe that you should have a screening done; once you screen for that gene (and pay for it!), most likely you will come back relieved that you don't have the mutation. This analysis of course discounts false positives and negatives that can come out of the screening process itself, which further adds to the chaos. The good news is that if you eat healthy and stay fit you will do well :)
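To make the arithmetic concrete, here is a minimal R sketch of the numbers above (the rates are the ones quoted from the article; the variable names are mine):

p.mutation = 1/600        # fraction of women carrying the BRCA mutation
p.cancer.carrier = 0.56   # lifetime breast cancer risk for carriers
p.cancer.baseline = 0.13  # baseline lifetime risk without the mutation
incremental = p.cancer.carrier - p.cancer.baseline   # 0.43
# fraction of all screened people whose risk picture actually changes
p.benefit = p.mutation * incremental
cat("about 1 in", round(1/p.benefit), "screened people stand to benefit\n")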

If you are looking to learn the art of probability, here are a few good books to own

Fifty Challenging Problems in Probability with Solutions (Dover Books on Mathematics)

This book is a great compilation that covers quite a few puzzles. What I like about these puzzles is that they are all tractable and don't require too much advanced mathematics to solve.

Introduction to Algorithms
This is a book on algorithms, some of them probabilistic. It is a must-have for students, job candidates, and even full-time engineers & data scientists.

An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd Edition

The Probability Tutoring Book: An Intuitive Course for Engineers and Scientists (and Everyone Else!)

Introduction to Probability, 2nd Edition

The Mathematics of Poker
Good read. Overall, poker/blackjack-style card games are a good way to get introduced to probability theory.

Bundle of Algorithms in Java, Third Edition, Parts 1-5: Fundamentals, Data Structures, Sorting, Searching, and Graph Algorithms (3rd Edition) (Pts. 1-5)
An excellent resource (students/engineers/entrepreneurs) if you are looking for some code that you can take and implement directly on the job.

Understanding Probability: Chance Rules in Everyday Life
A bit pricey compared to the first one, but I like the look and feel of the text. It is simple to read and understand, which is vital, especially if you are trying to get into the subject.

Data Mining: Practical Machine Learning Tools and Techniques, Third Edition (The Morgan Kaufmann Series in Data Management Systems)
This one is a must-have if you want to learn machine learning. The book is beautifully written and ideal for the engineer or student who doesn't want to get too deep into the details of a machine learning approach but wants a working knowledge of it. There are some great examples and test data in the textbook too.

Discovering Statistics Using R
This is a good book if you are new to statistics & probability while simultaneously getting started with a programming language. The book supports R and is written in a casual, humorous way, making it an easy read. Great for beginners. Some of the data on the companion website may be missing.

## Thursday, May 16, 2013

### The Uncertain Game Host

Q: You are in a game where the host has hidden a gold coin under one of three hats. However, the host can choose, with a probability of 10%, not to put the gold coin under any hat. You open the first two hats and find no coin under them. What is the probability that there is a gold coin under the third hat?


A: The problem clearly requires a Bayesian approach. However, it needs to be framed correctly, or else it may not be easy to grasp. Let $$H$$ be the hypothesis we want to test: that the host placed a coin at all. Clearly, if the host did put in a gold coin, then the probability of winning is 1, as you have already opened two empty hats and the coin must be under the third. The evidence $$E$$ here is that two hats have been opened and no coin has shown up. The probability of seeing such evidence is 1 if the host has not put a coin under any hat. So the way to frame the sought probability is
$$P(H|E) = \frac{P(E|H)\times P(H)}{P(E|H)\times P(H) + P(E|\neg H)\times (1 - P(H))}$$
From the above, $$P(E|\neg H) = 1$$ and $$P(E|H) = \frac{2}{3} \times \frac{1}{2} = \frac{1}{3}$$. It is important to explain how we get $$P(E|H) = \frac{1}{3}$$: given that the host has put a coin somewhere, there is a $$\frac{2}{3}$$ chance that the first hat reveals nothing, and given that empty hat, there is a $$\frac{1}{2}$$ chance that the next hat also reveals nothing. Putting it all together gives
$$P(H|E) = \frac{\frac{1}{3}\times \frac{9}{10}}{\frac{1}{3} \times \frac{9}{10} + 1 \times \frac{1}{10}} = \frac{3}{4}$$
Now that you have the answer, it is also intuitive to see why it remains high despite two empty hats: the prior probability that the host would insert a coin is high to begin with.
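As a sanity check, here is a quick Monte Carlo sketch in R (my addition, not part of the original solution) that estimates the conditional probability directly:

set.seed(42)
n = 1e6
coin.placed = runif(n) < 0.9             # host places a coin with probability 0.9
position = sample(1:3, n, replace=TRUE)  # hat hiding the coin, when placed
# the observed evidence: hats 1 and 2 were opened and found empty
evidence = !coin.placed | position == 3
win = coin.placed & position == 3
cat("P(coin under hat 3 | first two empty) =", sum(win)/sum(evidence), "\n")  # ~0.75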


## Friday, May 3, 2013

### Polya's Urn

Q: An urn has $$r$$ red balls and $$b$$ blue balls. Someone draws a ball at random, observes its colour, and puts it back into the urn. You do not know what was observed. That person then adds $$x$$ extra balls of the same colour to the urn. Now, you draw a second ball from this urn. What is the probability that it is red?

A: The framing of this puzzle follows directly from the "Polya's urn" process. It presents yet another surprising result from Bayesian reasoning. Intuitively, it appears that the act of putting in new balls of the same colour would tamper with the probability of drawing a red ball on the second draw. But does it? Let's take a look.

The probability that a red ball is drawn from the urn on the first draw is $$\frac{r}{r+b}$$, and for a blue ball it is $$\frac{b}{r+b}$$. A red ball on the second draw could be a consequence of either a red ball or a blue ball having been drawn the first time.

For the second draw, the probability that a red ball is drawn, given a red ball was drawn the first time, is $$\frac{r+x}{r+b+x}$$. The probability that a red ball is drawn, given a blue ball was drawn the first time, is $$\frac{r}{r+b+x}$$, since the urn then holds $$r$$ red balls out of $$r+b+x$$.

The probability that a red ball is drawn on the second draw is
$$P(\text{Red: Draw=2})=\frac{r + x}{r + b + x}\times\frac{r}{r+b} + \frac{r}{r+b+x}\times\frac{b}{r+b}$$
The above simplifies as
$$\frac{(r+x)r + rb}{(r+b+x)(r+b)} = \frac{r(r+b+x)}{(r+b+x)(r+b)} = \frac{r}{r+b}$$
Note, the probability remains exactly the same!
As with most Bayesian results, this one eventually becomes intuitive when we spend more time thinking about the problem. Since you never learn the outcome of the first draw, the act of adding $$x$$ balls based on that outcome conveys no information to you; from your point of view it is really meaningless!
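Here is a small R simulation (my addition) that verifies the result for arbitrary $$r$$, $$b$$ and $$x$$:

set.seed(1)
r = 3; b = 5; x = 4; n = 1e6
first.red = runif(n) < r/(r+b)  # colour of the unseen first draw
# composition of the urn after the top-up determines the second draw
p.red = ifelse(first.red, (r+x)/(r+b+x), r/(r+b+x))
second.red = runif(n) < p.red
cat("simulated:", mean(second.red), " theoretical:", r/(r+b), "\n")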


## Wednesday, May 1, 2013

### The Restricted Boltzmann Machine (RBM) and Dreams

This write-up is less of a puzzle and more of a tutorial. Without further ado, here you go.

The Restricted Boltzmann Machine (RBM) has become increasingly popular of late after its success in the Netflix Prize competition and other competitions. Much of the inventive work behind RBMs was done by Geoffrey Hinton, in particular the training of RBMs using an algorithm called "Contrastive Divergence" (CD), which is similar in spirit to gradient descent. A nice consequence of CD is the trained model's ability to "dream". Among the various machine learning methods out there, the RBM is one of the few with this capacity baked in implicitly.


Energy Based Models:

The RBM is an energy-based model. Given training data with, say, two classes to classify, the RBM during its training phase tries to assign a lower "energy" to configurations resembling the training examples than to the rest. It achieves this with an architecture of visible and hidden units arranged as a bipartite graph, described below.

There are also bias units, but I've excluded them from the description above to highlight what makes RBMs unique. Notice that the visible units have no links amongst themselves, and neither do the hidden units. This is where the "Restricted" in the name RBM comes from: the units are restricted in whom they can connect with, and the architecture, in that sense, is fixed. Each visible unit connects to every hidden unit, and each hidden unit connects to every visible unit.

Now on to the next logical question: what are these units? The visible units simply hold floating-point or integer values. They can be represented by any vector or matrix of numbers, and they are the gateway through which we feed in training examples and read out predictions or dreams. The hidden units are like neurons in a typical neural network: they aggregate all the data arriving through the visible units. What about the connecting lines? They represent the weights by which each visible unit is connected to each hidden unit. Each hidden unit computes a sigmoid function of the form
$$P(x) = \frac{1}{1 + e^{-x}}$$
If $$P(x)$$ is $$\gt 0.5$$ you can think of the unit as likely to "light up"; in practice the unit turns on stochastically with probability $$P(x)$$, which is what the code below implements.

Training & Learning:

In order to train an RBM we follow the CD algorithm. Note, RBMs are known to work best with binary input features and classification-style tasks. You could still do regression or deal with real-valued features, but that is beyond the scope of this write-up; for more information, read the excellent practical guide by Geoffrey Hinton. The CD algorithm starts by initializing the weights to random values. Next, we present a training example to the visible units by turning on (setting to 1) the units whose features are present, and compute how the hidden units light up. Each hidden unit receives a flow of values from every visible unit, aggregated through the corresponding weights. For the $$i^{th}$$ hidden unit, if $$w_{ij}$$ denotes the weights pointing towards it, the total input coming into it is
$$\sum_{j=1}^{N}w_{ij}v_{j}$$
where $$N$$ is the number of visible units. This causes some of the hidden units to light up, which is what gets called the "positive phase" (the name itself is just a convention). Next, using the lit-up hidden units, the visible units are recreated; this step is crucial to understand. The first time this is done, the reconstructed visible units will most likely look nothing like the example you just presented. Based on how different the reconstruction is from the input, the weights are adjusted; the degree of adjustment is set by the learning rate, typically a small number. The method is very similar to gradient ascent: the smaller the learning rate, the more stable the learning, but the longer it takes to finish. The entire process is repeated for each remaining training example, and completing all examples counts as one iteration. You run several hundred such iterations until the visible units generated during the negative phase closely resemble the inputs presented during the positive phase.
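To make the weight update concrete: in the CD scheme sketched above, with learning rate $$\epsilon$$, the change to weight $$w_{ij}$$ is commonly written as
$$\Delta w_{ij} = \epsilon \left( \langle h_i v_j \rangle_{\text{data}} - \langle h_i v_j \rangle_{\text{recon}} \right)$$
i.e., the difference between the visible-hidden correlations measured in the positive phase and in the negative phase. This is exactly what the code below accumulates in pos.e.iter and neg.e.iter before applying w = w + 0.01*(pos.e.iter - neg.e.iter).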

Example

The following is an example implementation of the RBM done in R. I've tried to make it slightly more efficient by using R's built-in apply functions. This is by no means complete, but it should hopefully serve as an inspiration for someone to build out a package for this method in R and put it on CRAN.

Before getting into the example implementation, here is an example dataset I have created. It has just two classes of examples. They are several instances of an arrow pointing up and an arrow pointing down.

An example up arrow would look like the following. Notice the ASCII art :)

0,0,0,0,1,0,0,0,0
0,0,0,1,1,1,0,0,0
0,0,1,1,1,1,1,0,0
0,1,1,1,1,1,1,1,0
1,1,1,1,1,1,1,1,1

All the rows are merged to one long row for training purposes. Similarly a down arrow would look like the following.

1,1,1,1,1,1,1,1,1
0,1,1,1,1,1,1,1,0
0,0,1,1,1,1,1,0,0
0,0,0,1,1,1,0,0,0
0,0,0,0,1,0,0,0,0

To create a training set, simply flip a few 1s to 0s at random and create equal instances of each class; one way to do this is sketched below.
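Here is one way to generate such a training set in R (a sketch I am adding; the original leaves this step to the reader). It assumes the two prototype arrows shown above, flattened row-wise:

# prototype up arrow, rows concatenated into one vector of 45 bits
up = c(0,0,0,0,1,0,0,0,0,
       0,0,0,1,1,1,0,0,0,
       0,0,1,1,1,1,1,0,0,
       0,1,1,1,1,1,1,1,0,
       1,1,1,1,1,1,1,1,1)
down = rev(up)  # each row is a palindrome, so reversing the vector flips the arrow

# flip a few of the 1s to 0s at random to create a noisy instance
make.noisy = function(proto, nflip=3){
  on = which(proto == 1)
  proto[sample(on, nflip)] = 0
  proto
}

# equal numbers of noisy up and down arrows, one example per row
x = t(sapply(1:20, function(i) make.noisy(if(i %% 2) up else down)))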

The R code for the RBM is shown below.


# stochastic sigmoid activation: emit 1 with probability 1/(1+exp(-a))
act.f <- function(a){
  p = 1/(1 + exp(-a))
  if (p > runif(1)) 1 else 0
}

# one "dream" pass: project the visible vector up to the hidden layer,
# then reconstruct the visible units from the sampled hidden state
dream <- function(xx,w,nc){
  a1 = w %*% xx
  hidden.state = as.matrix(apply(a1,c(1,2),act.f),nrow=nc)
  a1 = t(w) %*% hidden.state
  visible.state = as.matrix(apply(a1,c(1,2),act.f),nrow=1)
  return(visible.state)
}

# one CD-1 step for a single training example: gather the positive phase
# statistics from the data and the negative phase statistics from the
# reconstruction, accumulating both into global matrices
update.wgt <- function(xx,w,nc){
  a1 = w %*% xx
  hidden.state = as.matrix(apply(a1,c(1,2),act.f),nrow=nc)
  pos.e = hidden.state %*% t(xx)             # positive phase statistics
  a1 = t(w) %*% hidden.state
  visible.state = as.matrix(apply(a1,c(1,2),act.f),nrow=1)  # reconstruction
  a1 = w %*% visible.state
  hidden.state = as.matrix(apply(a1,c(1,2),act.f),nrow=1)
  neg.e = hidden.state %*% t(visible.state)  # negative phase statistics
  pos.e.iter <<- pos.e.iter + pos.e
  neg.e.iter <<- neg.e.iter + neg.e
}

# x holds one flattened training example per row (e.g. as generated above);
# prepend a bias column of 1s and make sure it is a matrix
x = cbind(rep(1,nrow(x)),x)
x = as.matrix(x)
num.hiddenstates = 3
# random initial weights, one row per hidden unit
w = matrix(runif(ncol(x)*num.hiddenstates,-1,1),nrow=num.hiddenstates)

system.time({
for(iter in seq(1:2000)){
  # reset the phase statistics, sweep over all training rows, then update
  pos.e.iter = matrix(rep(0,ncol(x)*num.hiddenstates),nrow=num.hiddenstates)
  neg.e.iter = matrix(rep(0,ncol(x)*num.hiddenstates),nrow=num.hiddenstates)
  apply(x,1,update.wgt,w,num.hiddenstates)
  w = w + 0.01*(pos.e.iter - neg.e.iter)  # learning rate 0.01
}
})
print(t(w))

# two test cases: x.test1 is a noisy down arrow, x.test2 a noisy up arrow
# (the leading 1 in each vector is the bias unit)
x.test1 = as.matrix(c(1,1,0,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,0,0,0,1,1,1,1,1,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,1,0,0,0,0),ncol=1)
x.test2 = as.matrix(c(1,0,0,0,0,1,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,1,1,1,1,1,0,0,0,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,1),ncol=1)

# hidden-unit inputs for each test case
a1 = w %*% x.test1
a1

a1 = w %*% x.test2
a1

# dream twice from the same seed: drop the bias element and reshape to 5x9
d.vector = dream(x.test2,w,3)
d.vector = d.vector[-1]
d.state = matrix(d.vector,byrow=TRUE,nrow=5)
print(d.state)
d.vector = dream(x.test2,w,3)
d.vector = d.vector[-1]
d.state = matrix(d.vector,byrow=TRUE,nrow=5)
print(d.state)

The above code spends most of its time in the training phase. Also included are two example test cases. The number of iterations is hard-wired into this code segment at 2000 for simplicity. Three hidden units are chosen, but you can change that through the variable num.hiddenstates. x.test1 and x.test2 are two matrices holding these example cases, and the hidden-node values under the two test cases are printed out. When you run this code, notice that the pattern of values in a1 differs between the two cases; that contrast is what makes the hidden units usable as features. Due to the stochastic nature of the hidden units, you will not get the same numbers for the hidden units every time you run this code. This is important to understand and note. A typical (and good) way RBMs are used is to feed this output into another classifier (say a neural network or a decision tree) which in turn does the prediction. Such stacked architectures are what get called "deep learning" architectures.
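For instance, a deterministic feature vector for a downstream classifier can be obtained by taking the hidden-unit probabilities instead of sampling them. A small sketch of my own, reusing w and x.test1 from the code above:

# hidden-unit probabilities: a num.hiddenstates-dimensional feature vector
features = 1/(1 + exp(-(w %*% x.test1)))
print(t(features))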

Dreaming

The final leg of the RBM code does a cool trick: it can "dream"! This simply means that the hidden units are used to regenerate the visible units. To seed the dreaming process we give it one example (in the code I give it a test case) and see which units in the hidden layer get activated. Then the negative phase of the CD algorithm is run to regenerate the visible units. This can be repeated as many times as one pleases. I've set it to dream up just two instances of an ASCII-art up arrow. Here is how they showed up on my machine.

[1,]    0    0    0    0    1    0    0    0    0
[2,]    0    0    0    1    1    1    0    0    0
[3,]    0    0    1    1    0    1    1    0    0
[4,]    0    1    1    1    1    1    1    1    0
[5,]    1    1    1    0    1    1    0    1    1

[1,]    0    0    0    0    1    0    0    0    0
[2,]    0    0    1    1    1    1    0    0    0
[3,]    0    0    1    0    0    1    1    0    0
[4,]    0    1    1    1    1    1    1    1    0
[5,]    1    1    1    1    1    1    1    1    1

Hopefully this write-up was helpful in gaining a better understanding of this nice machine learning algorithm. If you are looking to learn the art of machine learning, here are some good books to own

Bayesian Reasoning and Machine Learning

Ensemble Methods: Foundations and Algorithms (Chapman & Hall/CRC Machine Learning & Pattern Recognition)

Neural Networks for Pattern Recognition
A good book to understand why RBMs and neural networks are able to fit the functions we want to learn.

Recommender Systems Handbook

Collective Intelligence in Action

## Friday, April 26, 2013

### The Diet Problem

Q: You are creating a batch of protein bars and want your product to have as much protein in it as possible, using two food sources, A & B. Source A provides 5g of protein per pound and source B provides 4g of protein per pound. A batch of the protein bar should weigh no more than 4 pounds in total. Source A costs \$2 per pound and source B costs \$1 per pound, and you want to keep the price of the entire batch under \$5. How many pounds of each source should go into a batch?


A: This is a good example of an application for the simplex algorithm. The simplex algorithm works quite well for problems with a linear objective and linear constraints. For example, if we let the amount of source A be $$x$$ pounds and source B be $$y$$ pounds, the objective function we want to maximize (the protein in the batch) is
$$\text{Protein} = 5x + 4y$$
subject to constraints
$$x + y \le 4\\ 2x + y \le 5\\ x, y \ge 0$$
The optimal solution can be found using the simplex algorithm, which is readily available in R in the package "boot"; the function "simplex" solves this for you. Here is the R code for the same, yielding the optimal solution $$x=1$$, $$y=3$$, i.e. 1 pound of A and 3 pounds of B for $$5(1)+4(3)=17$$g of protein.

#!/usr/bin/Rscript
library(boot)
a.vec = c(5,4)  # objective coefficients: maximize 5x + 4y
simplex(a=a.vec,
        A1=matrix(c(1,1,2,1),byrow=T,nrow=2),  # <= constraint coefficients
        b1=c(4,5),                             # <= constraint bounds
        A2=NULL, b2=NULL,  # no >= constraints
        A3=NULL, b3=NULL,  # no equality constraints
        maxi=TRUE)         # maximize rather than minimize
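As a sanity check (my addition, not part of the original solution), a brute-force grid search over the feasible region confirms the optimum:

# evaluate the objective on a fine grid over the feasible region
g = expand.grid(x=seq(0,4,0.01), y=seq(0,4,0.01))
g = g[g$x + g$y <= 4 & 2*g$x + g$y <= 5,]
g[which.max(5*g$x + 4*g$y),]  # x=1, y=3: 17g of protein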

The simplex algorithm is widely used and you can extend its application to other areas too. Some good books to learn the art of optimization

Convex Optimization

Optimization in Operations Research

Nonlinear Multiobjective Optimization: A Generalized Homotopy Approach (International Series of Numerical Mathematics)

## Sunday, April 21, 2013

### Divisibility by Nine Number Trick

A well-known divisibility trick exists to tell if a number is divisible by 9: add all the digits of the number, and if the result is divisible by 9, then the number is also divisible by 9. How does one prove this?


For convenience, let us assume it is a 4-digit number we are testing, say $$abcd$$. This can be expressed as
$$\text{abcd} = 1d + 10c + 100b + 1000a$$
This in turn can be expressed as
$$d + c + b + a + 9c + 99b + 999a \\ \text{or}\\ d + c + b + a + 9\times(c + 11b + 111a)\\$$
Note that the second part of the number, $$9\times(c + 11b + 111a)$$, is divisible by 9. So if $$d+c+b+a$$ is divisible by 9, then the whole number is divisible by 9. Hence the proof. By extension, since 9 is divisible by 3, the method applies to divisibility by 3 as well.
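A quick R sketch (my addition) to sanity-check the trick over a range of numbers:

# a number and its digit sum agree modulo 9
digit.sum = function(n) sum(as.integer(strsplit(as.character(n), "")[[1]]))
n = 1:100000
stopifnot(all(sapply(n, digit.sum) %% 9 == n %% 9))
cat("verified for", length(n), "numbers\n")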
The following are some good books on learning number theory

Number Theory: A Lively Introduction with Proofs, Applications, and Stories

An Introduction to the Theory of Numbers

A Classical Introduction to Modern Number Theory (Graduate Texts in Mathematics)

## Thursday, April 18, 2013

### Escaping from a Forest

Q: You are stuck in a forest. You have no information whatsoever on where you are in the forest; however, you do know that the forest is shaped like a very long rectangular strip of width $$b$$. You decide to walk out of the forest. What strategy would you adopt? If you were on the edge, what is the expected distance you would walk? Assume you can measure the distance you walk and can hold a fixed orientation.


A: A reasonable escape strategy is a fairly simple one. Since you are surrounded by the forest, you do not know your orientation with respect to the strip (see figure); put another way, you do not know $$\theta$$.

You pick a direction and walk a distance $$b$$. Once you have walked that distance, you make a $$90^{\circ}$$ turn to the right and continue to walk a distance of at most $$b$$. Using this strategy, it is guaranteed that you will hit the boundary of the forest strip!
To compute the average distance travelled, assuming you start at the edge of the forest strip, proceed as follows: from the figure, we can see that $$\angle BAC = \angle DBE$$ and
$$\overline{AB} = b \\ \overline{BC} = b \sin(\theta) \\ \overline{DB} = b - b\sin(\theta)\\ \overline{BE} = \frac{b - b\sin(\theta) }{\cos(\theta)}$$
The distance travelled assuming you are at the edge of the forest is
$$d = \overline{AB} + \overline{BE}$$

Assuming $$b=1$$ this simplifies to
$$d = 1 + \frac{1 - \sin(\theta)}{\cos(\theta)}$$
The above function only covers the case when $$\theta$$ runs between $$[0,\frac{\pi}{2}]$$. For the case when $$\theta$$ runs from $$[\frac{\pi}{2},\pi]$$ the corresponding function works out to
$$d = 1 - \tan\theta$$

Trigonometry

The average distance you would expect to walk is the average of the above function $$d$$ with $$\theta$$ running from $$[0,\pi]$$. This in turn is given by
$$d = \frac{1}{\pi}\left( \int_{0}^{\frac{\pi}{2}} 1 + \frac{1 - \sin \theta}{\cos \theta}\,d\theta + \int_{\frac{\pi}{2}}^{\pi} 1 - \tan\theta \,d\theta \right)$$

Here is a script in R that evaluates this average numerically.
#!/usr/bin/Rscript

# numerically average the piecewise distance function over [0, pi];
# the 0.001 offsets step around the singularity of tan at pi/2
mval = 0
count = 0
for(theta in seq(0, pi/2 - 0.001, by=0.001)){
  t1 = 1 + ((1 - sin(theta))/cos(theta))
  mval = mval + t1
  count = count + 1
}
for(theta in seq(pi/2 + 0.001, pi, by=0.001)){
  t1 = 1 - tan(theta)
  mval = mval + t1
  count = count + 1
}
mval = mval / count
cat("Mean Value = ", mval, "\n")


The above script yields a value of about 3.6. Also note, the above average runs over $$[0,\pi]$$, whereas in reality $$\theta$$ ranges over $$[0,2\pi]$$; over $$[\pi,2\pi]$$, however, the distance function is practically 0. So the average value works out to $$\frac{3.6 + 0}{2} = 1.8$$.
