
Thursday, December 13, 2012

The Naive Bayesian Approach to Machine Learning

In this write-up, I'll explain the Naive Bayesian (NB) approach used in machine learning. First off, the "naive" in NB is part of the name, not an adjective tacked on. The method is simple, robust and fairly effective in a lot of cases. Every engineer dealing with data absolutely must know this technique. It is one of those techniques that is cheap to compute yet easy to explain to people without a machine learning background.


To start with, let us look at some data. Assume you run a coffee shop. You keep a log of the gender of every customer along with their age, which you estimate. Granted, this estimate may not be accurate, but it should be reasonably within range. You also keep track of whether the customer bought a coffee cup that you had put up prominently on display. The data is shown in the table below.


Gender Age Buy Cup (Y/N)
M teen n
M teen n
M middle y
M middle y
M middle n
M middle n
M elder y
M elder y
M elder y
F teen n
F teen y
F middle y
F middle y
F middle y
F middle n
F elder y
F elder y
F elder n
F elder n
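
The table above is small enough to write out directly. As a sketch, here it is as Python data, with a quick tally of the outcome counts used in the worked example below:

```python
from collections import Counter

# (gender, age, bought) for each of the 19 logged customers
rows = [
    ("M", "teen",   "n"), ("M", "teen",   "n"),
    ("M", "middle", "y"), ("M", "middle", "y"),
    ("M", "middle", "n"), ("M", "middle", "n"),
    ("M", "elder",  "y"), ("M", "elder",  "y"), ("M", "elder", "y"),
    ("F", "teen",   "n"), ("F", "teen",   "y"),
    ("F", "middle", "y"), ("F", "middle", "y"), ("F", "middle", "y"),
    ("F", "middle", "n"),
    ("F", "elder",  "y"), ("F", "elder",  "y"),
    ("F", "elder",  "n"), ("F", "elder",  "n"),
]

# Tally how many customers bought the cup vs. not
outcome_counts = Counter(bought for _, _, bought in rows)
print(outcome_counts)  # Counter({'y': 11, 'n': 8})
```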

Given a customer who walks in, whose gender and age you can estimate, you want to predict the probability that the customer will buy the cup.

The first assumption in the NB approach is that the features are independent: knowing the value of one feature (gender or age) tells you nothing about the value of the other.

Next, assume a customer walks in and that customer's gender is known to be 'M' and age to be 'middle'. We make a Bayesian estimation as follows

P(Y | M, middle) = P(M, middle | Y) * P(Y) / P(M, middle)

Next we exploit the fact that the features are independent, so the above equation simplifies to

P(Y | M, middle) = P(M | Y) * P(middle | Y) * P(Y) / P(M, middle)

One more simplification: we do not really need P(M, middle), because we will estimate P(N | M, middle) in the same way and then combine P(Y | M, middle) and P(N | M, middle) to get the final estimate. In this approach P(M, middle) becomes a normalizing factor (or cancels out in the final estimate, whichever way you prefer to think about it). P(N | M, middle) works out as follows

P(N | M, middle) = P(M | N) * P(middle | N) * P(N) / P(M, middle)

Let us estimate each of the components:

P(Y)            = 11/19 = 57.9%
P(M | Y)        = 5/11  = 45.5%
P(middle | Y)   = 5/11  = 45.5%
P(N)            = 8/19  = 42.1%
P(M | N)        = 4/8   = 50.0%
P(middle | N)   = 3/8   = 37.5%

The sought probability is simply

P(Y | M, middle) = P(M | Y) * P(middle | Y) * P(Y) / [P(M | Y) * P(middle | Y) * P(Y) + P(M | N) * P(middle | N) * P(N)]

Plug in all these numbers and you get ≈ 60.2%.
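
The arithmetic is easy to check with a few lines of Python, using the fractions read off the table:

```python
from fractions import Fraction as Frac

# Class priors and per-feature likelihoods from the table
p_y, p_n = Frac(11, 19), Frac(8, 19)
p_m_given_y, p_mid_given_y = Frac(5, 11), Frac(5, 11)
p_m_given_n, p_mid_given_n = Frac(4, 8), Frac(3, 8)

# Unnormalized scores; the common factor P(M, middle) cancels on normalizing
score_y = p_y * p_m_given_y * p_mid_given_y   # 25/209
score_n = p_n * p_m_given_n * p_mid_given_n   # 3/38

posterior_y = score_y / (score_y + score_n)
print(float(posterior_y))  # ≈ 0.602
```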

What happens if one of the features is numeric in value (and not a factor, as in the example above)? The method remains the same: simply choose the distribution that best describes the data, usually a Gaussian, fit it per class, and then for the given test value of the numerical feature, find the probability density at that point and use it for P(feature | Y) and P(feature | N).
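
As a sketch, suppose age were logged as a raw number instead of a factor. The mean and standard deviation values below are made up purely for illustration; in practice you would fit them to the buyers and non-buyers in your log:

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean/std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

# Hypothetical per-class fits; densities stand in for the likelihoods
p_age_given_y = gaussian_pdf(40, mean=45.0, std=12.0)  # used as P(age | Y)
p_age_given_n = gaussian_pdf(40, mean=28.0, std=10.0)  # used as P(age | N)
```

These densities then slot into the same product-of-likelihoods formula as the factor probabilities did.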

The NB method is quite robust and scales easily. In some scenarios we don't want any of the probability estimates to be exactly zero or 100%, so that the method predicts a non-zero probability for unseen cases and never 100% for any particular case.
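
A common way to achieve that is additive (Laplace) smoothing: add a pseudo-count to every cell so that no estimated likelihood is exactly 0 or 1. A minimal sketch, with a made-up helper name:

```python
def smoothed_likelihood(count, total, n_values, alpha=1.0):
    """P(feature value | class) with additive (Laplace) smoothing.

    count    -- times this feature value co-occurred with the class
    total    -- number of rows in the class
    n_values -- number of distinct values the feature can take
    alpha    -- pseudo-count added to every value
    """
    return (count + alpha) / (total + alpha * n_values)

# A value never seen with the class no longer gets probability zero:
print(smoothed_likelihood(0, 11, 3))  # 1/14 ≈ 0.071
```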

Some must buy books on probability
Fifty Challenging Problems in Probability with Solutions (Dover Books on Mathematics)
This book is a great compilation that covers quite a few puzzles. What I like about these puzzles is that they are all tractable and don't require too much advanced mathematics to solve.

Introduction to Algorithms
This is a book on algorithms, some of them probabilistic. A must-have for students, job candidates, and even full-time engineers & data scientists.

Introduction to Probability Theory

An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd Edition

The Probability Tutoring Book: An Intuitive Course for Engineers and Scientists (and Everyone Else!)

Introduction to Probability, 2nd Edition

The Mathematics of Poker
Good read. Overall Poker/Blackjack type card games are a good way to get introduced to probability theory

Let There Be Range!: Crushing SSNL/MSNL No-Limit Hold'em Games
Easily the most expensive book out there. So if the item above piques your interest and you want to go pro, go for it.

Quantum Poker
Well written and easy to read mathematics. For the Poker beginner.


Bundle of Algorithms in Java, Third Edition, Parts 1-5: Fundamentals, Data Structures, Sorting, Searching, and Graph Algorithms (3rd Edition) (Pts. 1-5)
An excellent resource (students/engineers/entrepreneurs) if you are looking for some code that you can take and implement directly on the job.

Understanding Probability: Chance Rules in Everyday Life
A bit pricey compared to the first one, but I like the look and feel of the text. It is simple to read and understand, which is vital especially if you are trying to get into the subject.

Data Mining: Practical Machine Learning Tools and Techniques, Third Edition (The Morgan Kaufmann Series in Data Management Systems)
This one is a must-have if you want to learn machine learning. The book is beautifully written and ideal for the engineer/student who doesn't want to get too deep into the details of a machine-learned approach but wants a working knowledge of it. There are some great examples and test data in the textbook too.

Discovering Statistics Using R
This is a good book if you are new to statistics & probability while simultaneously getting started with a programming language. The book supports R and is written in a casual humorous way making it an easy read. Great for beginners. Some of the data on the companion website could be missing.

