
Estimating Unseen Bugs in Software






Q: Two engineers independently do quality assurance testing on a large body of code and discover \(e_1\) and \(e_2\) bugs respectively, of which \(e_c\) are common to both. The probabilities that each engineer finds any given bug are \(p_1\) and \(p_2\) respectively. What is your best estimate of the number of unseen bugs in the code?


A: This puzzle is inspired by W. Feller's introductory book on probability. The total number of unique bugs identified is \(e_1 + e_2 - e_c\). Let \(B_0\) represent the total number of bugs in the software application. Treating the observed counts as their expected values, and assuming the two engineers find bugs independently of each other, we can write
$$
\begin{aligned}
e_1 &= p_1 \times B_0 \\
e_2 &= p_2 \times B_0 \\
e_c &= p_1 p_2 \times B_0 = \frac{e_1 e_2}{B_0}
\end{aligned}
$$
The unseen bugs are simply
$$
\text{Unseen Bugs} = B_0 - (e_1 + e_2 - e_c)
$$
Solving the relation for \(e_c\) gives \(B_0 = \frac{e_1 e_2}{e_c}\). Substituting this into the expression for the unseen bugs yields
$$
\text{Unseen Bugs} = \frac{e_1 e_2}{e_c} - (e_1 + e_2 - e_c)
$$
which simplifies to
$$
\text{Unseen Bugs} = \frac{(e_1 - e_c)(e_2 - e_c)}{e_c}
$$
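As a quick check of the formula, consider some hypothetical counts (chosen for illustration, not taken from the original puzzle): \(e_1 = 20\), \(e_2 = 15\) and \(e_c = 10\). The engineers found \(20 + 15 - 10 = 25\) unique bugs between them, and the estimates work out to
$$
B_0 \approx \frac{20 \times 15}{10} = 30, \qquad \text{Unseen Bugs} \approx \frac{(20 - 10)(15 - 10)}{10} = 5
$$
which agrees with \(30 - 25 = 5\).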
Notice that the final result depends only on the observed counts: \(B_0\), \(p_1\) and \(p_2\) have all dropped out. The estimate is not always reliable, however. If both engineers found exactly the same bugs, i.e. \(e_1 = e_2 = e_c\), the estimated number of unseen bugs is 0, which need not be true. The expression is also undefined when \(e_c = 0\). Nevertheless, it provides a useful way to estimate the number of unseen bugs in software. The original example by Polya and Feller involved proofreaders reading text and spotting spelling errors.
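The estimate can be sketched in a few lines of Python. This is only an illustration under the same assumptions as the derivation (independent testers, observed counts standing in for expected values). The function name `estimate_unseen_bugs` and the choice to return `None` when there is no overlap are assumptions made here, not part of the original puzzle.

```python
from typing import Optional


def estimate_unseen_bugs(e1: int, e2: int, ec: int) -> Optional[float]:
    """Estimate unseen bugs as (e1 - ec) * (e2 - ec) / ec.

    e1, e2: number of bugs found by each engineer.
    ec:     number of bugs found by both.
    Returns None when ec == 0, where the estimate is undefined.
    """
    if e1 < 0 or e2 < 0 or ec < 0:
        raise ValueError("counts must be non-negative")
    if ec > min(e1, e2):
        raise ValueError("ec cannot exceed either engineer's count")
    if ec == 0:
        # No common bugs: the formula divides by zero, so give no estimate.
        return None
    return (e1 - ec) * (e2 - ec) / ec


# Hypothetical counts, for illustration only.
print(estimate_unseen_bugs(20, 15, 10))  # 5.0
print(estimate_unseen_bugs(7, 7, 7))     # 0.0 (identical findings)
print(estimate_unseen_bugs(5, 4, 0))     # None (no overlap)
```

Returning `None` rather than raising for \(e_c = 0\) is a design choice: a caller can then treat "no overlap" as "not enough information" instead of as an error.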


If you are looking to buy books on probability, here are some of the best ones for learning the subject:


Fifty Challenging Problems in Probability with Solutions (Dover Books on Mathematics)
This book is a great compilation that covers quite a few puzzles. What I like about these puzzles is that they are all tractable and don't require much advanced mathematics to solve.

Introduction to Algorithms
This is a book on algorithms, some of which are probabilistic. It is a must-have for students, job candidates, and even full-time engineers and data scientists.

Introduction to Probability Theory
Overall an excellent book for learning probability, recommended for undergraduates and graduate students.

An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd Edition
This is a two volume book and the first volume is what will likely interest a beginner because it covers discrete probability. The book tends to treat probability as a theory on its own

The Probability Tutoring Book: An Intuitive Course for Engineers and Scientists (and Everyone Else!)
A good book for graduate-level classes. It includes some practice problems, which is a good thing, but that doesn't make it any less of a buy for the beginner.

Introduction to Probability, 2nd Edition
A good book to own. Does not require prior knowledge of other areas, but the book is a bit low on worked out examples.

Bundle of Algorithms in Java, Third Edition, Parts 1-5: Fundamentals, Data Structures, Sorting, Searching, and Graph Algorithms (3rd Edition) (Pts. 1-5)
An excellent resource for students, engineers, and even entrepreneurs who are looking for code that they can take and implement directly on the job.

Understanding Probability: Chance Rules in Everyday Life
This is a great book to own. The second half of the book may require some knowledge of calculus. It appears to be the right mix for someone who wants to learn but doesn't want to be scared off by the "lemmas".

Data Mining: Practical Machine Learning Tools and Techniques, Third Edition (The Morgan Kaufmann Series in Data Management Systems)
This one is a must have if you want to learn machine learning. The book is beautifully written and ideal for the engineer/student who doesn't want to get too much into the details of a machine learned approach but wants a working knowledge of it. There are some great examples and test data in the text book too.

Discovering Statistics Using R
This is a good book if you are new to statistics & probability while simultaneously getting started with a programming language. The book supports R and is written in a casual humorous way making it an easy read. Great for beginners. Some of the data on the companion website could be missing.

A Course in Probability Theory, Third Edition
Covered in this book are the central limit theorem and other graduate topics in probability. You will need to brush up on some mathematics before you dive in but most of that can be done online

Probability and Statistics (4th Edition)
This book has been yellow-flagged with some issues, including the sequencing of content. But otherwise it's good.

