
The Forgotten Geometric Mean.

People who work with data are often asked to create an index of some sort: a single number that captures a set of key business metrics. If you run a site (or an app), you might want an engagement index that signals good things when it trends up and bad things when it trends down. The creators of such metrics (think analysts) tend to prefer a weighted arithmetic mean of the influencing factors. If the influencing factors are f1, f2, f3 (say) with weights w1, w2, w3, then the index would be computed as

$$I = \frac{w_1 f_1 + w_2 f_2 + w_3 f_3}{w_1 + w_2 + w_3}$$

However, what does not get factored in are the final consumers of the index (think product managers), and there could be many. They will invariably try to check it against something else they have handy. For example, clicks on a site may have gone up 20% while the index is up by just 5% (say), or vice versa. If resources are being allocated based on the movement of such an index, this invariably leads to contention over the right weighting for each factor.

This is meant to be a short write-up on some really cool features of the geometric mean. The geometric mean is not meant to replace a simple arithmetic-mean-based index, but it is definitely worth a thought. To illustrate, let's take a look at a simple two-feature index. If the features are X and Y, the arithmetic mean index with weight w can be represented as

$$I_A = wX + (1 - w)Y$$

To see how it responds to changes, let's take the derivative with respect to X.

$$\frac{\partial I_A}{\partial X} = w$$

Clearly the derivative is dependent on the chosen weight. Let's see what happens when we choose the geometric mean.

$$I_G = \sqrt{XY}$$

Again, to see how it responds to change, let's take the derivative.

$$\frac{\partial I_G}{\partial X} = \frac{1}{2}\sqrt{\frac{Y}{X}}$$

which can be further simplified to

$$\frac{\partial I_G}{\partial X} = \frac{1}{2}\,\frac{I_G}{X}$$

The result is a useful derivable condition

$$\frac{dI_G}{I_G} = \frac{1}{2}\,\frac{dX}{X}$$

i.e. the percentage change in the index is directly proportional to the percentage change in the feature.
Note, there are no hand-chosen weights here. A five percent change in one of the influencing factors will result in a proportional percent change in the index (about half as large for this two-feature index), whatever the current levels of X and Y. Extremely useful!
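A quick numerical check may make the contrast concrete. The sketch below is only an illustration: the feature levels and the weights 0.2 and 0.8 are made-up values, and the function names are hypothetical. It bumps the first feature by 5% and reports how far each two-feature index moves.

```python
import math

def arithmetic_index(x, y, w):
    """Weighted arithmetic mean of two features; w is a hand-chosen weight."""
    return w * x + (1.0 - w) * y

def geometric_index(x, y):
    """Geometric mean of two positive features."""
    return math.sqrt(x * y)

def pct_change(before, after):
    """Percentage change from before to after."""
    return (after - before) / before * 100.0

# Hypothetical feature levels at very different scales (assumed for illustration).
scenarios = [(1000.0, 50.0), (10.0, 5000.0)]
weights = (0.2, 0.8)

for x, y in scenarios:
    bumped_x = x * 1.05  # a 5% increase in the first feature
    geo = pct_change(geometric_index(x, y), geometric_index(bumped_x, y))
    print(f"X={x:g}, Y={y:g}: geometric index moves {geo:.2f}%")
    for w in weights:
        arith = pct_change(arithmetic_index(x, y, w),
                           arithmetic_index(bumped_x, y, w))
        print(f"  arithmetic index with w={w}: moves {arith:.2f}%")
```

The geometric index moves by the same amount (about 2.47%) in both scenarios, while the arithmetic index moves by a different amount for every combination of weight and feature level.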

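Yet another aspect consumers like to quantify is growth. If the index grew at rates x1 and x2 in two consecutive years, what is the average annual growth? If we took it as the average of x1 and x2, then the growth after two years (say) would be estimated as

$$\left(1 + \frac{x_1 + x_2}{2}\right)^2 = 1 + x_1 + x_2 + \left(\frac{x_1 + x_2}{2}\right)^2$$

Contrast that to the actual growth

$$(1 + x_1)(1 + x_2) = 1 + x_1 + x_2 + x_1 x_2$$

Clearly some terms cancel out. We are left comparing

$$\left(\frac{x_1 + x_2}{2}\right)^2 \quad \text{and} \quad x_1 x_2$$

Taking square roots (for positive growth rates), notice one of them is the arithmetic mean and the other is the geometric mean. We also know from a well-established theorem (the AM-GM inequality) that the arithmetic mean is never less than the geometric mean, with equality only when x1 = x2. So whenever the two growth rates differ, we end up overestimating the growth!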
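To see the size of the overestimate, here is a small sketch with made-up growth rates of 10% and 50% (assumed purely for illustration; the function names are hypothetical). It compares the compounded growth with the projection obtained from the arithmetic average of the rates.

```python
def compound(rates):
    """Actual cumulative growth factor from a sequence of per-period rates."""
    total = 1.0
    for r in rates:
        total *= 1.0 + r
    return total

def arithmetic_projection(rates):
    """Growth factor projected by compounding the arithmetic mean of the rates."""
    mean_rate = sum(rates) / len(rates)
    return (1.0 + mean_rate) ** len(rates)

rates = [0.10, 0.50]  # hypothetical growth rates for two consecutive years

actual = compound(rates)
projected = arithmetic_projection(rates)
print(f"actual growth factor:    {actual:.4f}")
print(f"projected growth factor: {projected:.4f}")
print(f"overestimate:            {projected - actual:.4f}")
```

For two periods the gap works out to ((x1 - x2) / 2)^2, so it vanishes only when the two rates are equal.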

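So how would we choose a value to project as an average growth rate? We are looking for a β in the equation below

$$(1 + \beta)^2 = (1 + x_1)(1 + x_2)$$

which gives

$$\beta = \sqrt{(1 + x_1)(1 + x_2)} - 1$$

Yet again, stating the average growth as the geometric mean (here, of the growth factors 1 + x1 and 1 + x2) gives the end user a handy metric to work with.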
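As a rough sketch of that calculation, the function below computes the constant per-period rate whose compounding reproduces the actual growth. The name `average_growth_rate` and the sample rates are assumptions for illustration, and the approach assumes every rate is greater than -100% so that each growth factor is positive.

```python
import math

def average_growth_rate(rates):
    """Constant per-period rate beta such that (1 + beta)**n equals the
    product of (1 + r) over the n observed rates."""
    factors = [1.0 + r for r in rates]
    if any(f <= 0.0 for f in factors):
        raise ValueError("each growth rate must be greater than -1")
    # Geometric mean of the growth factors, converted back to a rate.
    return math.prod(factors) ** (1.0 / len(factors)) - 1.0

rates = [0.10, 0.50]  # hypothetical growth rates for two consecutive years
beta = average_growth_rate(rates)
print(f"average growth rate: {beta:.4%}")
print(f"compounded back:     {(1.0 + beta) ** len(rates):.4f}")
```

Compounding the resulting rate over the same number of periods recovers the actual growth factor, which is what makes it a safe number to quote as the average.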


