## Thursday, December 29, 2011

### Baby...or Bathwater?

Envision the following scenario:

One of your risk professionals comes up to you and says “We should purchase this analytic system because it successfully predicted the downfall of the financial institutions that failed during 2008. It’s great!”
How should we respond?

To answer this question, we need to take a small detour into the wonderful world of statistics.

Who is Null and Why Did He or She Have a Hypothesis?

Statistical comparisons involve two sets of data, the “control” group and the “treatment” group. There are two possible relationships between these two groups: either there is nothing conclusive to show that they are really that different (the Null Hypothesis), or there is (the Alternative Hypothesis).

The person doing the investigating is usually looking to provide evidence for the Alternative Hypothesis. Alternative hypotheses are based on questions such as: Does this drug work? Are guilty people sent to jail? Does this model adequately predict firms’ financial ruin?
The answer is determined using statistical techniques. Two outcomes are possible: 1) we “fail to reject the Null hypothesis”, meaning there is not enough evidence to conclude that the two groups differ, or 2) we “reject the Null hypothesis”, meaning the two groups are significantly different.
Poor Mr. or Ms. Null!

Imagine proposing to someone that way – nice dinner, get down on your knee, bring out the ring box, and say “I fail to reject you as a spouse”! Pretty romantic, isn’t it?
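To make “reject” versus “fail to reject” concrete, here is a minimal Python sketch of one such statistical technique, a permutation test on the difference between group means. The function names, the sample numbers, and the 0.05 cutoff are all assumptions made for illustration; they are not taken from any particular study or product.

```python
import random
from statistics import mean


def permutation_test(control, treatment, n_resamples=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Under the Null hypothesis the group labels are interchangeable, so we
    shuffle the pooled data and count how often a random split looks at
    least as different as the split we actually observed.
    """
    rng = random.Random(seed)
    observed = abs(mean(treatment) - mean(control))
    pooled = list(control) + list(treatment)
    n_control = len(control)

    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_control:]) - mean(pooled[:n_control]))
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


def decide(p_value, alpha=0.05):
    """Translate a p-value into the statistician's careful wording."""
    if p_value < alpha:
        return "reject the Null hypothesis"
    return "fail to reject the Null hypothesis"


# Hypothetical measurements, invented for this example only.
control = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2]
clearly_different = [12.1, 11.8, 12.3, 11.9, 12.0, 12.2]
about_the_same = [10.0, 10.2, 9.9, 10.1, 9.7, 10.3]

print(decide(permutation_test(control, clearly_different)))
print(decide(permutation_test(control, about_the_same)))
```

Note that the second result does not show the two groups are the same. It only says this data gave us no grounds to claim they differ.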

Why the Funny Terminology?

I am sure some know the technical reason for this (if so, please leave a comment!), but the thing I think of (maybe because it is easier to remember) is Nassim Taleb’s discussion in “The Black Swan”, which went something like the following:
We can count 10,000 swans, or as many as we have ever seen (if more), and they can all be white, but that does not prove “all swans are white”. It just means the ones we have observed are.
However, we can count 10 swans, or as few as 2, and if one of them is black, that does prove that “not all swans are white”.
One thing is provable, one is not. Thus the funky terminology about “rejecting or failing to reject” the Null hypothesis.

Enter Reality

So we have two statistical outcomes of the data based on the Null Hypothesis, and two sets of data in real life that may or may not be different.

Any consultant knows that this should become a 2x2 matrix! This matrix will have one axis cover the statistical conclusions regarding the data and the other axis describe the actual reality of the data.

In two of the four boxes, reality matches the conclusion, and in the other two it does not.

The Romance Continues - Adding Error to Rejection

Statisticians have come up with some great terms for the other two boxes in this matrix. It is with great pride that I present to you these two inspired terms:
·        Type I error
·        Type II error
We can all see why they award PhDs for this kind of stuff!

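Our matrix is now complete, as follows:

·        We fail to reject the Null hypothesis, and in reality the two groups are not different: correct conclusion
·        We reject the Null hypothesis, and in reality the two groups are different: correct conclusion
·        We reject the Null hypothesis, but in reality the two groups are not different: Type I error (a false positive)
·        We fail to reject the Null hypothesis, but in reality the two groups are different: Type II error (a false negative)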

Going back to our original scenario, we have been presented with this fantastic but costly software that is able to predict a firm’s financial failure with 100% accuracy.

The question that we need to ask is “What is the full set of prediction data this model generated, and how does this compare to the alternative?”

In the actual case behind this scenario, the financial model predicted that more than 30 financial institutions were going to fail! In other words, it threw out the baby with the bathwater!
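One way to ask for “the full set of prediction data” is to tally every prediction into the four boxes of the matrix, not only the hits. The Python sketch below does that under stated assumptions: the firm names and counts are invented for illustration (they are not the real 2008 figures), and the Null hypothesis is taken to be “this firm will not fail”, so a failure prediction counts as rejecting the Null.

```python
def tally(all_firms, predicted_to_fail, actually_failed):
    """Count predictions in each box of the 2x2 matrix."""
    cells = {
        "true_positive": 0,   # predicted to fail, did fail
        "false_positive": 0,  # predicted to fail, survived (Type I error)
        "false_negative": 0,  # predicted to survive, failed (Type II error)
        "true_negative": 0,   # predicted to survive, survived
    }
    for firm in all_firms:
        predicted = firm in predicted_to_fail
        failed = firm in actually_failed
        if predicted and failed:
            cells["true_positive"] += 1
        elif predicted:
            cells["false_positive"] += 1
        elif failed:
            cells["false_negative"] += 1
        else:
            cells["true_negative"] += 1
    return cells


# Hypothetical universe: 40 firms, 5 real failures, 32 failure predictions.
firms = [f"firm_{i}" for i in range(40)]
actually_failed = set(firms[:5])
predicted_to_fail = set(firms[:32])

cells = tally(firms, predicted_to_fail, actually_failed)

# Share of real failures the model caught (the vendor's headline number).
caught = cells["true_positive"] / len(actually_failed)
# Share of failure predictions that turned out to be right.
right_when_alarmed = cells["true_positive"] / len(predicted_to_fail)

print(cells)
print(f"failures caught: {caught:.0%}")
print(f"alarms that were right: {right_when_alarmed:.0%}")
```

In this made-up example the model catches every real failure, yet most of its alarms are false. Both numbers are needed before deciding whether the software is worth its price.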

Key Takeaways

We do not want to “throw out the baby with the bathwater”, but neither do we want to “keep the bathwater with the baby”!

Questions

·         What has been your experience with Type I and Type II errors?
·         What has been your experience regarding the omission of one of the two?