Friday, January 31, 2014

Cholesky To The Rescue!

Gadil Nazari hit the speaker button and dialed. He looked back at his computer screen while the phone's ring played through the speaker.

"This is Mei"

"Ah Mei, I'm so happy you're in. This is Gadil in Engineering. How are you today?"

"Well, all the kids have a touch of the flu, but with the amount of soup we have, they'll be better in a jiffy! Besides that things are going really well. How are you?"

"Good, good"

"And Jana? Was she able to finish her business plan?"

Gadil's wife had recently put together a proposal which she was going to vet with some local venture capital firms. A real entrepreneur's entrepreneur. He liked that Mei had remembered.

"Yes! She got a lot of great feedback and decided to pivot a bit and is now prototyping some of the new concepts." Gadil decided to shift the conversation back to the task at hand. "Listen, the reason I'm calling is that we got a consultant's proposal for a project which has some high level simulations. Would you be able to review it? I think your input would help us make sure we are understanding what we are getting."

"Yes, absolutely! I love to see what others are doing in that field. Do you have something you can send to me or would you like to meet later today or tomorrow?" Mei asked.

Gadil moved his hand to the computer's keyboard and hit 'send'. "I just emailed you their presentation. Once you've looked at it can you give me a call and we can talk about how to proceed?"

"Yes, sounds good Gadil. I'll look at it later this morning and get back to you. Talk with you soon!" she said cheerfully as she rung off.

When we perform a Monte Carlo simulation using more than one variable, we need to account for the interplay of these factors during the simulation process.

One means to do this, which we have utilized in prior posts (see Mei's Monte Carlo Adventure or Should You Rebalance Your Investment Portfolio?), is to use the Cholesky process.

Who the heck is this Cholesky guy and what process did he develop?

The Multi-Variable Problem

One of the most common statistical distributions we simulate is the standard normal distribution. Random draws from this will have an average of 0 and a standard deviation of 1 - nice, easy numbers to work with.

Figure A
The Standard Normal Distribution
The standard normal curve has a distinctive bell shape. The average of 0 is the most likely value, and the likelihood decreases as we travel further from it.

The shape of the normal distribution's probability density function (stat speak for "what are the odds a certain number shows up?") is the bell curve, which is shown in Figure A.
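
As a quick illustration (a minimal sketch in R, the language used for the rest of this post's code), we can draw from the standard normal and confirm its statistics:

    # Draw 10,000 standard normal values and check their statistics
    set.seed(123)
    z <- rnorm(10000)   # rnorm() defaults to mean = 0, sd = 1
    mean(z)             # close to 0
    sd(z)               # close to 1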

My sons and I are in a program called Indian Guides (a program promoting father-son relationships), and our 'tribe' recently participated in a volunteer activity for the Feed My Starving Children organization.

Our task that night entailed filling bags with a concoction of vitamins, vegetables, protein, and rice. These bags were then sealed, boxed, and palletized, ready to be shipped the following day to any of a number of locations around the world the charity serves (Haiti, the Philippines, Somalia, etc.).

Figure B
Helping to Feed Starving Children
Indian Guides filling 'Manna Packs' to be shipped around the world to feed those in need.

Let's say that at each step of this production process there was a 1 gram standard deviation from the target amount of the ingredient. In other words, 1 gram deviation each of vitamins, vegetables, protein, and rice.

What is the standard deviation of the package?

Under purely additive conditions, this would be 4 grams. However, the combination of the four samples produces something much less than that: because independent variances add (not standard deviations), the total bag's standard deviation is the square root of 1 + 1 + 1 + 1, or 2 grams. Figure C shows the statistics for this process. While all the ingredients have means close to 0, as does the total bag, and while the standard deviations of the ingredients are each approximately 1, the standard deviation of the total bag is only about 2, not 4!

Figure C
Bag Fill Statistics
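
We can reproduce this result with a quick simulation (a sketch in R; the ingredient names are labels of my choosing, not necessarily those in the spreadsheet):

    # Simulate 10,000 bags: four ingredients, each deviating from its
    # target amount with mean 0 and a 1 gram standard deviation
    set.seed(42)
    n <- 10000
    vitamins   <- rnorm(n)
    vegetables <- rnorm(n)
    protein    <- rnorm(n)
    rice       <- rnorm(n)
    bag <- vitamins + vegetables + protein + rice

    sd(vitamins)   # approximately 1, as for each ingredient
    sd(bag)        # approximately 2, not 4: sqrt(1 + 1 + 1 + 1) = 2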

In order to understand why this is the case, we can think of what happens with dice. If we roll one die, there is an equal 1/6 probability of each number coming up. This is what is called the uniform distribution. If we roll two dice, while each one of them has a 1/6 chance of turning up a certain number, the sum of the two together is no longer uniformly distributed. There is a much greater probability of coming up with a 7 as opposed to a 2 or 12, because there are many more ways to make the 7 (3+4, 4+3, 2+5, etc.) than a 2 (1+1 only).

Figure D
Dice Probabilities
While each individual die has an equal probability for each of its outcomes (the green and red), the combination (gold) is no longer uniform.

Figure D shows a representation of these probabilities.
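
We can also tabulate these probabilities directly (a quick R sketch):

    # All 36 equally likely combinations of two dice, and each total's probability
    sums <- outer(1:6, 1:6, "+")
    table(sums) / 36   # 7 occurs 6/36 of the time; 2 and 12 only 1/36 each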

The same phenomenon occurs with our simulated normal distributions. If we imagine two bell-shaped curves side by side, the combined curve will be like the dice: there is a greater probability of middle numbers and less of extremes. This is why the combined standard deviation of our 4 standard normal curves is only 2 instead of 4.


Enter Correlation

Mei sat across from Gadil in his office.

"The consultant analysis is in some ways inconsistent with our experience" Gadil explained. "And we are not sure why. The are convinced that they have modeled the correct parameters and therefore the results are the results"

"Gadil, is our experience that things fluctuate more widely or less?" Mei asked.

"Oh, definitely more widely" he replied.

"I see. I wonder if we could talk a little about these different variables and what they mean"

Up to this point we have considered the fluctuation of each of our variables to be independent, which means each one varies of its own accord without any consideration of the others, just as when one of our dice comes up with a 2, the other is still equally likely to be any number between 1 and 6. The second die does not care that the first one came up with a 2 - it's thinking on its own!

What happens when our variables are no longer "independent", but the one impacts the other?

We can think of common situations where this occurs. The chance that we will get in a car accident is influenced by how good of a driver we are. Under normal conditions, the 'how good of a driver we are' factor will dominate. But when the weather is bad - snow, ice, rain - our chances of getting in an accident will increase. Our overall chance of getting in an accident is correlated with the weather conditions.

Correlation is the statistician's term for 'one thing moves in relation to another'. However, we must be careful with correlation because some people confuse it with causation. Two or more things may vary in relation to each other, but it is not necessarily the case that one is the cause of the other. There are five reasons why two factors may be correlated, only one of them being that A caused B (see this Wikipedia entry for more on this).

For our simulation purposes, we want to ensure that we create the correct correlation without modeling causation. We are able to accomplish this through the Cholesky decomposition.

The Cholesky Decomposition

Figure E
Correlation Matrix and Notation
The correlation matrix is shown with the numbers and the symbols

André Cholesky was a French mathematician who developed the matrix decomposition for which he is known as part of his surveying work in the military.

The 'decomposition' he created comes from the insight that a matrix, such as C, can be broken down into two separate matrices, T (a Lower Triangular matrix) and T transposed (transposing a matrix means swapping its rows and columns, in this case resulting in an Upper Triangular matrix). Let's unpack this very dense definition.

Let's say we have a correlation matrix with 4 variables from our Feed My Starving Children process. We can identify the components of the matrix by using row and column notation in the subscripts. Figure E shows our correlation matrix in numerical and symbolic form.
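
Setting Figure E's exact numbers aside, here is how such a 4-variable correlation matrix might be set up in R (the values below are illustrative only, not necessarily those in Figure E):

    # An illustrative 4 x 4 correlation matrix for the four ingredients
    ingredients <- c("vitamins", "vegetables", "protein", "rice")
    C <- matrix(c(1.0, 0.5, 0.3, 0.2,
                  0.5, 1.0, 0.4, 0.1,
                  0.3, 0.4, 1.0, 0.6,
                  0.2, 0.1, 0.6, 1.0),
                nrow = 4, byrow = TRUE,
                dimnames = list(ingredients, ingredients))
    C["protein", "vitamins"]   # the row 3, column 1 element (c31)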

Figure F
Triangular Matrices
The Upper Triangular Matrix (top) and Lower Triangular Matrix (bottom) in symbolic form

Triangular matrices have values in one part of the matrix and 0's in the other, thus creating a triangular pattern. Figure F shows these symbolically.

In the Cholesky decomposition, we can break down our correlation matrix into a Lower Triangular Matrix and an Upper Triangular Matrix with transposed values. In other words, the value in row 2, column 1 in the Lower Triangle becomes the value in row 1, column 2 in the Upper Triangle. You can think about these matrices as being similar to square roots of numbers.

To show the entire decomposition then, we have the matrix equation shown in Figure G.

Figure G
Cholesky Matrix Equation
The Correlation matrix is the product of a Lower Triangular Matrix multiplied by the same values transposed into an Upper Triangular Matrix
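
In R we can verify this matrix equation directly, continuing with the illustrative matrix C from above (note that R's chol() returns the Upper Triangular factor):

    # Decompose C and confirm that T multiplied by its transpose recovers it
    T_upper <- chol(C)     # the Upper Triangular factor
    T_lower <- t(T_upper)  # transpose to get the Lower Triangular factor
    all.equal(T_lower %*% t(T_lower), C, check.attributes = FALSE)   # TRUE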

Figure H
Cholesky Factors' Formulas
On-diagonal factors (where the row equals the column) use one equation, while the other factors use a second. Since it is a Triangular matrix, the remaining part is simply 0. Inputs to the equations are either the Correlation matrix (C) or the Cholesky Triangular Matrix (T).

The elements for each part of the Lower Triangular Matrix can be calculated using the formulas in Figure H. The equations vary depending on whether the element is "on the diagonal" or not.

The website Rosetta Code has code for the calculation of these factors in a number of languages (Python, Perl, VBA, etc.). I made a spreadsheet that lays out both the covariance and Cholesky matrices based on the inputs of weights, standard deviations, and correlations, which you can get here.

In R (an open-source statistical software package) it can be calculated using the chol() function, which returns the Upper Triangular factor. However, in order to ensure I could calculate the equations without assistance, and to practice my R skills, I also programmed the formulas "by hand". If you would like that code, along with the rest of the code used in this post, you can get it here. As always, good analytic practice requires that you check your work, and I verified that the "by hand" formula did indeed match the chol() function's results.
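
The downloadable code has the full version; as a flavor of what the "by hand" calculation involves, here is a minimal sketch following the Figure H formulas (the function and variable names are mine, not necessarily those in the download):

    # A "by hand" Cholesky factorization using the Figure H formulas
    my_chol <- function(C) {
      n <- nrow(C)
      Tmat <- matrix(0, n, n)   # the upper part stays 0 (Lower Triangular)
      for (i in 1:n) {
        for (j in 1:i) {
          # sum of products of previously computed factors in rows i and j
          s <- if (j > 1) sum(Tmat[i, 1:(j - 1)] * Tmat[j, 1:(j - 1)]) else 0
          if (i == j) {
            Tmat[i, j] <- sqrt(C[i, i] - s)            # on-diagonal formula
          } else {
            Tmat[i, j] <- (C[i, j] - s) / Tmat[j, j]   # off-diagonal formula
          }
        }
      }
      Tmat
    }

    # Check the "by hand" result against R's built-in chol()
    all.equal(my_chol(C), t(chol(C)), check.attributes = FALSE)   # TRUE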

Now What?

Mei was seated in a conference room overlooking the city below.

"How do you control for the fact that mixing distributions lowers the standard deviation?" she asked the consultants in the room.

"We don't have to do that because the factors are independent." one of the consultants, George, replied. "Each distribution stands on its own."

"Perhaps, but then why is it that the results do not match up with our data?" she continued.

Now that we have a Cholesky matrix, we can continue with our simulation process.

Matrix multiplication requires that the first matrix has the same number of columns as the second matrix has rows. The resulting matrix will be one with the same number of rows as the first and the same number of columns as the second. Figure I shows this pictorially.
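
For example, a quick check in R:

    # A 2 x 3 matrix times a 3 x 4 matrix yields a 2 x 4 matrix
    A <- matrix(1:6,  nrow = 2, ncol = 3)
    B <- matrix(1:12, nrow = 3, ncol = 4)
    dim(A %*% B)   # 2 4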

Figure I
Basic Matrix Multiplication
A matrix with m rows and n columns multiplying a matrix with n rows and c columns results in an m row, c column matrix.

With a row of random numbers (4 in our Feed My Starving Children example), we will have a 1 x 4 matrix for the variables and a 4 x 4 Cholesky matrix, yielding an output matrix of 1 x 4. Figure J is an example of one calculation using this method. Notice that the Lower Triangular Cholesky matrix we created has been transposed so that it is Upper Triangular.

Figure J
Simulation Multiplication Example
A 1x4 vector of random numbers is multiplied by one of the Cholesky columns (a 4x1 vector), resulting in a single value (i.e. a 1x1 matrix) for the new variable.

If we calculated the Cholesky values using the correlation matrix, the resulting values (we can call them "Adjusted Random Variables") are then multiplied by each variable's standard deviation and added to the mean for that variable, which completes the result for one simulation. The spreadsheet I mentioned earlier has an example of this calculation for 1000 random variables.

If we calculated the Cholesky values using the covariance matrix, then the standard deviations have already been "scaled in", so we merely need to add the mean to the Adjusted Random Variable.
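
Putting the pieces together, here is a minimal sketch of the whole simulation in R, using the illustrative correlation matrix C from earlier along with means and standard deviations I have made up for demonstration:

    # Simulate 1,000 draws of 4 correlated variables
    set.seed(42)
    n_sims <- 1000
    Z <- matrix(rnorm(n_sims * 4), nrow = n_sims)   # standard normal draws

    # Multiply by the transposed (Upper Triangular) Cholesky factor
    # to produce the Adjusted Random Variables
    ARV <- Z %*% chol(C)

    # Correlation-matrix version: scale by each variable's standard
    # deviation and add its mean (illustrative parameters)
    mu    <- c(10, 20, 30, 40)
    sigma <- c(2, 1, 3, 1)
    X <- sweep(ARV, 2, sigma, "*")
    X <- sweep(X, 2, mu, "+")

    round(cor(ARV), 2)   # close to C, which is the whole point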

Figure K shows the results for each variable using R for the simulation, along with the correlation matrix. Note that these values are similar (as they should be, as this is the whole point!) to the correlation matrix in Figure E.

Figure K
Simulation Comparison
The first summary is the Standard Normal Variables, whose means are close to 0 and standard deviations are close to 1. The second summary is the Adjusted Random Variables; their means are also close to 0 and standard deviations close to 1. Because of the correlation impact, however, the standard deviation of the second group's Total is higher (2.58... vs. 1.93...), even though the individual elements have essentially the same means and standard deviations! The first correlation matrix shows the Standard Normal Variables to be uncorrelated, since off-diagonal elements are near 0. The second correlation matrix shows the simulated results for the Adjusted Random Variables, which are close to the values of the third matrix, the correlation matrix we used to construct the Cholesky factors.

What Can We Do?

The Cholesky process allows us to model correlation patterns without disrupting the statistical characteristics of each of the individual elements. How can we use this in the 'real world'?

Improve Modeling Accuracy of Processes with Multiple Variables - rather than accept as fact that the variables are uncorrelated, as the consultants did in the vignette in this post, we can use our data to ensure that any correlations present are taken into account as our model is developed.

Establish Non-Conventional Probability Patterns - given the ability to create correlated variables, we can use multiple variables to create probability patterns that are unique. If we want 3 "humps" or 5 in our pattern, we can create these by building up several variables and tying them together via correlations. The techniques to do this will need to be discussed in another post.

Solve Data and Mathematical Problems - the Cholesky decomposition is quite similar to taking the square root of a matrix. If we are presented with a set of data and would like to use it in an equation, the decomposition can be useful to help us solve the equation mathematically (a small sketch follows below).
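
As an example of that last point, the factors let us solve a system of equations with two quick triangular solves instead of a full matrix inversion (a sketch using base R's forwardsolve() and backsolve(), and the illustrative C from earlier):

    # Solve C x = b via the Cholesky factors
    b <- c(1, 2, 3, 4)
    U <- chol(C)                 # C = t(U) %*% U
    y <- forwardsolve(t(U), b)   # solve t(U) %*% y = b (lower triangular)
    x <- backsolve(U, y)         # solve U %*% x = y (upper triangular)
    all.equal(as.vector(C %*% x), b)   # TRUE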

Key Takeaways

The Cholesky process is used to ensure that our simulation of multiple variables evidences our desired pattern of correlation. It is also a tool to create model parameters and to solve data/mathematical problems.

You May Also Like

Usage of Cholesky matrix in an investment portfolio setting

Mei's Monte Carlo Adventure

Should You Rebalance Your Investment Portfolio?

Downloads of tools used in this post

Excel Spreadsheet

R Code

Questions
    Do you have a story about using the Cholesky decomposition in a model-creating situation?
    What other methods have you seen to account for correlation in developing models?
    Can the Cholesky process be used for situations where the distributions are not normal?

Add to the discussion with your thoughts, comments, questions and feedback! Please share Treasury Café with others. Thank you!

6 comments:

  1. Hi David. Thanks for this post - I enjoy trying to wrap my head around complex math topics. I *think* I understand the purpose of the Cholesky process from your description. Is this a correct interpretation?: A modeler can convert uncorrelated standard normal RVs into adjusted RVs that reflect correlations in real data using the Cholesky matrix built from the data's correlation matrix. Best regards, Dave

    1. David,

      Yes, that would be a correct interpretation. I share your interest in looking into math topics, my limitation appears to be that I have a strong need for tangibility - a lot of the math folks use symbols and shorthand, with the result for me being I cannot answer the question "that's great, now how do I make that work in a spreadsheet?" Hence every once in a while I try to tackle that on my own through these blog posts :-)

  2. This comment has been removed by the author.

  3. Hi David,

    I would like to thank you for an excellent post. I studied this topic years ago at university, but have forgotten much of it. Thank you for making the article so readable and interesting :)

    1. This comment has been removed by the author.

  4. Hi David, great post, though I have a question that I hope you will answer.
    I'm still trying to understand the reason for using the Cholesky decomposition. I understand it is required from a statistical/mathematical point of view, but I don't get the underlying intuition for using it, rather than using the original correlation matrix (i.e. why should I use a lower/upper triangular matrix instead of the full symmetrical matrix?)

    Hope you'll get a chance to see this,
    Thanks
