*Gadil Nazari hit the speaker button and dialed. He looked back at his computer screen while the phone's ring played through the speaker.*

*"This is Mei"*

*"Ah Mei, I'm so happy you're in. This is Gadil in Engineering. How are you today?"*

*"Well, all the kids have a touch of the flu, but with the amount of soup we have, they'll be better in a jiffy! Besides that things are going really well. How are you?"*

*"Good, good"*

*"And Jana? Was she able to finish her business plan?"*

*Gadil's wife had recently put together a proposal which she was going to vet with some local venture capital firms. A real entrepreneur's entrepreneur. He liked that Mei had remembered.*

*"Yes! She got a lot of great feedback and decided to pivot a bit and is now prototyping some of the new concepts." Gadil decided to shift the conversation back to the task at hand. "Listen, the reason I'm calling is that we got a consultant's proposal for a project which has some high level simulations. Would you be able to review it? I think your input would help us make sure we are understanding what we are getting."*

*"Yes, absolutely! I love to see what others are doing in that field. Do you have something you can send to me or would you like to meet later today or tomorrow?" Mei asked.*

*Gadil moved his hand to the computer's keyboard and hit 'send'. "I just emailed you their presentation. Once you've looked at it can you give me a call and we can talk about how to proceed?"*

*"Yes, sounds good, Gadil. I'll look at it later this morning and get back to you. Talk with you soon!" she said cheerfully as she rang off.*

When we perform a Monte Carlo simulation using more than one variable, we need to account for the interplay of these factors during the simulation process.

One means to do this, which we have utilized in prior posts (see Mei's Monte Carlo Adventure or Should You Rebalance Your Investment Portfolio?), is to use the Cholesky process.

Who the heck is this Cholesky guy and what process did he develop?

**The Multi-Variable Problem**

One of the most common statistical distributions we simulate is the **standard normal distribution**. Random draws from this will have an average of 0 and a standard deviation of 1 - nice, easy numbers to work with.

*The standard normal curve has a distinctive bell shape. The average of 0 is the most likely occurrence, and the probability decreases as we travel further from it.*

The shape of the normal distribution's **probability density function** (stat speak for "what are the odds a certain number shows up?") is the bell curve, which is shown in Figure A.
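We can check these properties empirically by drawing a large sample and looking at its mean and standard deviation. This is a minimal sketch using only Python's standard library (the same check the post's spreadsheet and R code perform):

```python
import random
import statistics

random.seed(42)  # fixed seed so the draws are reproducible

# 100,000 draws from the standard normal distribution (mean 0, std dev 1)
draws = [random.gauss(0.0, 1.0) for _ in range(100_000)]

print(statistics.mean(draws))   # close to 0
print(statistics.stdev(draws))  # close to 1
```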

My sons and I are in a program called Indian Guides (a program promoting father-son relationships), and our 'tribe' recently participated in a volunteer activity for the Feed My Starving Children organization.

Our task that night entailed filling bags with a concoction of vitamins, vegetables, protein, and rice. These bags were then sealed, boxed, and palletized, ready to be shipped the following day to any of a number of locations around the world the charity serves (Haiti, the Philippines, Somalia, etc.).

*Indian Guides filling 'Manna Packs' to be shipped around the world to feed those in need.*

Let's say that at each step of this production process there was a 1 gram standard deviation from the target amount of the ingredient. In other words, 1 gram deviation each of vitamins, vegetables, protein, and rice.

What is the standard deviation of the package?

Under purely additive conditions, this would be 4 grams. However, the combination of the four ingredients produces something much less than that. Figure C shows the statistics for this process. While all the ingredients have means close to 0, __as does the total bag__, and while the standard deviations of the ingredients are each approximately 1, the standard deviation of the total bag is only about 2, __not 4__!

In order to understand why this is the case, we can think of what happens with dice. If we roll one die, there is an equal 1/6 probability of each number coming up. This is what is called the __uniform distribution__. If we roll two dice, while each one of them has a 1/6 chance of turning up a certain number, the sum of the two together **is no longer uniformly** distributed. There is a much greater probability of coming up with a 7 as opposed to a 2 or 12, because there are many more ways to make the 7 (3+4, 4+3, 2+5, etc.) than a 2 (1+1 only).

*While each individual die has an equal probability for each of its outcomes (the green and red), the combination (gold) is no longer uniform.*

Figure D shows a representation of these probabilities.
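We can verify these counts by enumerating all 36 equally likely outcomes of the two dice:

```python
from collections import Counter

# Tally the sum of every equally likely (die 1, die 2) combination
sums = Counter(a + b for a in range(1, 7) for b in range(1, 7))

print(sums[7])   # 6 ways to make a 7
print(sums[2])   # 1 way to make a 2 (1+1 only)
print(sums[12])  # 1 way to make a 12 (6+6 only)
```

So a 7 is six times as likely as a 2, exactly as Figure D depicts.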

The same phenomenon occurs with our simulated normal distributions. If we imagine the bell-shaped curves combining, the result is like the dice: there is greater probability of middle numbers and less of extremes. The underlying reason is that __variances__ add rather than standard deviations - four variables with a variance of 1 each give a total variance of 4, and thus a combined standard deviation of only 2 (the square root of 4) instead of 4.
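A quick simulation of the bag-filling process makes this concrete (a minimal sketch using only Python's standard library):

```python
import random
import statistics

random.seed(1)  # reproducible draws

# Simulate 10,000 bags: four independent ingredients, each deviating from
# its target amount with a 1-gram standard deviation (mean deviation 0)
totals = [sum(random.gauss(0.0, 1.0) for _ in range(4)) for _ in range(10_000)]

# Variances add (1 + 1 + 1 + 1 = 4), so the standard deviation of the
# total bag is the square root of 4 = 2, not 1 + 1 + 1 + 1 = 4
print(statistics.stdev(totals))  # close to 2
```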

**Enter Correlation**

*Mei sat across from Gadil in his office.*

*"The consultant analysis is in some ways inconsistent with our experience," Gadil explained. "And we are not sure why. They are convinced that they have modeled the correct parameters and therefore the results are the results."*

*"Gadil, is our experience that things fluctuate more widely or less?" Mei asked.*

*"Oh, definitely more widely" he replied.*

*"I see. I wonder if we could talk a little about these different variables and what they mean."*

Up to this point we have considered the fluctuation of each of our variables to be __independent__, which means each one varies of its own accord without any consideration of the other, just as if one of our dice shows up with a 2, the other is still equally likely to be any number between 1 and 6. The second die does not care that the first one came up with a 2 - it's thinking on its own!

What happens when our variables are no longer "independent", but the one impacts the other?

We can think of common situations where this occurs. The chance that we will get in a car accident is influenced by how good of a driver we are. Under normal conditions, the 'how good of a driver we are' factor will dominate. But when the weather is bad - snow, ice, rain - our chances of getting in an accident will increase. Our overall chance of an accident is **correlated** with the weather conditions.

Correlation is the statistician's term for 'one thing moves in relation to another'. However, we must be careful with correlation because some people confuse it with __causation__. Two or more things may vary in relation to each other, but it is not necessarily the case that one is the cause of the other. There are five reasons why two factors may be correlated, __only one__ of them being that A caused B (see this Wikipedia entry for more on this).

For our simulation purposes, we want to ensure that we create the correct correlation without modeling causation. We are able to accomplish this through the Cholesky decomposition.

**The Cholesky Decomposition**

*The correlation matrix is shown with the numbers and the symbols*

André-Louis Cholesky was a French mathematician who developed the matrix decomposition for which he is known as part of his surveying work in the military.

The 'decomposition' he created comes from the insight that a matrix, such as C, can be broken down into two separate matrices: T (a Lower Triangular matrix) and T transposed (transposing a matrix means swapping its rows and columns, in this case resulting in an Upper Triangular matrix). Let's unpack this very dense definition.

Let's say we have a correlation matrix with 4 variables from our Feed My Starving Children process. We can identify the components of the matrix by using row and column notation in the subscripts. Figure E shows our correlation matrix in numerical and symbolic form.

*The Upper Triangular Matrix (top) and Lower Triangular Matrix (bottom) in symbolic form*

Triangular matrices have values in one part of the matrix and 0's in the other, thus creating a triangular pattern. Figure F shows these symbolically.

In the Cholesky decomposition, we can break down our correlation matrix into a Lower Triangular Matrix and an Upper Triangular Matrix with transposed values. In other words, the value in row 2, column 1 in the Lower Triangle becomes the value in row 1, column 2 in the Upper Triangle. You can think about these matrices as being similar to square roots of numbers.
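The 'square root' analogy is easiest to see in a 2 x 2 case. For a correlation matrix with a single correlation, the Lower Triangular factor can be written out by hand, and multiplying it by its own transpose recovers the original matrix. A sketch (the 0.5 correlation is just an illustrative value):

```python
import math

rho = 0.5  # illustrative correlation

# Lower Triangular factor T for the matrix C = [[1, rho], [rho, 1]]
T = [[1.0, 0.0],
     [rho, math.sqrt(1 - rho**2)]]

# Multiply T by its transpose: entry (i, j) is the sum of T[i][k] * T[j][k]
C = [[sum(T[i][k] * T[j][k] for k in range(2)) for j in range(2)]
     for i in range(2)]

print([[round(v, 6) for v in row] for row in C])  # [[1.0, 0.5], [0.5, 1.0]]
```

Just as 2 x 2 = 4 means 2 is the square root of 4, T multiplied by its transpose rebuilds C.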

To show the entire decomposition then, we have the matrix equation shown in Figure G.

*The Correlation matrix is the product of a Lower Triangular Matrix multiplied by the same values transposed into an Upper Triangular Matrix*

*On-diagonal factors (where the row equals the column) use one equation, while the other factors use a second. Since it is a Triangular matrix, the other part is simply 0. Inputs to the equations are either the Correlation matrix - C, or the Cholesky Triangular Matrix - T.*

The elements for each part of the Lower Triangular Matrix can be calculated using the formulas in Figure H. The equations vary depending on whether the element is "on the diagonal" or not.
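As a sketch of those on-diagonal and off-diagonal formulas, here is a hand-rolled Python version in the spirit of the "by hand" R code mentioned below. The 4 x 4 correlation matrix is illustrative, not the Figure E values:

```python
import math

def cholesky(C):
    """Return the Lower Triangular matrix T such that T times its transpose equals C."""
    n = len(C)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(T[i][k] * T[j][k] for k in range(j))
            if i == j:
                # On-diagonal formula: square root of what remains
                T[i][j] = math.sqrt(C[i][i] - s)
            else:
                # Off-diagonal formula: remainder divided by the diagonal element
                T[i][j] = (C[i][j] - s) / T[j][j]
    return T

# Illustrative 4-variable correlation matrix
C = [[1.0, 0.3, 0.2, 0.1],
     [0.3, 1.0, 0.4, 0.2],
     [0.2, 0.4, 1.0, 0.3],
     [0.1, 0.2, 0.3, 1.0]]

T = cholesky(C)

# Good analytic practice: check that T times its transpose rebuilds C
rebuilt = [[sum(T[i][k] * T[j][k] for k in range(4)) for j in range(4)]
           for i in range(4)]
print(all(abs(rebuilt[i][j] - C[i][j]) < 1e-9
          for i in range(4) for j in range(4)))  # True
```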

The website Rosetta Code has code for the calculation of these factors in a number of languages (Python, Perl, VBA, etc.). I made a spreadsheet that lays out both the covariance and Cholesky matrices based on the inputs of weights, standard deviations, and correlations, which you can get here.

In R (R is an open source statistical software) it can be calculated using the *chol()* function. However, in order to ensure I could calculate the equations without assistance, and to practice my R skills, I also programmed the formulas "by hand". If you would like that code, along with the rest of the code used in this post, you can get it here. As always, good analytic practice requires that you check your work, and I verified that the "by hand" formula did indeed match the *chol()* function's results.

**Now What?**

*Mei was seated in a conference room overlooking the city below.*

*"How do you control for the fact that mixing distributions lowers the standard deviation?" she asked the consultants in the room.*

*"We don't have to do that because the factors are independent," one of the consultants, George, replied. "Each distribution stands on its own."*

*"Perhaps, but then why is it that the results do not match up with our data?" she continued.*

Now that we have a Cholesky matrix, we can continue with our simulation process.

Matrix multiplication requires that the first matrix has the same number of columns as the second matrix has rows. The resulting matrix will be one with the same number of rows as the first and the same number of columns as the second. Figure I shows this pictorially.

*A matrix with m rows and n columns multiplying a matrix with n rows and c columns results in an m row, c column matrix.*
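In miniature, the dimension rule looks like this (a hypothetical helper using nested lists as matrices, with made-up values):

```python
def matmul(A, B):
    """Multiply nested-list matrices; A's column count must equal B's row count."""
    assert len(A[0]) == len(B), "inner dimensions must match"
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2, 3, 4]]        # 1 x 4
B = [[1], [0], [1], [0]]  # 4 x 1
print(matmul(A, B))       # 1 x 1 result: [[4]]
```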

With a row of random numbers (4 in our Feed My Starving Children example), we will have a 1 x 4 matrix for the variables and a 4 x 4 Cholesky matrix, producing an output matrix of 1 x 4. Figure J is an example of one calculation using this method. Notice that the Lower Triangular Cholesky matrix we created has been transposed so that it is Upper Triangular.

*A 1x4 vector of random numbers is multiplied by one of the Cholesky columns (a 4x1 vector), resulting in a single value (i.e. a 1x1 matrix) for the new variable.*

If we calculated the Cholesky values using the __correlation matrix__, the resulting values (we can call them "Adjusted Random Variables") are then multiplied by each variable's standard deviation and added to the mean for that variable, which completes the result for one simulation. The spreadsheet I mentioned earlier has an example of this calculation for 1000 random variables.

If we calculated the Cholesky values using the __covariance matrix__, then the standard deviations have already been "scaled in", so we merely need to add the mean to the Adjusted Random Variable.
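Putting the pieces together, here is a minimal sketch of one full simulation loop using only Python's standard library. The means, standard deviations, and single 0.8 correlation are illustrative values, not the post's data, and this is an analogue of the R/spreadsheet approach rather than the author's actual code:

```python
import math
import random
import statistics

random.seed(7)  # reproducible draws

# Illustrative 2-variable case: for C = [[1, 0.8], [0.8, 1]], the Lower
# Triangular Cholesky factor works out by hand to [[1, 0], [0.8, 0.6]]
rho = 0.8
T = [[1.0, 0.0],
     [rho, math.sqrt(1 - rho**2)]]

means = [10.0, 20.0]  # hypothetical target amounts
sds = [2.0, 3.0]      # hypothetical standard deviations

x, y = [], []
for _ in range(20_000):
    z = [random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)]
    # Adjusted Random Variables: apply the Cholesky factor to the draws
    adj = [sum(T[i][k] * z[k] for k in range(2)) for i in range(2)]
    # Scale by each variable's standard deviation and add its mean
    x.append(means[0] + sds[0] * adj[0])
    y.append(means[1] + sds[1] * adj[1])

# Empirical correlation of the simulated pairs
mx, my = statistics.mean(x), statistics.mean(y)
corr = (sum((a - mx) * (b - my) for a, b in zip(x, y))
        / ((len(x) - 1) * statistics.stdev(x) * statistics.stdev(y)))
print(corr)  # close to the 0.8 we targeted
```

Here the Lower Triangular factor multiplies a column of draws, which gives the same result as the row-vector-times-transposed-matrix arrangement shown in Figure J.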

Figure K shows the results for each variable using R for the simulation, along with the correlation matrix. Note that these values are similar (as they should be, as this is the whole point!) to the correlation matrix in Figure E.

*First summary is the Standard Normal Variables, whose means are close to 0 and standard deviations are close to 1. The second summary is the Adjusted Random Variables. These means are __also__ close to 0 and standard deviations close to 1. Due to correlation, the Totals differ between the two. Because of the correlation impact, the standard deviation of the 2nd group is higher (2.58... vs. 1.93...), __even though the individual elements have essentially the same means and standard deviations!__ The first correlation matrix shows the Standard Normal Variables to be uncorrelated, since off-diagonal elements are near 0. The second correlation matrix shows the simulated results for the Adjusted Random Variables, which are close to the values of the 3rd matrix, which is the correlation matrix we used to construct the Cholesky factors.*

**What Can We Do?**

The Cholesky process allows us to model correlation patterns without disrupting the statistical characteristics of each of the individual elements. How can we use this in the 'real world'?

**Improve Modeling Accuracy of Processes with Multiple Variables** - rather than accept as fact that the variables are uncorrelated, as the consultants did in the vignette in this post, we can use our data to ensure that any correlations that are present are taken into account as our model is developed.

**Establish Non-Conventional Probability Patterns** - given the ability to create correlated variables, we can use multiple variables to create probability patterns that are unique. If we want 3 "humps" or 5 in our pattern, we can create these by building up several variables and tying them together via correlations. The techniques to do this will need to be discussed in another post.

**Solve Data and Mathematical Problems** - the Cholesky decomposition is quite similar to taking the square root of a matrix. If we are presented with a set of data, and would like to use it in an equation, the decomposition can be useful to help us solve the equation mathematically.

**Key Takeaways**

**The Cholesky process is used to ensure that our simulation of multiple variables evidences our desired pattern of correlation. It is also a tool to create model parameters and to solve data/mathematical problems.**

**You May Also Like**

Usage of Cholesky matrix in an investment portfolio setting

Should You Rebalance Your Investment Portfolio?

Downloads of tools used in this post

**Questions**

**::** Do you have a story about using the Cholesky decomposition in a model creating situation?

**::** What other methods have you seen to account for correlation in developing models?

**::** Can the Cholesky process be used for situations where the distributions are not normal?

**Add to the discussion** with your thoughts, comments, questions and feedback! **Please share Treasury CafĂ©** with others. Thank you!

Hi David. Thanks for this post - I enjoy trying to wrap my head around complex math topics. I *think* I understand the purpose of the Cholesky process from your description. Is this a correct interpretation?: A modeler can convert uncorrelated standard normal RVs into adjusted RVs that reflect correlations in real data using the Cholesky matrix built from the data's correlation matrix. Best regards, Dave

David,

Yes, that would be a correct interpretation. I share your interest in looking into math topics, my limitation appears to be that I have a strong need for tangibility - a lot of the math folks use symbols and shorthand, with the result for me being I cannot answer the question "that's great, now how do I make that work in a spreadsheet?" Hence every once in a while I try to tackle that on my own through these blog posts :-)


Hi David,

I would like to thank you for an excellent post. I studied this topic years ago at university, but have forgotten much of it. Thank you for making the article so readable and interesting :)
