Friday, January 31, 2014

Cholesky To The Rescue!

Gadil Nazari hit the speaker button and dialed. He looked back at his computer screen while the phone's ring played through the speaker.

"This is Mei"

"Ah Mei, I'm so happy you're in. This is Gadil in Engineering. How are you today?"

"Well, all the kids have a touch of the flu, but with the amount of soup we have, they'll be better in a jiffy! Besides that things are going really well. How are you?"

"Good, good"

"And Jana? Was she able to finish her business plan?"

Gadil's wife had recently put together a proposal which she was going to vet with some local venture capital firms. A real entrepreneur's entrepreneur. He liked that Mei had remembered.

"Yes! She got a lot of great feedback and decided to pivot a bit and is now prototyping some of the new concepts." Gadil decided to shift the conversation back to the task at hand. "Listen, the reason I'm calling is that we got a consultant's proposal for a project which has some high level simulations. Would you be able to review it? I think your input would help us make sure we are understanding what we are getting."

"Yes, absolutely! I love to see what others are doing in that field. Do you have something you can send to me or would you like to meet later today or tomorrow?" Mei asked.

Gadil moved his hand to the computer's keyboard and hit 'send'. "I just emailed you their presentation. Once you've looked at it can you give me a call and we can talk about how to proceed?"

"Yes, sounds good Gadil. I'll look at it later this morning and get back to you. Talk with you soon!" she said cheerfully as she rang off.

When we perform a Monte Carlo simulation using more than one variable, we need to account for the interplay of these factors during the simulation process.

One means to do this, which we have utilized in prior posts (see Mei's Monte Carlo Adventure or Should You Rebalance Your Investment Portfolio?), is to use the Cholesky process.

Who the heck is this Cholesky guy and what process did he develop?

The Multi-Variable Problem

One of the most common statistical distributions we simulate is the standard normal distribution. Random draws from this will have an average of 0 and a standard deviation of 1 - nice, easy numbers to work with.

Figure A
The Standard Normal Distribution
The standard normal curve's pattern has a distinctive bell shape. The average of 0 is the most likely occurrence, and the probability decreases as we travel further from it.

The shape of the normal distribution's probability density function (stat speak for "what are the odds a certain number shows up?") is the bell curve, which is shown in Figure A.

My sons and I are in a program called Indian Guides (a program promoting father-son relationships), and our 'tribe' recently participated in a volunteer activity for the Feed My Starving Children organization.

Our task that night entailed filling bags with a concoction of vitamins, vegetables, protein, and rice. These bags were then sealed, boxed, and palletized, ready to be shipped the following day to any of a number of locations around the world the charity serves (Haiti, the Philippines, Somalia, etc.).

Figure B
Helping to Feed Starving Children
Indian Guides filling 'Manna Packs' to be shipped around the world to feed those in need.

Let's say that at each step of this production process there was a 1 gram standard deviation from the target amount of the ingredient. In other words, 1 gram deviation each of vitamins, vegetables, protein, and rice.

What is the standard deviation of the package?

Under purely additive conditions, this would be 4 grams. However, the combination of the four independent samples produces something much less than that. Because the deviations are independent, their variances add rather than their standard deviations, so the total bag's standard deviation is the square root of 4, or 2 grams. Figure C shows the statistics for this process. While all the ingredients have means close to 0, as does the total bag, and while the standard deviations of the ingredients are each approximately 1, the standard deviation of the total bag is only about 2, not 4!

Figure C
Bag Fill Statistics
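We can verify this square-root effect with a quick simulation. The sketch below is in Python rather than the R and Excel used later in the post, and the bag counts are arbitrary; it is an illustration, not the post's own code:

```python
import random
import statistics

random.seed(42)

# Simulate 10,000 bags; each ingredient's fill error is an
# independent draw from the standard normal distribution.
n_bags = 10_000
ingredients = ["vitamins", "vegetables", "protein", "rice"]

draws = {ing: [random.gauss(0, 1) for _ in range(n_bags)] for ing in ingredients}
totals = [sum(draws[ing][i] for ing in ingredients) for i in range(n_bags)]

for ing in ingredients:
    print(f"{ing:10s} sd = {statistics.stdev(draws[ing]):.2f}")  # each close to 1
print(f"{'total bag':10s} sd = {statistics.stdev(totals):.2f}")  # close to 2, not 4
```

Each ingredient's standard deviation comes out near 1, while the total bag's comes out near 2 (the square root of 4), matching the Figure C statistics.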

In order to understand why this is the case, we can think of what happens with dice. If we roll one die, there is an equal 1/6 probability of each number coming up. This is what is called the uniform distribution. If we roll two dice, while each one of them has a 1/6 chance of turning up a certain number, the sum of the two together is no longer uniformly distributed. There is a much greater probability of coming up with a 7 as opposed to a 2 or a 12, because there are many more ways to make the 7 (3+4, 4+3, 2+5, etc.) than a 2 (1+1 only).

Figure D
Dice Probabilities
While each individual die has an equal probability for each of its outcomes (the green and red), the combination (gold) is no longer uniform.

Figure D shows a representation of these probabilities.
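These probabilities are simple to enumerate directly. A short Python sketch:

```python
from itertools import product

# Enumerate all 36 equally likely rolls of two dice and tally each sum
counts = {}
for d1, d2 in product(range(1, 7), repeat=2):
    counts[d1 + d2] = counts.get(d1 + d2, 0) + 1

for total in range(2, 13):
    print(f"P(sum = {total:2d}) = {counts[total]}/36")
# A 7 (6/36) is six times as likely as a 2 or a 12 (1/36 each)
```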

The same phenomenon occurs with our simulated normal distributions. If we imagine two bell shaped curves side by side, the combined curve will be like the dice, where there is greater probability of middle numbers and less of extremes, thus our combined standard deviation of 4 standard normal curves is only 2 (the square root of 4) instead of 4.

Enter Correlation

Mei sat across from Gadil in his office.

"The consultant analysis is in some ways inconsistent with our experience," Gadil explained. "And we are not sure why. They are convinced that they have modeled the correct parameters and therefore the results are the results."

"Gadil, is our experience that things fluctuate more widely or less?" Mei asked.

"Oh, definitely more widely" he replied.

"I see. I wonder if we could talk a little about these different variables and what they mean"

Up to this point we have considered the fluctuation of each of our variables to be independent, which means each one varies of its own accord without any consideration of the other, just as if one of our dice shows up with a 2, the other is still equally likely to be any number between 1 and 6. The second die does not care that the first one came up with a 2 - it's thinking on its own!

What happens when our variables are no longer "independent", but the one impacts the other?

We can think of common situations where this occurs. The chance that we will get in a car accident is influenced by how good of a driver we are. Under normal conditions, the 'how good of a driver we are' factor will dominate. But when the weather is bad - snow, ice, rain - our chances of getting in an accident will increase. Our overall chance of an accident is correlated with the weather conditions.

Correlation is the statistician's term for 'one thing moves in a relation to another'. However, we must be careful with correlation because some people confuse it with causation. Two or more things may vary in a relation, but it is not necessarily the case that one may be the cause of the other. There are five reasons why two factors may be correlated, only one of them being that A caused B (see this Wikipedia entry for more on this).

For our simulation purposes, we want to ensure that we create the correct correlation without modeling causation. We are able to accomplish this through the Cholesky decomposition.

The Cholesky Decomposition

Figure E
Correlation Matrix and Notation
The correlation matrix is shown with the numbers and the symbols

André Cholesky was a French mathematician who developed the matrix decomposition for which he is known as part of his surveying work in the military.

The 'decomposition' he created comes from the insight that a matrix, such as C, can be broken down into two separate matrices, T (a Lower Triangular matrix) and T transposed (transposing a matrix means swapping its rows and columns, in this case resulting in an Upper Triangular matrix). Let's unpack this very dense definition.

Let's say we have a correlation matrix with 4 variables from our Feed My Starving Children process. We can identify the components of the matrix by using row and column notation in the subscripts. Figure E shows our correlation matrix in numerical and symbolic form.

Figure F
Triangular Matrices
The Upper Triangular Matrix (top) and Lower Triangular Matrix (bottom) in symbolic form

Triangular matrices have values in one part of the matrix and 0's in the other, thus creating a triangular pattern. Figure F shows these symbolically.

In the Cholesky decomposition, we can break down our correlation matrix into a Lower Triangular Matrix and an Upper Triangular Matrix with transposed values. In other words, the value in row 2, column 1 in the Lower Triangle becomes the value in row 1, column 2 in the Upper Triangle. You can think about these matrices as being similar to square roots of numbers.

To show the entire decomposition then, we have the matrix equation shown in Figure G.

Figure G
Cholesky Matrix Equation
The Correlation matrix is the product of a Lower Triangular Matrix multiplied by the same values transposed into an Upper Triangular Matrix

Figure H
Cholesky Factors' Formulas
On diagonal factors (where the row equals the column) use one equation, while the other factors use a second. Since it is a Triangular matrix, the other part is simply 0. Inputs to the equations are either the Correlation matrix - C , or the Cholesky Triangular Matrix - T.

The elements for each part of the Lower Triangular Matrix can be calculated using the formulas in Figure H. The equations vary depending on whether the element is "on the diagonal" or not.
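The Figure H formulas translate almost directly into code. Below is a minimal Python sketch (the post's own "by hand" version is in R; the correlation values used here are made up for illustration, since Figure E's numbers are not reproduced in the text):

```python
import math

def cholesky_lower(C):
    """Build the Lower Triangular Matrix T such that T times its
    transpose reproduces the correlation matrix C (Figure H formulas)."""
    n = len(C)
    T = [[0.0] * n for _ in range(n)]
    for row in range(n):
        for col in range(row + 1):
            # Sum of products of factors already computed in this row/column
            s = sum(T[row][k] * T[col][k] for k in range(col))
            if row == col:
                # On-diagonal formula: square root of the variance remaining
                T[row][col] = math.sqrt(C[row][row] - s)
            else:
                # Off-diagonal formula
                T[row][col] = (C[row][col] - s) / T[col][col]
    return T

# An illustrative 3x3 correlation matrix (not Figure E's actual values)
C = [[1.0, 0.5, 0.3],
     [0.5, 1.0, 0.2],
     [0.3, 0.2, 1.0]]
T = cholesky_lower(C)

# Check: T multiplied by its transpose should give back C
rebuilt = [[sum(T[i][k] * T[j][k] for k in range(3)) for j in range(3)]
           for i in range(3)]
print(rebuilt[1][0])  # 0.5, matching C
```

The check at the end is the "square root" intuition in action: multiplying the Lower Triangular factor by its transpose recovers the original correlation matrix.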

The website Rosetta Code has code for the calculation of these factors in a number of languages (Python, Perl, VBA, etc.). I made a spreadsheet that lays out both the covariance and Cholesky matrices based on the inputs of weights, standard deviations and correlations, which you can get here.

In R (R is an open source statistical software) it can be calculated using the chol() function. However, in order to ensure I could calculate the equations without assistance, and to practice my R skills, I also programmed the formulas "by hand". If you would like that code, along with the rest of the code used in this post, you can get it here. As always, good analytic practice requires that you check your work, and I verified that the "by hand" formula did indeed match the chol() function's results.

Now What?

Mei was seated in a conference room overlooking the city below.

"How do you control for the fact that mixing distributions lowers the standard deviation?" she asked the consultants in the room.

"We don't have to do that because the factors are independent." one of the consultants, George, replied. "Each distribution stands on its own."

"Perhaps, but then why is it that the results do not match up with our data?" she continued.

Now that we have a Cholesky matrix, we can continue with our simulation process.

Matrix multiplication requires that the first matrix has the same number of columns as the second matrix has rows. The resulting matrix will be one with the same number of rows as the first and the same number of columns as the second. Figure I shows this pictorially.

Figure I
Basic Matrix Multiplication
A matrix with m rows and n columns multiplying a matrix with n rows and c columns results in an m row, c column matrix.

With a row of random numbers (4 in our Feed My Starving Children example), we will have a 1 x 4 matrix for the variables, a 4 x 4 Cholesky matrix, and an output matrix of 1 x 4. Figure J is an example of one calculation using this method. Notice that the Lower Triangular Cholesky matrix we created has been transposed so that it is Upper Triangular.

Figure J
Simulation Multiplication Example
A 1x4 vector of random numbers is multiplied by one of the Cholesky columns (a 4x1 vector), resulting in a single value (i.e. a 1x1 matrix) for the new variable.

If we calculated the Cholesky values using the correlation matrix, the resulting values (we can call them "Adjusted Random Variables") are then multiplied by each variable's standard deviation and added to the mean for that variable, which completes the result for one simulation. The spreadsheet I mentioned earlier has an example of this calculation for 1000 random variables.

If we calculated the Cholesky values using the covariance matrix, then the standard deviations have already been "scaled in", so we merely need to add the mean to the Adjusted Random Variable.
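Putting the pieces together, here is a minimal Python sketch of the whole process: draw standard normals, multiply by the transposed Cholesky factors, and check that the desired correlation emerges. The 2-variable correlation of 0.6 is a made-up illustration, not a value from the post:

```python
import math
import random
import statistics

random.seed(1)

def cholesky_lower(C):
    """Lower Triangular factor T with T times its transpose equal to C."""
    n = len(C)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(T[i][k] * T[j][k] for k in range(j))
            T[i][j] = math.sqrt(C[i][i] - s) if i == j else (C[i][j] - s) / T[j][j]
    return T

C = [[1.0, 0.6], [0.6, 1.0]]  # target correlation matrix (illustrative)
T = cholesky_lower(C)

n = 20_000
x, y = [], []
for _ in range(n):
    z = [random.gauss(0, 1) for _ in range(2)]
    # Each Adjusted Random Variable is the dot product of the random row
    # with one column of the transposed Cholesky matrix (i.e. a row of T)
    x.append(sum(z[k] * T[0][k] for k in range(2)))
    y.append(sum(z[k] * T[1][k] for k in range(2)))

# To finish a draw, multiply each adjusted value by that variable's standard
# deviation and add its mean (skip the scaling if the factors came from the
# covariance matrix, since the standard deviations are already "scaled in").

mx, my = statistics.mean(x), statistics.mean(y)
corr = (sum((a - mx) * (b - my) for a, b in zip(x, y))
        / ((n - 1) * statistics.stdev(x) * statistics.stdev(y)))
print(f"simulated correlation = {corr:.2f}")  # close to the 0.6 we targeted
```

The simulated correlation lands near the 0.6 we built into the Cholesky factors, which is exactly the Figure K comparison in miniature.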

Figure K shows the results for each variable using R for the simulation, along with the correlation matrix. Note that these values are similar (as they should be, as this is the whole point!) to the correlation matrix in Figure E.

Figure K
Simulation Comparison
First summary is the Standard Normal Variables, whose means are close to 0 and standard deviations are close to 1. The second summary is the Adjusted Random Variables. These means are also close to 0 and standard deviations close to 1. Due to correlation, the Totals differ between the two. Because of the correlation impact, the standard deviation of the 2nd group is higher (2.58... vs. 1.93...), even though the individual elements have essentially the same means and standard deviations! The first correlation matrix shows the Standard Normal Variables to be uncorrelated, since off-diagonal elements are near 0. The second correlation matrix shows the simulated results for the Adjusted Random Variables, which are close to the values of the 3rd matrix, which is the correlation matrix we used to construct the Cholesky factors.

What Can We Do?

The Cholesky process allows us to model correlation patterns without disrupting the statistical characteristics of each of the individual elements. How can we use this in the 'real world'?

Improve Modeling Accuracy of Processes with Multiple Variables - rather than accept as fact the variables are uncorrelated, as the consultants did in the vignette in this post, we can use our data to ensure that any correlations that are present are factored into account as our model is developed.

Establish Non-Conventional Probability Patterns - given the ability to create correlated variables, we can use multiple variables to create probability patterns that are unique. If we want 3 "humps" or 5 in our pattern, we can create these by building up several variables and tying them together via correlations. The techniques to do this will need to be discussed in another post.

Solve Data and Mathematical Problems - the Cholesky decomposition is quite similar to taking the square root of a matrix. If we are presented with a set of data, and would like to use it in an equation, the decomposition can be useful to help us solve the equation mathematically.

Key Takeaways

The Cholesky process is used to ensure that our simulation of multiple variables evidences our desired pattern of correlation. It is also a tool to create model parameters and to solve data/mathematical problems.

You May Also Like

Usage of Cholesky matrix in an investment portfolio setting

Mei's Monte Carlo Adventure

Should You Rebalance Your Investment Portfolio?

Downloads of tools used in this post

Excel Spreadsheet

R Code

    ::Do you have a story about using the Cholesky decomposition in a model creating situation?
    ::What other methods have you seen to account for correlation in developing models?
    ::Can the Cholesky process be used for situations where the distributions are not normal?

Add to the discussion with your thoughts, comments, questions and feedback! Please share Treasury Café with others. Thank you!

Wednesday, January 15, 2014

Is the CFO's Organization a Professional Service Firm?

"All of my directors seem to want to promote people, but this just isn't realistic. If they have their way pretty soon we'll end up as an organization of managers with nobody to manage!"

Aisha Sarin, VP of Human Resources, listened patiently as Tuck Wallace, the company's CFO, discussed his current staffing situation.

"Do you need more managers, Tuck?" she asked.

"No, I really don't think we do. What's that saying? Too many cooks in the kitchen spoil the soup?"

Aisha smiled. "Yes, something like that. I'd like to understand the situation better. Sometimes managers feel as if they do not have a lot of 'tools in the toolbox' when it comes to rewarding great work, but there are also instances where people who can truly perform at the 'next level' are held back...or not challenged enough."

Tuck thought for a moment. "Probably a little of both in this case, actually. Where do we go with this?"

In prior Treasury Cafe posts we've discussed adopting the viewpoint that our finance organization is an independent company (see A Matter of Perspective).

Thinking in this way changes our viewpoint. If that snarky remark we heard yesterday came from our boss, we might be de-motivated, resentful, and dis-empowered, and our thought bubbles might read "After all I've done...", "what a %*#@", "I'm not paid enough to take this kind of crap", etc.

When that same remark comes from a customer, we attempt to identify potential problem areas and invent ways of providing even better service. We think "there is some kind of problem that needs to be solved here because my customer is unhappy".

A much more productive approach.

If we are to regard our finance organization in this light, is it possible that there are lessons we can learn from the employment models that are used in professional service firms?

How Many People?

David Maister says there are three levels within a professional service firm: "Finders, Minders, and Grinders".

Figure A
Personnel Requirements Equations
The number of each level of personnel is determined by the number and length of projects, time requirements per project, and the utilization rate of that person's time

These names come about because the partners, who are the seniors in the firm, usually are responsible for finding the business, the mid-level pros are the ones who manage the various projects and engagements (thus the minders), and the junior / entry-level folks are the ones plugging away on specific project tasks.

How do the economics of this model work?

Figure A shows the equations that go into determining the number of personnel required in the firm. The number of projects the firm has each year and the length of each project determine the number of project hours the firm has. For a given project a percentage of time is required for each level. Finally, each level has a certain number of 'billable hours' capacity. These factors, when combined, allow us to determine a staffing level.

Figure B
Personnel Requirements Calculation
The average project takes 3 months. On a 50 week per year basis, and 40 hour work week, each project takes 500 hours. At 12 projects per year, there are 6000 project hours per year. Each project requires 50% of a Sr's time, 1 Mid-level, and 3 Jrs. Each level has a certain amount of time available for projects per year, which is the annual project capacity. The required time for each level divided by the per person capacity yields the required number of personnel. In this example, the firm requires staffing of 2 Srs, 4 Mids, and 10 Jrs.

Figure B shows a calculation of the personnel requirements based on the equations in Figure A. The firm has 6000 project hours per year.

Each project requires 50% of time from the Sr level, so the firm needs 3,000 hours of time from people at this level. In similar fashion, it needs 6,000 hours of Mid time and 18,000 hours of Jr time.

Based on the utilization rates for each level, this translates into 2 Srs, 4 Mids, and 10 Jrs for the firm.
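The Figure B arithmetic can be sketched in a few lines of Python. Note that the per-person annual capacities used here (1,500 project hours for Srs and Mids, 1,800 for Jrs) are assumptions inferred from the results in the text, not figures quoted from Maister:

```python
projects_per_year = 12
hours_per_project = 500  # 3 months at 40 hrs/week on a 50-week/year basis
project_hours = projects_per_year * hours_per_project  # 6,000 per year

# Time each level spends per project: 50% of a Sr, 1 full Mid, 3 Jrs
share = {"Sr": 0.5, "Mid": 1.0, "Jr": 3.0}
# Annual project-hour capacity per person (assumed; these are the values
# implied by the Figure B results rather than stated in the text)
capacity = {"Sr": 1500, "Mid": 1500, "Jr": 1800}

staff = {}
for level in ("Sr", "Mid", "Jr"):
    hours_needed = project_hours * share[level]
    staff[level] = hours_needed / capacity[level]
    print(f"{level}: {hours_needed:,.0f} hours -> {staff[level]:.0f} people")
```

Running this reproduces the Figure B staffing of 2 Srs, 4 Mids, and 10 Jrs, which serves as the "2 + 2 = 4" calibration check described below.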

Whenever we are creating a framework for analysis it is helpful to "calibrate the model" - which is analysis-speak for asking "are we getting the answers we think we should get?". In other words, if I enter "2 + 2" does the model give me 4?

In Figure B we use the same project time requirements and utilization levels as in Maister's example, and end up with the same ratio of Mids to Srs and Jrs to Srs as he did, which should give us confidence that we got the equations right in Excel.

Now that we have established the method for determining staffing levels, we can now turn to the issue of what happens through time.


"Ultimately, the rate we promote people depends on two things - the number of people leaving the firm due to retirement, etc. and the growth of the firm," Tuck opined.

"Yes, that is essentially right, Tuck." Aisha replied. "The only other thing might be if the type of work changes. This can impact the mix sometimes as well."

"I see your point Aisha. I can think of that in terms of the span of control. Some people have 3 direct reports, others have 10, which is somewhat determined by the nature of the work they perform."

Nobody is going to want to remain a Junior forever.

At some point they expect to move up. At some point their employer expects them to move up...or out.

Employers don't want Juniors forever either.

In Maister's example, an average of 4 years is the point of progression. He uses 80% as the transition rate for Jr to Mid and 50% for Mid to Sr.

Using the information in Figure B as an example, that means we will eventually have 4 Srs (the original 2 plus 50% of the 4 Mids) and 8 Mids (80% of the original 10 Jrs), and have to hire 20 new Jrs.

A very important point - in order to sustain this rate of advancement we need to double the number of projects!

If we do not grow the business, then the Mids will not become Srs. Presumably, they will become Srs somewhere else.

The Jrs, seeing that Mids do not move up but all move out, will likely begin spending some of their time finding new gigs as well. What's the point of becoming a Mid if it is a dead end?

So the critical issue here is that firm growth and employee advancement are inextricably linked. Without one you cannot sustain the other.

A benefit to establishing the equations as we did in Figure A is to help us determine these factors. Because we have a set of equations, we can use mathematical techniques to assist in our understanding. In this case, because we are concerned with change, we can borrow from calculus and use the derivatives of these functions to help us determine the dynamics of progression within our firm.

Figure C
Rate of Personnel Change Due to Projects
The required number of personnel will increase or decrease for a given change in the number of projects is determined by the length of project, the time requirements for a project at that level, and the utilization rate of people at that level.

Figure C shows the derivatives of the Figure A equations with respect to projects. They answer the question "how does a change in the number of projects impact the number of Srs, Mids, and Jrs required by our firm?" The absence of any exponents indicates that this is a linear relationship. Using the numbers in Figure B, an addition or subtraction of one project requires a change of about 17% of a Sr, 33% of a Mid, and 83% of a Jr.

We can use this information to calculate a growth requirement. If we want to move someone into a Midlevel role (without any transition of current Midlevels), then we need to add 3 projects (since each project requires 1/3 of a Midlevel) to our existing 12 - a 25% rate of growth.

We can extrapolate this math further. Say we know our firm is growing at 10%. This implies that by the end of the next year we will have 13.2 projects (10% increase from 12).
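A short sketch of these marginal calculations, using the same assumed per-person capacities as before:

```python
hours_per_project = 500
share = {"Sr": 0.5, "Mid": 1.0, "Jr": 3.0}
capacity = {"Sr": 1500, "Mid": 1500, "Jr": 1800}  # assumed, as in Figure B

# The Figure C derivatives: staff needed per additional project
marginal = {level: hours_per_project * share[level] / capacity[level]
            for level in share}
for level, m in marginal.items():
    print(f"one extra project requires {m:.0%} of a {level}")

# One whole new Mid slot therefore takes 1 / (1/3) = 3 extra projects,
# which on a 12-project base is 25% growth; 10% growth yields 13.2 projects
projects_needed_per_mid = 1 / marginal["Mid"]
print(f"{projects_needed_per_mid:.0f} extra projects for one new Mid")
print(f"10% growth: {12 * 1.10:.1f} projects")
```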

Figure D
Midlevel Staff
Horizontal lines are current Midlevel staff and current Jr staff. At 10% growth, it will take 15 years to promote the entire batch of current Jrs to Midlevel.

Figure D shows the number of Mids required at 10% growth through time. In the first 8 years, we will need 4 additional ones, a promotion rate of 1 every 2 years. It will take about 15 years to promote all Jrs to Mids at this rate of growth, an average of one promotion every 1 and 1/2 years.

The information generated by this basic model helps provide insights as to how our organization will evolve through time. Will it be helpful to provide some additional dynamics?

Bells and Whistles

"I understand that the project model provides some insights as to how our organization will progress, Aisha. But it is a little simplistic."

"What are you thinking about that makes you say that, Tuck?" Aisha asked.

"Well, for one we have not accounted for migration out of the firm. People eventually retire, or their spouse gets a new job thousands of miles away and they need to relocate, so positions can be generated by this as well."

"Yes, I see your point. Some people move out of finance to other areas of the company as well, don't they?" she queried.


"Perhaps adding a factor in the model can account for some of these issues?"

Whenever we are creating a model, it helps to keep in mind that all models are wrong. The purpose of a model is to provide a framework of understanding in the most economical/efficient/simple way possible.

We cannot perfectly model reality - there are too many factors. To do so we would need to get to the level where we are simulating "butterfly movements in China" (to use the chaos theory metaphor that a butterfly's movements can create hurricanes on the other side of the world).

Every model is a tradeoff between the benefits of the information produced and the costs (time, maintenance, etc.) required to produce it.

Figure E
Incorporating Additional Information
The number of Srs required is the change in projects plus a probability factor that attrition occurs amongst the current staff.

Should we decide that the costs are worthwhile, we can consider some of the following options.

Figure E shows a modified personnel requirement equation which has included an attrition factor in addition to the change due to change in number of projects. As noted in the conversation above, there are a number of factors that will create attrition. The probability factor for the equation can be based on historical experience, a forecast by company personnel or 'experts', an industry average, or on some other plausible basis.
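A minimal sketch of the modified equation, with a purely illustrative 5% attrition probability (in practice this factor would come from historical experience, forecasts, or industry averages, as noted above):

```python
# Expected Sr openings = effect of project growth + attrition amongst current staff
current_srs = 2
extra_projects = 1.2                 # 10% growth on a 12-project base
srs_per_project = 0.5 * 500 / 1500   # the ~17%-of-a-Sr Figure C derivative
attrition_prob = 0.05                # illustrative assumption only

openings = extra_projects * srs_per_project + attrition_prob * current_srs
print(f"expected Sr openings next year = {openings:.2f}")
```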

Another possible approach would be to create the basic model at a lower point in the organization, and then aggregate it up to the CFO level.

Maister suggests that there are different types of projects, some requiring more execution, some more diagnosis, some requiring lots of contact with customers, others requiring little. Each of these project types will suggest a different mix of time required from the different levels. Thus, one department might be at the 50%/100%/300% ratio of the example in Figure B, while other departments might be at 100%/100%/100%. So we might assess the organization as a compilation of several different organizations rather than doing it as a whole.

Figure F
Disaggregation of Staff Requirements
Staffing requirements for a CFO organization with 6000 hours in total divided into three types of activities with various staffing requirements. Total requirements are 3 and 1/3 Srs, 3 and 1/3 Mids, and 5 Jrs.

Figure F provides an example of a group with different staffing requirements for each project.

Another modification we may wish to consider is the number of levels. In large corporate organizations the division of roles may be different than the three-level system Maister considers.

For example, there may be one to four levels for individual contributors, and then several levels for management afterwards (supervisor, manager, director, etc.). Some firms consider separate tracks, one for management and one for specialists, each with a separate level system. Adding layers and levels helps to customize the professional service firm model to our own particular organization, though again at the cost of additional modeling complexity.

What Can We Do?

We can improve our organization using the professional service firm employment model in a number of ways, such as:

Change Perspective: View Your Organization as a Series of Projects - the fact that the basic model derives from a professional service organization can sometimes throw people off - "But I do not have 100% projects". Some of this is a matter of interpretation. If our accounting organization closes the books every month, we can look at this as a routine process or recurring procedure, or we can consider it as a successive series of projects. Each month closing the books is a project, with certain requirements, certain actions, and certain desired outcomes. The fact that it is repeated does not necessarily invalidate viewing each one as a separate project.

Inventory Our "Projects" - breaking each of our organization's projects and tasks down to a specific allocation of time requirements from the various levels will allow us to understand where the time demands occur. Where is the greatest number of Jrs required? Srs? Understanding this can help us to determine different career path patterns within the firm. Jrs might rotate from a "Grinder" group (i.e. a group with a high Jr to Sr ratio) to one that requires more involvement from Mids and Srs. This would be a benefit to the Jr, even though they remain at the Jr level, because they are involved in higher order projects, and will thus be more likely to pick up on the different skill requirements of the levels above. Their development progresses even as they remain at the Jr level for a couple more years.

Communicate Realistic Expectations - because we can assess the number of additional personnel required given a growth level, we can use this information to set expectations realistically within the organization. Going back to the state of our organization at 10% growth shown in Figure D, if each of Tuck's managers expects to promote someone every year, there is going to be a lot of disappointment. Tuck can use the analysis presented here to guide his organization to a more realistic expectation of how progression will occur, thereby nipping an annual cycle of demotivation "in the bud".

Develop Alternative Tools - Figure D illustrated that it will take some people 15 years to move up a level. This seems like a painfully long period of time. Developing alternative tools to recognize and reward people will therefore be required. Can we implement gamification elements that will make the job different from month to month or year to year? Rotation is another tool that can be used. New jobs are often considered rewards even if they are lateral from the org chart perspective. Can we create alternate hierarchies or project allocation systems? In colleges, class selection is often done using a bidding system. Could we implement something similar with respect to some of our projects? People would end up working on things that are more meaningful to them, which is a potent form of reward.

Use as an Interpretation Framework - the professional service firm model can be a useful way to process information. For example, the consulting firm Treasury Strategies noted an evolution of Finance areas from one that was more transaction oriented to one that was more analytically oriented, moving from a traditional pyramid to an inverted one (see example here). Using our framework, we can interpret this as one of moving from projects with a high Jr to Mid or Jr to Sr ratio to ones where the ratios are a lot lower, indicating that the skill sets of our organization will need to evolve and develop in order to keep pace with the demands from our 'customers'.

Key Takeaways

The professional service firm employment model is a useful perspective with which to view our organization. It allows us to identify issues that need to be addressed, such as employee transition rates, growth requirements and constraints, and as a convenient method to assess how changes in the industry and professional landscape can impact the value of our organization's human capital.

You May Also Like

Quantifying Productivity and Leadership Capacity

In Search of the Talent Equation

How Hiring Practices Can Thwart Our Objectives

Opportunity, Opportunity Everywhere

Gettin on the Bus

    ::What additional factors would you add to the basic model?
    ::How might this model be used to improve your organization?

Add to the discussion with your thoughts, comments, questions and feedback! Please share Treasury Café with others. Thank you!