Friday, August 10, 2012

Analytic Reminders You Can Learn From an 8-Year Old

My wife’s family is huge. She is one of eight, and many of her siblings have carried on this tradition in their families as well, the result being that her parents have over 30 grandkids.
This provides me ample opportunity to play all sorts of games. On our recent annual vacation to a camping cabin “resort” in the north woods of Minnesota (where cell phone reception is spotty at best), on one of those days when we are trapped indoors because the rain prevents any playground, beach, biking, fishing, or golf activities, I settled down to a game of Monopoly with an 8-year old.
The problem is…I should have been thinking more than I was.

Background
Monopoly is a board game that takes you around a square with 10 spaces on each side, for a total of 40 spaces in all. Most of the spaces represent “properties”, which you can buy and own, and when other players land on them you collect “rent” from them. Each turn you roll 2 dice and move your token accordingly. If you land on someone else’s property, you must pay them the rent for that property.
Most of the properties belong to color groups of 2 or 3. If you own all the properties in a color group, you have a “road”. When this occurs, you are able to invest in houses and hotels, which significantly increases the rental income you collect from other players. The only exceptions are the railroad and utility groups, which cannot be improved upon, though the rental income increases as you own more of them.
It is only with a large amount of luck that you can obtain a road on your own movements. Because of this, at some point in the game players start to make trades – combinations of one or more properties and perhaps cash in exchange for others.

The Situation
After having traversed the board quite a few times, we arrived at the situation where all the properties were purchased but nobody owned a road. The 8-year old I was playing with proposed a trade, whereby I would give him the two railroads I owned (he owned the other 2) in exchange for Pacific Avenue, one of the three properties making up the “green” color group. Since I owned Pennsylvania Avenue and North Carolina Avenue (the other two greens), I would be in a position to build houses and hotels to increase my income while others would not.
Given the properties I held, it was also not possible for anyone else to get a road, so this trade appeared to me to be one where I would be able to slowly establish a juggernaut that all would succumb to.
Unfortunately for me, I did not take “Expected Value” into account, whereas my 8-year old opponent did (though maybe not consciously).

What is Expected Value?
Figure A
“Expected Value” is a term used in statistics and probability theory. It represents the "payoff" of a certain event multiplied by the probability of that event occurring. For a coin flip, this would be represented by the equation in Figure A.
For example, if I receive ²1 (the symbol ² stands for Treasury Cafe Monetary Units, or TCMU's, freely exchangeable into any currency of your choosing at any exchange rate you desire) should a coin flip result in heads, and pay ²1 should the coin flip result in tails, given a 50/50 chance of each occurring, then my "Expected Value" is ²0 (.5 * 1 + .5 * (-1) = 0).
What if I receive ²1 on heads and pay ²0.50 on tails? Then my expected value is ²0.25 (.5 * 1 + .5 * (-.5) = 0.25).
What if I receive ²1 on heads and pay ²0 on tails? Then my expected value is ²0.50 (.5 * 1 + .5 * 0 = .5).
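The three coin flip cases above can be checked with a few lines of Python. This is only a minimal sketch; the function name `coin_flip_expected_value` and its parameters are made up for illustration.

```python
def coin_flip_expected_value(heads_payoff, tails_payoff, p_heads=0.5):
    """Payoff on heads times its probability, plus payoff on tails times its probability."""
    return p_heads * heads_payoff + (1 - p_heads) * tails_payoff

# A payment is entered as a negative payoff.
even_bet = coin_flip_expected_value(1.0, -1.0)    # receive 1 on heads, pay 1 on tails
half_loss = coin_flip_expected_value(1.0, -0.5)   # receive 1 on heads, pay 0.50 on tails
no_loss = coin_flip_expected_value(1.0, 0.0)      # receive 1 on heads, pay 0 on tails

print(even_bet, half_loss, no_loss)  # 0.0 0.25 0.5
```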
Figure B
More generally, the equations above can be represented by the formula in Figure B, which simply says for all events "i" whose probabilities total to 1, the expected value is the sum of the probability of that event occurring times the value of that event should it occur.
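The figures are not reproduced in this text, so here is one way to write the two equations that is consistent with the descriptions above. The symbols are my own notation: p is a probability, v is the value of the outcome, and H and T stand for heads and tails.

```latex
% Coin flip (the two-outcome case described for Figure A)
E[V] = p_H \, v_H + p_T \, v_T, \qquad p_H + p_T = 1

% General case (as described for Figure B)
E[V] = \sum_{i} p_i \, v_i, \qquad \sum_{i} p_i = 1
```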

Applying Expected Value
Now that we understand expected value, we can apply this knowledge to my Monopoly trade with my 8-year old opponent.
The Monopoly board has 40 spaces, so if we assume that landing on each one is equally likely, then the probability of landing on each space is simply 1/40, or 0.025. After the trade, my opponent will have 4 railroad properties, each requiring a payment of 200 from the person landing on them. Using Formula B, they will have an expected value of 20 (.025 * 200 + .025 * 200 + .025 * 200 + .025 * 200 = 20).
I had the funds to put up one house on each of my green properties. Those landing on green properties with one house need to pay rent of 130 for two of the three and 150 for the other. Thus, my expected value, using the formula in Figure B, is 10.25 (.025 * 130 + .025 * 130 + .025 * 150 = 10.25).
Since we were playing each other, my opponent's expected value is also my expected payment, and vice versa. Unfortunately for me, this means that I can expect to pay 20 while receiving only 10.25, and thus my net expected value is -9.75. Because of this, the possibility of amassing enough cash to buy another round of houses for my green properties (which would require 450) is quite unlikely. If I could achieve this, it would put me in a positive position, as the expected value with two houses is 30.75 (.025 * 390 + .025 * 390 + .025 * 450 = 30.75).
The lesson from this basic analysis was that in order for me to make the trade, I needed enough cash on hand to build two rounds of houses on the green properties immediately in order to make a positive expected value. Lacking that, I should not have made the trade.
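The arithmetic in this section can be reproduced with a short Python sketch. It uses the same equal-probability assumption (1/40 per space) and the rents quoted above; the helper name `rent_expected_value` is made up for illustration.

```python
P_LAND = 1 / 40  # equal chance of landing on any space, as assumed in the text

def rent_expected_value(rents, p_land=P_LAND):
    """Sum of (probability of landing) * (rent) over a set of properties."""
    return sum(p_land * rent for rent in rents)

railroads = rent_expected_value([200, 200, 200, 200])   # opponent owns all 4 railroads
greens_one_house = rent_expected_value([130, 130, 150])
greens_two_houses = rent_expected_value([390, 390, 450])

print(round(railroads, 2))                       # 20.0
print(round(greens_one_house, 2))                # 10.25
print(round(greens_one_house - railroads, 2))    # -9.75
print(round(greens_two_houses, 2))               # 30.75
print(round(greens_two_houses - railroads, 2))   # 10.75
```

The last line is the net position with two houses on each green property: positive, which is why the trade only made sense with enough cash to build to that level right away.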

Path Dependence
Since movement in Monopoly is governed by the roll of 2 dice, the 1/40 probability assumption we used in the last section is somewhat inaccurate. If our token is on the Board in space #1, then it is more likely that on the next roll we will land on space #8 (i.e. rolling a 7) rather than space #3 (i.e. rolling a 2), so each of these spaces has a different probability (the odds of a 7 are 6/36, while those of a 2 are 1/36).
Similarly, on the next turn, the probabilities of different properties being landed on will depend on where we landed the turn before. Had we rolled a 2 last time and moved to space #3, on our next turn it is now more likely we will land on space #10 (rolling a 7) instead of space #15 (rolling a 12).
This concept, that the future outcome is determined by the past, is known by the term “path dependence”.
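To see where those dice odds come from, the sketch below enumerates all 36 combinations of two dice and maps each total to a destination space. It numbers spaces 1 to 40 as in the text and deliberately ignores Jail and the card decks; the function names are made up for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def two_dice_distribution():
    """Exact probability of each total (2..12) when rolling two six-sided dice."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(n, 36) for total, n in counts.items()}

def landing_probabilities(start, board_size=40):
    """Probability of each destination after one roll from `start` (spaces numbered 1..board_size)."""
    return {
        (start - 1 + total) % board_size + 1: p
        for total, p in two_dice_distribution().items()
    }

dist = two_dice_distribution()
print(dist[7], dist[2])                   # 1/6 1/36

from_space_1 = landing_probabilities(1)
print(from_space_1[8], from_space_1[3])   # 1/6 1/36

from_space_3 = landing_probabilities(3)
print(from_space_3[10], from_space_3[15]) # 1/6 1/36
```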

A Trip to Monte-Carlo
One solution to estimating results in a path dependent situation is to perform a Monte-Carlo simulation. Monte-Carlo models use statistically based random numbers to project the future over and over. By doing this, we can develop an estimate of the probabilities of events we are concerned about occurring.
In order to accomplish the simulation, we program the movement process around the Monopoly board, taking into account the die roll (in this setting the random element of the Monte Carlo), and the game elements that impact position (e.g. the “Go to Jail” space, Chance and Community Chest cards, etc.). This was done using a combination of Excel and Visual Basic for Applications (I am happy to email this spreadsheet and code to you if you’d like, simply connect with me on LinkedIn and provide me an email address).
We then simulate 1,000 games from every one of the 40 possible starting positions. In each simulation, all players had to go around the board at least 5 times. This results in 40,000 data elements to analyze.
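For readers who would rather see the idea in code than in a spreadsheet, here is a heavily simplified Python sketch of this kind of simulation. It is not the Excel/VBA model described above: it models only the dice and the “Go to Jail” space, it leaves out the Chance and Community Chest cards, doubles, and time spent in Jail, and measuring the result as average green rent per turn is my own assumption. Space numbers count Go as space #1, and the rents are the one-house green rents quoted earlier.

```python
import random

BOARD_SIZE = 40
GO_TO_JAIL, JAIL = 31, 11  # space numbers with Go as space #1
# Pacific, North Carolina, and Pennsylvania Avenue with one house each.
GREEN_RENT_ONE_HOUSE = {32: 130, 33: 130, 35: 150}

def simulate_game(start, laps=5, rng=random):
    """Average green rent paid per turn by one token over `laps` trips around the board."""
    position, completed_laps, turns, rent_paid = start, 0, 0, 0
    while completed_laps < laps:
        roll = rng.randint(1, 6) + rng.randint(1, 6)
        new_index = position - 1 + roll
        if new_index >= BOARD_SIZE:       # passed or landed on Go
            completed_laps += 1
        position = new_index % BOARD_SIZE + 1
        if position == GO_TO_JAIL:        # simplification: move to Jail, then play on as normal
            position = JAIL
        rent_paid += GREEN_RENT_ONE_HOUSE.get(position, 0)
        turns += 1
    return rent_paid / turns

def estimate_by_start(games=1000, seed=2012):
    """Mean per-turn green rent for each of the 40 starting positions."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    return {
        start: sum(simulate_game(start, rng=rng) for _ in range(games)) / games
        for start in range(1, BOARD_SIZE + 1)
    }

# A smaller run than the 1,000 games per position used in the post, to keep it quick.
estimates = estimate_by_start(games=100)
print(min(estimates.values()), max(estimates.values()))
```

Because so much of the game is left out, the numbers this produces should not be expected to match the figures reported below.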

Evaluating the Data
Figure C
For the analysis portion, we import the results into R (an open-source statistical program). I prefer R to Excel for this phase as it is more robust in handling the data and offers a wider variety of analysis and graphics options (I could have set up the Monte Carlo in R, but selfishly wanted to practice my VBA skills).
The t-test is a statistical test that determines whether the average of one set of data is significantly different (meaning the likelihood is held to a strict threshold, such as less than a 5% chance) from the average of another set. Figure C shows the formula for a t-test statistic for equal sample sizes with an assumed equal variance.
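Figure C is not reproduced in this text. The standard form of the t statistic for two samples of equal size n with an assumed equal variance is shown below; the symbols are my own notation, with x-bar for a sample mean and s squared for a sample variance.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{2/n}}, \qquad
s_p = \sqrt{\frac{s_1^2 + s_2^2}{2}}, \qquad
\text{degrees of freedom} = 2n - 2
```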
One thing I like to do as an analyst is verify that my understanding of equations is sound and that the programs I am using are performing calculations according to that understanding. For that reason, I calculated in Excel the t statistic for the test between Expected Value results for starting position #1 and starting position #16 (Figure D), and then compared that to the R output (Figure E).
Figure D
Once the t-statistic is calculated, it is compared to a table (which is based on the number of observations) that determines its “p-value”. The p-value represents the probability of seeing a difference at least this large if the two sets of results really came from the same distribution (the “null hypothesis”). If the p-value is very low, chance alone is an unlikely explanation, or in other words it is likely the data are from “different” value distributions.
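The same kind of hand check can be done in Python using only the standard library. This is a sketch, not the Excel or R calculation from the post: the sample data are made up, and the p-value uses a normal approximation in place of the t table, which is reasonable only for large samples such as the 1,000 results per starting position used here.

```python
import math
import statistics

def t_statistic_equal_n(sample_a, sample_b):
    """t statistic for two samples of equal size, assuming equal variance."""
    if len(sample_a) != len(sample_b):
        raise ValueError("this form of the formula needs equal sample sizes")
    n = len(sample_a)
    pooled_sd = math.sqrt(
        (statistics.variance(sample_a) + statistics.variance(sample_b)) / 2
    )
    difference = statistics.mean(sample_a) - statistics.mean(sample_b)
    return difference / (pooled_sd * math.sqrt(2 / n))

def approx_two_sided_p_value(t):
    """Two-sided p-value from the normal approximation to the t distribution."""
    return 2 * (1 - statistics.NormalDist().cdf(abs(t)))

# Hypothetical data, chosen so the result is easy to verify by hand.
sample_a = [1, 2, 3, 4, 5]
sample_b = [2, 3, 4, 5, 6]
t_value = t_statistic_equal_n(sample_a, sample_b)
print(round(t_value, 6))                                 # -1.0

# A data set compared with itself gives t = 0 and a p-value of 1.
print(approx_two_sided_p_value(t_statistic_equal_n(sample_a, sample_a)))  # 1.0
```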
Figure E
Figure F
Figure F shows the p-values for the average Expected Value of landing on the green properties from three starting positions (Go, Pennsylvania Railroad, and Pennsylvania Avenue), each compared to all the other starting positions. The black line at the bottom is the .05 p-value threshold. Items below this line are significantly different (notice also that the p-value is 1 in spots where a position is compared with itself, since identical data show no difference at all).
Looking at the orange line (starting position is “Go”, space #1), there is not a significant difference in Expected Values between this position and its near neighbors (up to around space #10, and space #30 and up), but there is a significant difference from Expected Values for starting positions in the 10’s and 20’s.
Conversely, the brown line (starting position “Pennsylvania Railroad”, space #16) shows no significant difference between its near neighbors but significant difference from starting spaces further away (spaces #1-#10 and #30 +).
This process confirms the path dependency of Expected Value: the values differ depending on where you start on the board.
Figure G
However, now let’s take a look at Figure G. The Expected Value of each of the starting positions is around 10, and the range is not very great. The lowest expected value is 9.51 and the highest is 10.90. So even though they are significantly different in the statistical sense, the difference is not really great enough to change the value of the Railroad for Green Properties trade with the 8-year old.
The mean of the Expected Values in Figure G is 10.18, surprisingly close to our initial pass estimate of 10.25. So while path dependence does occur in the game, in this case it is not great enough to affect the outcome over a simpler set of assumptions.
Finally, I looked at the number of times the simulation resulted in a higher expected value for my side of the trade vs. that of my 8-year old opponent. On average, across all starting positions, only 13% of the time did I come out ahead (ranging between 10% and 16% depending on starting position). Based on this, I was very unlikely to win the game.

Conclusion
My failure to utilize the tool of Expected Value cost me the game. Sitting in a cabin near a lake in the woods is not the first place one might think to utilize analytical tools; however, in this case it would have helped.
The tools we have learned and deploy can often be used in more settings than we might think, so long as we are willing to be a little creative with them.
Next time I play that kid I might bring my computer!

Key Takeaways
Calculating Expected Value is a tool that can be used to assess alternative situations in order to inform decisions about what to do or not. Through the use of Monte Carlo simulation, even situations involving path dependent factors can utilize the Expected Value tool. As always, judgment needs to be utilized in assessing data and output of these calculations.

Questions
· Have you encountered game situations where statistical concepts have been useful?
· How have you deployed analytic tools in unusual situations or settings?

Add to the discussion with your thoughts, comments, questions and feedback! Please share Treasury Café with others. Thank you!

10 comments:

  1. Remind me to never play a board or card game with you ever again!

    1. Don't worry, I am working on how we might apply this to golf as well!

  2. Dave
    thanks for creating value by teaching me some new tricks. Reminding me why statistical analysis proves valuable, how it clarifies choices and thus improves decision-making; even if it's in the woods with an 8 year old!

    1. Undimed,

      While I am happy to remind you that statistical analysis proves valuable, we must also keep in mind that there are "lies, damn lies, and statistics", so I might say it can prove valuable if used in the right way. We must be cautious consumers of statistical data.

      Thanks for reading and taking the time to comment!

    2. Great read- thanks. A couple of strategic questions:

      Did you consider mortgaging all of your other properties to buy houses?

      How high was the risk of other players making similar trades thereby leaving you out in the cold? Perhaps taking the inferior end of this trade gave you a higher chance of winning than if the 8 year old had traded with another player.

    3. Had I done this analysis prior I might have mortgaged properties to get to two houses, but there would have been little left over to cover what I would owe if I landed on the railroads.

      The third opponent (a 7-year old!) could make a road trade with the 8-year old, but every time that was proposed I talked him out of it - for the basic reason that the 8-year old had enough cash to put up hotels and the 7-year old could not afford to put up more than 1 house. Other than that, the trades would have had to be 3-ways since we all held one of the colors.

  3. For the parents in the group, Yahtzee also gets to be an excellent game in teaching expected values.

    1. That is a great point - as I recall there are decisions made during that game about where to score the items. For instance if I roll 3 three's and 2 6's, I can choose to score it on the 3's line, the 6's line, the full house line, or the sum total of the five dice line. Thanks for the comments!

  4. A very interesting idea to use Monopoly to demonstrate the usefulness of statistics.

    I would be curious to hear how you would further adjust your model of path dependence for the fact that spaces on the board - such as "Go to Jail" and "Jail" - alter the path and the number of times you might land on any particular space. (I can recall as a child being grateful to be in jail when I was low on cash as I could not land on others’ properties and this prolonged the game. I could wear an older opponent down since I had more time to kill than they did!) There are also cards in the deck that direct you to various spots on the board. You get these cards by landing on certain spaces which adds yet another layer to the path dependence.

    Even “simple” problems are tough to define. As my statistics professor said many (many, many) years ago, solving the problem is easy, defining the problem is the hard part. Thanks for an interesting post.

    1. Pat,

      I love that statement from your professor - "defining the problem is the hard part". Thanks for sharing that!

      With respect to your observations of the Monopoly board, I attempted to include in the Monte Carlo the factors you mentioned, going to jail, the various cards in the Chance and Community Chest decks.

      I offered to send the spreadsheet to readers if they would like it. If you would like to verify that I included those "wrinkles", I am happy to send it to you. Either leave an email address here (if you are not afraid of spammers), or connect with me on LinkedIn and then send me your email "in private".

      Thank you again for your comments and sharing your experience!
