I’ve been playing around with a projection model that compares current golfers to past golfers in an attempt to predict performance for the upcoming season. These types of models are common; most notably, Baseball Prospectus has PECOTA for MLB and Kevin Pelton has SCHOENE for the NBA. They work by identifying certain characteristics (stats, physical measurements, age, etc.) and then generating, for each player being projected, a list of the past players who are most similar. From those comparable players you can generate statistical projections, confidence intervals, breakout/decline probabilities, etc. The hope is that by comparing players to thousands of past player seasons you can identify characteristics of players who improve or decline that aren’t immediately obvious if you just have point estimates of their talent level.
For my model, I selected six inputs (N = the season prior to the one being projected): performance in seasons N, N-1, and N-2, change in performance between seasons N-2 and N-1 and between N-1 and N, and age in the middle of season N+1. For this year’s projections that means I’m using performance in 2012, 2013, and 2014, change in performance from 2012 to 2013 and from 2013 to 2014, and age as of 7/1/2015. Based on these inputs I’ve generated a list of the 100 most comparable golfers from 1992-2013 (or fewer if that golfer had fewer comparables that met my threshold). All projections are based on the subsequent performance of those comparable golfers.
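To make the comparables step concrete, here is a minimal Python sketch of one way to build the six inputs and rank past seasons by similarity. It is not the code behind these projections: the distance measure (Euclidean distance on inputs scaled by their spread in the pool), the max_distance cutoff, and the function names are all assumptions chosen for illustration.

```python
import math
from statistics import pstdev

def make_features(perf_n2, perf_n1, perf_n, age):
    """Build the six inputs: three seasons, two year-over-year changes, age."""
    return [perf_n, perf_n1, perf_n2, perf_n1 - perf_n2, perf_n - perf_n1, age]

def find_comparables(target, pool, max_comps=100, max_distance=1.5):
    """Return (index, distance) pairs for the pool seasons closest to target.

    Assumptions: standardized Euclidean distance and the max_distance
    cutoff are illustrative, not the similarity measure or threshold
    actually used for the projections.
    """
    # Scale each input by its spread in the pool so that age (years) and
    # performance (strokes/round) contribute on comparable scales.
    scales = [pstdev(col) or 1.0 for col in zip(*pool)]
    scored = []
    for i, row in enumerate(pool):
        d = math.sqrt(sum(((a - b) / s) ** 2
                          for a, b, s in zip(target, row, scales)))
        if d <= max_distance:
            scored.append((i, d))
    # Nearest seasons first, capped at max_comps.
    scored.sort(key=lambda pair: pair[1])
    return scored[:max_comps]
```

A golfer with few close matches simply gets a shorter list, which is how a projection can end up resting on fewer than 100 comparables.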
I’ve generated probabilities for each golfer to break out (improve by 0.5 strokes/round), collapse (decline by 0.5 strokes/round), improve (any improvement), and decline (any decline). I’ve also generated a mean projection and 95% confidence interval for each golfer’s performance this season. Each golfer’s five most comparable seasons are listed, as well as the number of comparable golfers that met my threshold (no one reaches 100 because a handful of 2014 seasons show up among the comparables; these are ignored). I’ve attached the results in the Google Doc at the end.
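As a rough illustration of how a list of comparables turns into those numbers, the sketch below summarizes what the comparables did in their following season. The sign convention (positive change means improvement), the use of the 2.5th and 97.5th percentiles as the 95% interval, and the function name are assumptions, not a description of how the attached projections were computed.

```python
from statistics import mean, quantiles

def summarize_comparables(changes, baseline, threshold=0.5):
    """Summarize comparables' next-season changes into a projection.

    changes: each comparable's change in strokes/round in the following
    season, with positive meaning improvement (an assumed convention).
    baseline: the projected golfer's performance in season N, same units.
    Needs at least two comparables.
    """
    n = len(changes)
    # 39 cut points at 2.5% steps; the first and last bound the middle 95%.
    # Treating this empirical range as the interval is an assumption.
    cuts = quantiles(changes, n=40, method="inclusive")
    return {
        "breakout": sum(c >= threshold for c in changes) / n,
        "collapse": sum(c <= -threshold for c in changes) / n,
        "improve": sum(c > 0 for c in changes) / n,
        "decline": sum(c < 0 for c in changes) / n,
        "mean": baseline + mean(changes),
        "interval": (baseline + cuts[0], baseline + cuts[-1]),
    }
```

With only a short list of comparables the percentiles are driven by one or two seasons, so the interval gets noisy.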
Back-testing on last season, the model predicted that 18% of golfers would break out and 22% would collapse. In reality, 16% broke out and 24% collapsed. 94% of projected golfers fell within their 95% confidence intervals. The correlation between projected and actual mean performance was 0.75, which is slightly stronger than past attempts to project simply using prior performance. The mean absolute error was 0.5 strokes, meaning the average projection missed by 0.5 strokes in one direction or the other.
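For anyone who wants to run the same kind of check on their own projections, the three back-test measures above (correlation, mean absolute error, and the share of golfers landing inside their interval) can be computed as in this sketch. The function name and input layout are hypothetical; the formulas are the standard Pearson correlation and mean absolute error.

```python
import math

def backtest(projected, actual, intervals):
    """Score projections against realized results.

    projected, actual: per-golfer performance in strokes/round.
    intervals: (low, high) pairs, one per golfer, for the 95% interval.
    All three lists must be in the same golfer order.
    """
    n = len(projected)
    mp, ma = sum(projected) / n, sum(actual) / n
    # Pearson correlation between projected and actual performance.
    cov = sum((p - mp) * (a - ma) for p, a in zip(projected, actual))
    var_p = sum((p - mp) ** 2 for p in projected)
    var_a = sum((a - ma) ** 2 for a in actual)
    return {
        "correlation": cov / math.sqrt(var_p * var_a),
        # Average size of the miss, ignoring direction.
        "mae": sum(abs(p - a) for p, a in zip(projected, actual)) / n,
        # Share of golfers whose actual result fell inside their interval.
        "coverage": sum(lo <= a <= hi
                        for a, (lo, hi) in zip(actual, intervals)) / n,
    }
```

A well-calibrated 95% interval should show coverage close to 0.95 over a large enough group of golfers.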
Most likely to break out:
1. Webb Simpson
2. Charl Schwartzel
3. Hideki Matsuyama
4. Ryo Ishikawa
5. Brandt Snedeker
I define a breakout as a golfer playing significantly better (by 0.5 strokes) than they did the year before. Examples are G-Mac, Jason Kokrak, and Ryan Palmer last year. Simpson, Schwartzel, and Snedeker all had down years last year, but the model sees outstanding prior performance and ideal age and predicts a comeback. Matsuyama’s 2014 compared very favorably to a lot of outstanding seasons by young players including McIlroy, Sergio, Jason Day, and Justin Leonard. Ishikawa gets credit for consistently average play at a young age.
Most likely to collapse:
1. Rory McIlroy (50%)
2. Angel Cabrera (44%)
3. Jim Furyk (44%)
4. Bubba Watson (42%)
5. Jason Kokrak (41%)
I define a collapse as a golfer declining in performance by 0.5 strokes. Examples are Tiger Woods, Jason Dufner, and Nicolas Colsaerts last year. McIlroy’s place here will likely look stupid in 12 months, but his projection is generated from a very short list of comparables. Almost no one has had the up-and-down last three seasons that he has had while also playing at an absurdly high level. Of the 200+ guys projected, his is the projection the model is least sure about. The model hates older golfers (it gives Angel only a 28% chance of playing better than he did last year), and John Senden, Thongchai Jaidee, and Robert Karlsson all appear near the top.
Link to projections (Google Doc)