Golf Analytics

How Golfers Win


Will Jimmy Walker Continue to Putt at an Elite Level?

I got some push-back from Chris Catena on Twitter today about my contention that Jimmy Walker's recent run of great play was driven by lucky putting. In that post, I showed that Walker had established himself as an above-average, but not elite, putter (a strokes gained putting of around +0.25 to +0.30 per round over the last five years). During Walker's recent run (Frys.com Open through Northern Trust Open), he's putted at a +1.20 level. That +0.9 strokes/round improvement is what carried him to three wins in the last four months. I also contended that Walker is very unlikely to continue putting at this level, simply because no one ever has for a full season. Moreover, Walker's best putting season (+0.46) and average putting season (+0.26 from 2009-2013) fall far short of the elite, sustained level of play we often see from the golfers who lead the Tour in strokes gained putting. This post defends those claims in more depth and shows why I think it's very unlikely that Jimmy Walker will continue putting and playing as well as he has over the last four months.

[Figure: Jimmy Walker's strokes gained putting per tournament, 2012-2014]

Above is a graph of Walker's strokes gained putting performance in every tournament the PGA Tour has data for since the start of 2012. The red dashed line is a linear trendline of his performance. It has essentially zero relationship with the passage of time (R=0.03), indicating that, on the whole, Walker's putting hasn't improved over time. This is important to note because if we hypothesize that Walker changed something about his ability to putt, that change clicked only weeks after his worst putting stretch of the past 2+ years. Poor performance is certainly a motivator to change and try to improve, but a simpler explanation is that Walker got unlucky during the summer and has been riding a combination of luck and improved putting since.

What Walker has done in the past 23 rounds on Tour isn't unprecedented, even within the 2013 season. I divided the tournaments in 2013 (Hyundai Tournament of Champions to Tour Championship) into four quartiles with 7-8 tournaments each. I then found the golfers who had played 4+ tournaments in each quartile and averaged their SGP for each one. I gathered all golfers with qualifying consecutive quartiles and compared them: Q1 to Q2, Q2 to Q3, etc. For Q4, I compared performance to the 2013-14 season so far (Frys.com Open to Northern Trust Open). From all that, I had 365 pairs of quartiles where a golfer had played at least four tournaments in each quartile. A sketch of the pairing procedure and a graph of those pairs follow.
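For readers who want to replicate the pairing, here is a rough sketch in pandas, assuming a hypothetical table with one row per golfer per period (the column names are mine, not from the original data):

```python
import pandas as pd

def consecutive_quartile_pairs(df: pd.DataFrame) -> pd.DataFrame:
    """df: one row per golfer per period, with hypothetical columns 'golfer',
    'period' (1-4 for the 2013 quartiles, 5 for the early 2013-14 stretch),
    'events' (tournaments played), and 'sgp' (mean SGP over those events)."""
    wide = (df[df["events"] >= 4]          # require 4+ tournaments per period
            .pivot(index="golfer", columns="period", values="sgp"))
    pairs = [wide[[p, p + 1]].dropna().set_axis(["first", "next"], axis=1)
             for p in range(1, 5)]         # Q1->Q2 ... Q4->2013-14
    return pd.concat(pairs, ignore_index=True)

# pairs = consecutive_quartile_pairs(df)
# print(len(pairs), pairs["first"].corr(pairs["next"]))  # ~365 pairs, r ~ 0.04
```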

[Figure: pairs of consecutive-quartile SGP performances, 2013]

There was very little relationship between a golfer's performance in one set of tournaments and his performance in the following set (R=0.04, a tenuous relationship at best). I had 61 quartiles with a performance better than +0.50, averaging +0.72. Those golfers played to only +0.12 in the next set of tournaments. In fact, in only 12 of those 61 samples did the golfer again average better than +0.50 in the next quartile. None of the six samples of greater than +1.00 SGP was followed by better than +0.52 SGP in the next quartile. In short, we should be very skeptical of elite putting performances over fairly short periods of time.

Now, when I said that Jimmy Walker's performance was largely driven by luck, I meant the "largely" part. I think it's extremely unlikely that all of his putting performance can be explained by variance alone. Jimmy Walker has gained +1.20 strokes putting per round in 23 measured rounds so far this season. The observed standard deviation between 23-round samples for PGA Tour players is around 0.35 strokes. That means that if an average (+0.00) putter plays an infinite number of 23-round samples, 68% of them will yield an SGP average between -0.35 and +0.35, while 95% will fall between -0.70 and +0.70. In short, there's a ton of variation between 23-round samples; it wouldn't be shocking for an average golfer to putt extremely poorly or very well over 23 rounds. Plugging that standard deviation (0.35), Walker's 2013-14 SGP (+1.20), and Walker's five-year SGP average (+0.26) into a Z-score equation yields a Z of 2.7, which indicates a <1% chance that Walker's SGP is entirely due to chance. That means there is some signal in all that noise.
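For anyone who wants to check the arithmetic, here is a minimal sketch using scipy's normal tail probability (the variable names are mine):

```python
from scipy.stats import norm

prior_sgp = 0.26    # Walker's five-year SGP average
sample_sgp = 1.20   # his SGP over the 23 measured rounds
sd_23 = 0.35        # observed SD between 23-round SGP samples

z = (sample_sgp - prior_sgp) / sd_23   # ~2.7
p = norm.sf(z)                         # upper-tail probability, well under 1%
print(f"Z = {z:.2f}, chance of luck alone ~ {p:.2%}")
```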

But how much? I consider myself a Bayesian in that I think it's very important to compare any observed performance to our prior expectation for that performance. Up until October 2013, Jimmy Walker was an above-average, but not elite, putter. Since then, in 23 rounds, Walker has putted out of his mind. Surely we should consider Walker a better putter than we did in October, but how much better? Fortunately, there's a simple equation we can use to estimate how the 23-round sample should change our expectation: ((prior mean/prior variance) + (sample mean/sample variance)) / ((1/prior variance) + (1/sample variance)). In essence, this equation weights the prior expectation and the observed sample by the confidence we have in each; the larger the sample and the further it sits from the prior, the more it moves our estimate toward the new level of performance.

We know the prior performance and sample performance from the previous paragraph. The sample variance is simply the 23-round standard deviation from above (0.35) squared (0.12). To find the prior variance, I was forced to run some simulations, as my data was limited. I know the variance for a 100-round sample is around 0.025, so the prior variance for Walker over his 300+ rounds from 2009-2013 must be no greater than that. Simulations suggested a figure of around 0.02.

Plugging those values into the equation yielded a new expectation for Walker of around +0.40. That's significantly higher than his five-year average, but also much less than what he's done recently. The equation is saying that Walker has been much better lately, but that 23 rounds isn't nearly enough to conclude that he should be expected to continue putting at an elite level. If we had instead seen Walker putt at a +1.20 SGP level for 80 rounds, we'd be much more confident in him continuing to putt at an elite level.
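To make the arithmetic concrete, here is a minimal sketch of that update, plugging in the numbers above (the function name is mine):

```python
def bayes_update(prior_mean, prior_var, obs_mean, obs_var):
    """Precision-weighted blend of prior and observed performance."""
    return ((prior_mean / prior_var + obs_mean / obs_var) /
            (1 / prior_var + 1 / obs_var))

# Walker's numbers from this post
new_expectation = bayes_update(prior_mean=0.26, prior_var=0.02,
                               obs_mean=1.20, obs_var=0.35 ** 2)
print(round(new_expectation, 2))   # ~0.39, i.e. roughly +0.40
```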

The tl;dr here is that extremely good SGP performances over small samples (~4-8 tournaments) regress sharply to the mean over the following 4-8 tournaments. Sustaining the kind of putting Walker has shown recently is unprecedented over a large sample of rounds from 2013-14. Moreover, the expected variance over 23 rounds is very large; it would not be abnormal for an average putter to putt at a top-20 or bottom-20 level over 23 rounds. Considering all that, we should expect Walker to putt better over the rest of the season than he did from 2009-2013, but not nearly as well as he has since October.


Regression Rules Everything

This post will be number- and graph-heavy, but it explains perhaps the most important concept in predicting golf performance: everyone regresses to the mean, no matter their performance. Below are two charts that show this effect in action. The first uses large buckets and compares all players' performance in seasons with N > 50 rounds with their performance (regardless of N) in the subsequent season. The second shows similar data at a more granular level and also includes the percentage of seasons that fall in each bucket. Read the buckets as seasons within 0.05 standard deviations.
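For concreteness, here is a rough sketch of how such buckets could be built, assuming a hypothetical table of golfer-seasons (the column names are mine); the charts follow.

```python
import pandas as pd

def bucket_subsequent(seasons: pd.DataFrame) -> pd.DataFrame:
    """seasons: one row per golfer-season with hypothetical columns 'z'
    (year-1 performance in SDs vs. PGA Tour average), 'z_next'
    (subsequent-season performance), and 'rounds'."""
    q = seasons[seasons["rounds"] > 50].copy()     # N > 50 rounds in year 1
    q["bucket"] = (q["z"] / 0.05).round() * 0.05   # 0.05-SD buckets
    out = q.groupby("bucket")["z_next"].agg(["mean", "count"])
    out["pct_of_seasons"] = out["count"] / len(q)  # share of qualifying seasons
    return out
```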

[Figure: initial-season vs. subsequent-season performance]

[Table: subsequent-season performance by initial-season bucket]

In the first graph, all golfers better than +0.30 (approximately Web.com Tour average) in year 1 declined in year 2. Those worse (think Challenge Tour average) neither improved nor declined, on average. Only those who performed very poorly in year 1 actually improved. For those better than PGA Tour average, the decline was fairly uniform (~0.05 to ~0.10 standard deviations). Remember, these are aggregations of huge samples; many individual players improved at every skill level, but on average regression/decline ruled everything.

In the second graph, the most important lesson is how rare the truly elite seasons are. Only roughly a quarter of seasons came in below -0.15 (roughly the talent level of the average PGA Tour card holder). The cut-off for the top 5% of seasons (2010-2012) came in at -0.45. The regression of almost all players is also evident; no bucket better than +0.35 improved in the subsequent season.

This data is fairly strong evidence that we should expect decline from most performances, on average. In fact, given the rarity of elite seasons and the demonstrated regression, we should be skeptical about predicting any elite performance to be repeated the following season.

Bayesian Prediction of Golfer Performance (Individual Tournament)

I've posted several studies attempting to predict golfer performance. One attempted to find the importance of the previous week when predicting the following week. That study was not particularly sophisticated (simple linear regression), but the results indicated that the previous week's performance should be weighted at around 10% of the projection for the following week (the other 90% being two-year performance). Another study attempted to predict golfer performance for an entire season using prior-season data. It found that no matter how many years are used, and whether or not those years are weighted for recency, the resulting correlation is ~70%. Doing better than that for full-season prediction would require an additional level of sophistication beyond aggregating prior seasons or weighting data for recency.

This post, however, concerns predicting individual tournament performance using my Bayesian rankings. These rankings are generated each week by combining prior performance and sample performance using the equation ((prior mean/prior variance)+(observed mean/observed variance))/((1/prior variance)+(1/observed variance)). In this way, each golfer's prediction for a week is updated when new information is encountered. The prior mean for a week is the Bayesian mean generated the prior week. My rankings also slowly regress toward a golfer's two-year performance if he is inactive for a period of weeks. For each inactive week, the prior mean is recalculated using the equation (((Divisor - (Weeks since competed)) / Divisor) * (Prior mean)) + ((1 - ((Divisor - (Weeks since competed)) / Divisor)) * (Two-year Z-score)). I use 50 as the Divisor; because the adjustment is applied each week, it compounds, which works out to weighting two-year performance at 2% after 1 week off, 27% after 5 weeks off, and 69% after 10 weeks off.
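A small sketch of that inactivity adjustment follows. The quoted weights (2%, 27%, 69%) correspond to applying the formula once per inactive week, so the sketch iterates it (the function name is mine):

```python
def decayed_prior(prior_mean, two_year_z, weeks_off, divisor=50):
    """Regress the prior mean toward two-year performance, applying the
    weekly adjustment once per inactive week, which reproduces the
    2%/27%/69% weights quoted above."""
    m = prior_mean
    for week in range(1, weeks_off + 1):
        keep = (divisor - week) / divisor       # weight kept on the prior
        m = keep * m + (1 - keep) * two_year_z
    return m

# Effective weight shifted onto two-year performance after k weeks off:
for k in (1, 5, 10):
    print(k, round(1 - decayed_prior(1.0, 0.0, k), 2))   # 0.02, 0.27, 0.69
```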

To measure how predictive these rankings were, I gathered data for all golfers who had accumulated 100 rounds on the PGA, European, Web.com, or Challenge Tours between January 2010 and July 2013, a sample of 643 golfers. I then examined performance in all tournaments between 3-28-2013 and 8-8-2013, a sample of 6,246 tournaments played, and generated Bayesian rankings predicting performance before each of those tournaments. The mean of my predictions was +0.08, indicating I expected the sample to perform slightly worse than PGA Tour average. I then compared each prediction to the golfer's actual performance.

The table below shows the performance of the Bayesian and pure two-year predictions, grouping all predictions within +/-0.05 of the displayed value (i.e., -0.50 includes all predictions between -0.55 and -0.45). The accompanying graph shows the same information with best-fit lines.

[Table: Bayesian vs. two-year predictions by bucket]

[Figure: Bayesian vs. two-year predictions with best-fit lines]

Obviously, the Bayesian and two-year predictions perform similarly. To test which is better, I compared their mean square error, which measures how closely the predictions matched actual performance. I also included "dumb" predictions, which simply predict that every golfer will perform at the mean of all predictions (+0.08 for Bayesian, +0.055 for two-year). The "dumb" predictions are the baseline for judging any predictions; if a prediction can't beat them, it's worthless.

The mean square error for the Bayesian predictions was 0.381, against 0.446 for the corresponding "dumb" predictions. The mean square error for the two-year predictions was 0.389, against 0.452 for its "dumb" predictions. So both sets of predictions provide value over the "dumb" baseline, and both provide about the same amount of it (-0.065 for Bayesian and -0.063 for two-year).
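A minimal sketch of that comparison (the prediction and result arrays are hypothetical placeholders):

```python
import numpy as np

def mse(pred, actual):
    """Mean square error between predictions and actual performance."""
    pred, actual = np.asarray(pred), np.asarray(actual)
    return np.mean((pred - actual) ** 2)

# preds/actual: hypothetical arrays of per-tournament predictions and results
# skilled = mse(preds, actual)                               # 0.381 in the post
# dumb = mse(np.full(len(actual), np.mean(preds)), actual)   # 0.446
# print(skilled - dumb)                                      # ~ -0.065
```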

This study indicates two things: first, using Bayesian methods to predict golfer performance doesn't substantially improve accuracy relative to an unweighted aggregation of the last two years of performance; and second, predicting golfer performance in individual tournaments is very difficult. A mean square error of 0.38 translates to an average miss of 3.5 strokes for golfers playing four rounds and 2.5 strokes for golfers playing two rounds.

The Aging Curve for PGA Tour Golfers (Part III) – Using Bayesian Prior

Several weeks ago I posted two studies on aging among PGA Tour golfers, the most recent of which compared sequential seasons, regressing both seasons to PGA Tour average based on the number of rounds a golfer had played. DSMok1 suggested modifying the amount and degree of regression by using a better prior, which makes more sense than regressing every golfer to the same mean. Instead of simply adding 25.5 rounds of average play to each golfer's season, I built a Bayesian prior from play in the prior season and measured the change in performance from that prior in the following season.

Sample and Design:

I included every player with >20 PGA Tour rounds in a season for 2010, 2011, and 2012. This limited my sample to 703 seasons. I then gathered data for YR N-1, YR N, and YR N+1 (i.e., 2009, 2010, and 2011 for golfers with >20 rounds in 2010) on all major Tours (PGA, European, Web.com, and Challenge).

Using the equation ((prior mean/prior variance)+(observed mean/observed variance))/((1/prior variance)+(1/observed variance)), I found my prior expectation of performance, inputting data from YR N-1 for the prior mean and variance and from YR N for the observed mean and variance. That equation adjusts the observed performance based on what we saw in the prior season to generate a true-talent level for YR N (True YR N), which then serves as the prior for YR N+1. I used the same equation to find the true-talent level for YR N+1, inputting the prior generated from YR N-1 and YR N as the prior mean and the data from YR N+1 as the observed mean. This produced True YR N+1. I then compared True YR N and True YR N+1 to find the change in true talent for each age group.
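Here is a rough sketch of the chained updates. The post doesn't specify what variance carries into the second update, so using the posterior variance from the first is my assumption, and all input names are hypothetical:

```python
def bayes_update(prior_mean, prior_var, obs_mean, obs_var):
    """Precision-weighted update; returns posterior mean and variance.
    Carrying the posterior variance forward is my assumption."""
    post_var = 1 / (1 / prior_var + 1 / obs_var)
    post_mean = (prior_mean / prior_var + obs_mean / obs_var) * post_var
    return post_mean, post_var

# Hypothetical per-season inputs (mean z-scores and sampling variances):
# true_n, var_n = bayes_update(z_n_minus_1, v_n_minus_1, z_n, v_n_obs)
# true_n1, _    = bayes_update(true_n, var_n, z_n_plus_1, v_n1_obs)
# delta = true_n1 - true_n   # this golfer's change in true talent
```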

I weighted the results using the harmonic mean of rounds played in YR N and YR N+1. For example, there were 18 golfers aged 26, so I summed the harmonic means of their rounds and weighted each golfer's change in true talent by his share of that total. This produced the total change in true talent due to age for each age group; a sketch of the weighting is below.
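A minimal sketch of that weighting (the array names are hypothetical):

```python
import numpy as np

def age_group_change(rounds_n, rounds_n1, delta):
    """Weight each golfer's change in true talent by his share of the
    age group's summed harmonic-mean rounds."""
    rounds_n, rounds_n1 = np.asarray(rounds_n), np.asarray(rounds_n1)
    hm = 2 * rounds_n * rounds_n1 / (rounds_n + rounds_n1)  # harmonic mean
    return float(np.sum(hm / hm.sum() * np.asarray(delta)))
```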

If a golfer had no performance in YR N-1, I used +0.08 (slightly below PGA Tour average) as his YR N-1 prior. In most cases, these players qualified via Qualifying School, and +0.08 is the observed true talent of Q-School golfers from 2009-2013. Only 8 golfers had zero rounds in YR N-1, however.

Results:

Age   Change in true talent (negative = improvement)   N (golfers)
20    -0.05     2
21    -0.06     3
22    -0.01     6
23    -0.05     8
24    -0.07     9
25    -0.11    11
26    -0.13    18
27    -0.13    23
28    -0.14    29
29    -0.12    36
30    -0.13    34
31    -0.11    39
32    -0.12    36
33    -0.11    34
34    -0.13    34
35    -0.12    36
36    -0.11    37
37    -0.10    42
38    -0.08    26
39    -0.05    30
40    -0.01    21
41    +0.03    35
42    +0.07    28
43    +0.12    19
44    +0.13    17
45    +0.15    13
46    +0.21    17
47    +0.25    19
48    +0.31    13
49    +0.36    12
50    +0.35     9
51    +0.45     4
52    +0.47     2

[Figure: Bayesian aging curve]

Discussion:

The curve generated is very similar to that of the prior study, which regressed everyone to a mean of +0.00. The peak is slightly lower and the decline in the late 40s is steeper, but otherwise this study supports my prior conclusion: golfers peak in their mid-30s and decline thereafter.