Golf Analytics

How Golfers Win


Bayesian Prediction of Golfer Performance (Individual Tournament)

I’ve posted several studies attempting to predict golfer performance. One attempted to find how much weight the previous week should get when predicting the following week. That study was not particularly sophisticated (simple linear regression), but the results indicated that the previous week’s performance should be valued at around 10% of the projection for the following week (the other 90% being two-year performance). Another study attempted to predict golfer performance for an entire season using prior-season data. That study found that no matter how many years are used, or whether those years are weighted for recency, the resulting correlation is ~70%. Doing better than that for full-season prediction would require a level of sophistication beyond aggregating prior seasons or weighting data for recency.

This post, however, concerns predicting individual tournament performance using my Bayesian rankings. These rankings are generated each week by combining prior performance and sample performance using the equation ((prior mean / prior variance) + (observed mean / observed variance)) / ((1 / prior variance) + (1 / observed variance)). In this way, each golfer’s prediction for a week is updated whenever new information is encountered. The prior mean for a week is the Bayesian mean generated the prior week. My rankings also slowly regress toward a golfer’s two-year performance if he is inactive for a period of weeks. For each week, the prior mean is calculated using the equation (((Divisor - (Weeks since competed)) / Divisor) * (Prior Mean)) + ((1 - ((Divisor - (Weeks since competed)) / Divisor)) * (Two-year Z-Score)); this adjustment is applied for each week a golfer sits out, so the pull toward the two-year number compounds. I use 50 as the Divisor, which weights two-year performance at 2% for 1 week off, 27% for 5 weeks off, and 69% for 10 weeks off.
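For concreteness, here is a minimal sketch of those two update rules. The function names and the example variances are my own assumptions, not the actual code behind the rankings; the inactivity adjustment is applied once per idle week, which is what reproduces the 2%/27%/69% weights quoted above.

```python
def bayes_update(prior_mean, prior_var, obs_mean, obs_var):
    """Precision-weighted combination of the prior and the week's observed play."""
    total_precision = 1.0 / prior_var + 1.0 / obs_var
    return (prior_mean / prior_var + obs_mean / obs_var) / total_precision

def regress_to_two_year(prior_mean, two_year_z, weeks_idle, divisor=50):
    """Pull an inactive golfer's prior toward his two-year Z-Score.

    Assumes the adjustment is applied once per idle week (weeks 1..n), which
    reproduces the quoted weights: ~2% after 1 week off, ~27% after 5, ~69% after 10.
    """
    mean = prior_mean
    for w in range(1, weeks_idle + 1):
        keep = (divisor - w) / divisor
        mean = keep * mean + (1 - keep) * two_year_z
    return mean

# Example: a -0.50 prior, a -0.20 two-year Z-Score, five idle weeks, then a
# +0.20 observed week (the 0.15/0.30 variances are made up for illustration).
prior = regress_to_two_year(-0.50, two_year_z=-0.20, weeks_idle=5)
print(round(prior, 3))                                  # about -0.42
print(round(bayes_update(prior, 0.15, 0.20, 0.30), 3))  # about -0.21
```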

To measure how predictive these rankings were, I gathered data for all golfers who had accumulated 100 rounds on the PGA, European, Web.com, or Challenge Tour between January 2010 and July 2013; my sample was 643 golfers. I then examined performance in all tournaments between 3-28-2013 and 8-8-2013, a sample of 6,246 tournaments played, and generated Bayesian rankings predicting performance before each of those tournaments. The mean of my predictions was +0.08, indicating I expected the sample to be slightly worse than PGA Tour average. I then compared each prediction to the golfer’s actual performance.

The table below shows the performance of the Bayesian and pure Two-year predictions, including all predictions within ±0.05 of the displayed prediction (i.e., -0.50 includes all predictions between -0.55 and -0.45). The accompanying graph shows the same information with best-fit lines.
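A toy sketch of how those buckets are built: for each displayed prediction, average the actual results of every prediction within ±0.05 of it. The five rounds below are invented; the real input is the full set of 6,246 prediction/result pairs.

```python
def bucket_actuals(preds, actuals, centers, half_width=0.05):
    """Mean actual performance for the predictions within half_width of each center."""
    means = {}
    for c in centers:
        group = [a for p, a in zip(preds, actuals) if abs(p - c) <= half_width]
        means[c] = sum(group) / len(group) if group else None
    return means

preds   = [-0.52, -0.48, -0.46, 0.03, 0.07]
actuals = [-0.70, -0.10, -0.55, 0.40, -0.20]
print(bucket_actuals(preds, actuals, centers=[-0.50, 0.05]))
```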

[Table: Bayesian vs. Two-year predictions by predicted level]

[Graph: Bayesian vs. Two-year predictions with best-fit lines]

Obviously, the Bayesian and Two-year predictions perform similarly. To test which is better, I calculated the mean squared error, which shows how closely each prediction matched actual performance. I also included “dumb” predictions, which simply predict that all rounds will perform to the mean of all predictions (+0.08 for Bayesian, +0.055 for Two-year). The “dumb” predictions are the baseline for judging any predictions; if a prediction can’t beat them, it’s worthless.
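The mechanics of that comparison look something like the sketch below. The five rounds are invented; the point is only how a model is scored against the constant “dumb” baseline.

```python
def mse(predictions, actuals):
    """Mean squared error between predicted and actual Z-Scores."""
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)

preds   = [-0.50, -0.10, 0.00, 0.25, 0.60]
actuals = [-0.80,  0.30, -0.20, 0.55, 0.40]

baseline = [sum(preds) / len(preds)] * len(actuals)  # every round projected at the mean

print(mse(preds, actuals))     # model error
print(mse(baseline, actuals))  # "dumb" error; the model has to beat this to add value
```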

The mean squared error for the Bayesian predictions was 0.381, versus 0.446 for the corresponding “dumb” predictions; for the Two-year predictions it was 0.389, versus 0.452 for the “dumb” predictions. So both sets of predictions provide value over the “dumb” baseline, but they improve on it by nearly identical amounts (-0.065 for Bayesian and -0.063 for Two-year).

This study indicates two things: first, that using Bayesian methods to predict golfer performance doesn’t substantially improve accuracy relative to an unweighted aggregation of the last two years of performance, and second, that predicting golfer performance in individual tournaments is very difficult. A mean squared error of 0.38 corresponds to an average miss of roughly 3.5 strokes for golfers playing four rounds and 2.5 strokes for golfers playing two rounds.
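One way to land near those stroke figures, assuming roughly 3 strokes per unit of Z-Score and independent errors from round to round:

```python
import math

mse = 0.38
strokes_per_z = 3.0                          # rough conversion used elsewhere on this site

per_round = math.sqrt(mse) * strokes_per_z   # ~1.85 strokes of miss per round
print(round(per_round * math.sqrt(4), 1))    # ~3.7 strokes over four rounds
print(round(per_round * math.sqrt(2), 1))    # ~2.6 strokes over two rounds
```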


Predicting Professional Performance of Collegiate Players (REDUX)

Let’s start with some tough love: my earlier attempt at this study was laughably lazy. Hopefully this effort will be less awful, considering I think I now have a somewhat proper sample.

Quickly: I’m looking for the correlation between Jeff Sagarin’s college golf rankings and my own Z-Score Model ratings. Sagarin publishes the best (only?) math-based ranking of college golfers. There are obvious sample-size issues in the college game (most teams play fewer than 15 tournaments of normally three rounds each), but rankings are fairly strongly correlated between seasons (R = .62) even though players are in a volatile period of their golf development. To combat concerns about sample size, I’ve averaged each golfer’s ranking over his college career. This isn’t ideal either, but, again, a maximum of 45 rounds isn’t something I’m comfortable using.

Once I had those, I looked in my Z-Score database for the first instance of those players playing more than 20 rounds in one season and took their Z-Score from that season. A few concerns about this method of selecting seasons: 1) if a golfer has fewer than 20 rounds in every season, he won’t show up at all; 2) if a golfer has fewer than 20 rounds in a season before exceeding 20 rounds in a subsequent season, that first season will be ignored; and 3) it can often be several years before a golfer accumulates more than 20 rounds on the PGA/Web.com/European Tours (I do not have eGolf/NGA/Challenge/Asian/etc. Tour ratings).

Of these concerns, #1 isn’t that big of a deal: plenty of collegians don’t have the game for high-level pro golf, there are fewer than 1,000 guys who play regularly on the three Tours I track, and I’ve gathered data on the top 500 golfers from each season. #2 isn’t very concerning; the sample has to be set somewhere. #3 concerns me the most, because comparing a 26-year-old with three seasons of minor-tour golf to a 21-year-old right out of college is kind of apples and oranges, but perhaps I’ll run another study in the future that excludes those data points.
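A sketch of that selection rule, using a hypothetical (year, rounds, Z-Score) layout rather than the actual database schema:

```python
def first_qualifying_season(seasons, min_rounds=20):
    """Return (year, z) for the first season with more than min_rounds Major Tour
    rounds; thinner earlier seasons are skipped (concern #2) and golfers who never
    qualify return None (concern #1)."""
    for year, rounds, z in sorted(seasons):
        if rounds > min_rounds:
            return year, z
    return None

# A 12-round 2011 season is skipped; 2012 is the first qualifying year.
print(first_qualifying_season([(2011, 12, 0.45), (2012, 64, 0.18), (2013, 88, 0.05)]))
```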

First, some information about my sample: N = 80; all but three golfers had at least two seasons of college golf (the average was 3.2 seasons), with Spieth, Todd Baek, and Roger Sloan being the single-season exceptions; the average performance in college was a 71.1; and the average performance in pro golf was a +0.15 Z-Score (below average). I’ve chosen to display the pro results in terms of strokes better or worse than average; divide by 3 to get the corresponding Z-Score.

The results were much less irrelevant than that turd I linked above:

[Graph: regression of first-season pro performance on career Sagarin rating]

The correlation was R = .49, which indicates that we can predict pro performance on a roughly 50% Sagarin / 50% mean basis. The equation to use is y = 0.47x - 32.9, where y is pro performance in strokes relative to average (divide by 3 to get the Z-Score) and x is the Sagarin rating. For comparison, I’ve found that the correlation between back-to-back professional seasons is about 70% (70% Year 1 + 30% mean) and the correlation between back-to-back college seasons is about 63% (63% Year 1 + 37% mean). Based on the concerns I laid out above, I think that’s not terrible.
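Applying the fitted line directly converts a career Sagarin average into a projection. This is only a sketch of the conversion; the coefficients are the ones quoted above and shouldn’t be trusted far outside the sample’s range of ratings.

```python
def project_pro_strokes(sagarin_rating):
    """Projected pro performance in strokes relative to average (lower is better)."""
    return 0.47 * sagarin_rating - 32.9

def project_pro_z(sagarin_rating):
    """The same projection on the Z-Score scale (strokes divided by 3)."""
    return project_pro_strokes(sagarin_rating) / 3.0

# Justin Thomas's 69.2 career average (mentioned later in the post):
print(round(project_pro_strokes(69.2), 2))  # about -0.38 strokes vs. average
print(round(project_pro_z(69.2), 2))        # about -0.13 Z-Score
```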

Unsurprisingly, this method of predicting would have misses. Sagarin did not think highly of Keegan Bradley coming out of St. John’s. Whether it was poor play or an awful schedule, Keegan averaged a 72.5 over three years in school. He was also one of those guys who turned pro and took several seasons to record Major Tour rounds: he graduated in 2008 and didn’t record a Major Tour round until 2010.

I do take solace in the fact that no golfer who averaged better than a 70.0 (basically in the top 15 each season) failed to perform better than the sample average in his first season. This indicates that success in college is correlated with success in professional golf. In fact, only a single player with a Sagarin rating below 70.0 in the entire 2005-2013 sample (among those who have graduated) has failed to record 20 or more Major Tour rounds: Arnond Vongvanij, who has exclusively played on the Asian Tour, has a professional win, and is ranked 218th by the Official World Golf Ranking.

This method predicts success for the best collegiate golfer not currently in pro golf, Justin Thomas (69.2), who plans to turn pro after the Walker Cup. He’s recorded a -0.06 Z-Score in 12 rounds dating back to 2012.

Predicting Golfer Performance in Full Seasons

This is a long-overdue post. My main interest in golf analytics is predicting future performance, and this post will lay out how well I can predict full seasons from prior data. The inputs are simple: 1) my Z-Score data from the 2009-2012 seasons, collected from the PGA, Web.com, and European Tours, and 2) my Z-Score data for all rounds on those three Tours so far in 2013. For all of these regressions I’ve limited my analysis to golfers with more than 25 rounds in both 2012 and 2013. I’ve run regressions using 1 Year (2012), 2 Years (2011-12), 3 Years (2010-12), and 4 Years (2009-12), all on 2013. I’ve also weighted the 1 Year and 2 Years samples for recency and regressed those on 2013.
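A sketch of that setup follows. The five golfers and the 60/40 recency weights are invented for illustration (the post doesn’t specify the weights used); the real inputs are the 2009-2013 Z-Score samples described above.

```python
import numpy as np

# Columns: 2011 Z-Score, 2012 Z-Score, 2013 Z-Score (golfers with 25+ rounds).
data = np.array([
    [-0.40, -0.55, -0.35],
    [ 0.10, -0.05,  0.15],
    [ 0.60,  0.40,  0.55],
    [-0.10,  0.20,  0.05],
    [ 0.30,  0.10,  0.25],
])

y          = data[:, 2]                           # 2013 performance
x_1yr      = data[:, 1]                           # 2012 only
x_2yr      = data[:, :2].mean(axis=1)             # unweighted 2011-12
x_weighted = 0.4 * data[:, 0] + 0.6 * data[:, 1]  # recency-weighted (weights are a guess)

for label, x in [("1 Year", x_1yr), ("2 Years", x_2yr), ("2 Years weighted", x_weighted)]:
    slope, intercept = np.polyfit(x, y, 1)
    # A slope of ~0.70 means prior performance keeps ~70% of its weight,
    # i.e. ~30% regression to the mean.
    print(label, round(slope, 2), round(intercept, 2))
```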

[Table: regression results for the 1-, 2-, 3-, and 4-Year samples (and recency-weighted versions) predicting 2013]

Prior work I’ve done suggests that simply adding 25 rounds of PGA Tour average (0.00) to a golfer’s sample is a good prediction of what they will do going forward. That means a golfer with 50-100 rounds in a season (pretty much all Tour regulars) will be regressed anywhere from 20% to 33% to the mean. The above results are completely in line with that prior finding, showing regression of anywhere from 28% to 31%.
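A quick check that the "add 25 rounds of Tour-average play" rule implies those regression amounts:

```python
def regression_to_mean(observed_rounds, padding_rounds=25):
    """Share of the projection that comes from the mean (0.00) when the sample
    is padded with padding_rounds rounds of average play."""
    return padding_rounds / (observed_rounds + padding_rounds)

for n in (50, 75, 100):
    print(n, round(regression_to_mean(n), 2))  # 50 -> 0.33, 75 -> 0.25, 100 -> 0.2
```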

What surprised me, though, was the similarity of the results. This suggests that if you’re not going to weight the data, it doesn’t actually help to include more of it: you’re pretty much stuck with ~30% regression no matter what. And even weighting the data barely improves the overall accuracy of the results.

Now, this is a fairly basic regression of data. It includes golfers with very few rounds in both seasons (though increasing the rounds requirement to >50 with the weighted 2 Year data produced no improvement in accuracy), it includes golfers who were playing regularly on all three Tours, and it ignores a host of other information like age, status on the Tours, injuries, etc. But the results are pretty compelling – predicting golfer performance is difficult even over a full season, so much so that we should regress observed performance 30% to the mean to best predict future performance.

Predicting College Players in Pro Golf

With the recent end of the NCAA golf season, top collegiate golfers are terminating their amateur status and entering pro tournaments with the aim of earning enough money to gain status on one of the major tours for 2014. Jordan Spieth was the first to turn pro this season, entering several PGA Tour tournaments early in the year and earning Special Temporary Member status. Just this past week, Alabama’s Justin Thomas and Washington’s Chris Williams turned pro at the Travelers Championship, and both made the cut. From my point of view, the main question with these freshly minted pros is just how good they are compared to a PGA Tour regular.

Luckily, Golfweek/Jeff Sagarin publishes yearly rankings of college golfers measuring their performance throughout the college season. This season, the top-ranked golfer was Michael Kim, the low amateur at the U.S. Open. Sagarin reports the ratings on a scale mirroring a typical golf score, with the most elite golfers earning ratings in the 68.0s, while the 25th-best golfer in a season will rate around a 70.0-70.5. How do those ratings translate into the Z-Score Rating System I use?

To find out, I constructed a list of golfers who were ranked in Sagarin’s top 25 for 2010-11 or 2011-12 before turning pro and who had at least 8 rounds on the Web.com, European, or PGA Tours in the year after their college season ended (basically June to May). Eighteen golfers met both criteria, including notables like Harris English, Bud Cauley, and Jordan Spieth. For those 18, I gathered Sagarin ratings for their last two seasons in college (or only one season if they left after freshman year) and their Z-Scores, adjusted for strength of field, for all rounds they played in the year following the end of their last college season. The N for each golfer ranged from 8 rounds to 108 rounds.

I then ran a linear regression with the average Sagarin rating of the final two college seasons as the independent variable and the Z-Score in the year following as the dependent variable. For the full data set of 18 golfers, the R² was only 0.09 (a graph of the data is below), i.e., R ≈ 0.30, meaning that for our data collegiate performance carried only about 30% of the weight in predicting performance as a pro, with the rest regressing to the mean.

[Graph: average Sagarin rating vs. first-year pro Z-Score for the 18 golfers]

Now, what can we learn from such a small sample? Perhaps not much, especially since there is likely to be survivor bias in the data. The golfers who perform best in their first few tournaments are more likely to get additional sponsor’s exemptions, gain status on a major tour, and get additional rounds of data. Golfers who struggle early will be forced to play on the NGA Tour or other lower tours that I do not have data for. Perhaps the most important thing is that a golfer would have to play to a rating of 69.3 in college to be projected as a 0.00 as a pro (basically average for the PGA Tour). Only Michael Kim (who is staying an amateur) and Brandon Stone (who just turned pro at this past week’s BMW International and finished T10) were that good in 2012-13.

I will return to this subject when I finish compiling Challenge Tour (the European minor league), NGA Tour, and eGolf Tour data for past seasons, which will give me a much larger sample to work with and from which to draw conclusions.