Golf Analytics

How Golfers Win

Predicting Golfer Performance in Full Seasons

This is a long overdue post. My main interest in golf analytics is predicting future performance, and this post lays out how well I can predict full seasons from prior data. The inputs are simple: 1) my Z-Score data from the 2009-2012 seasons, collected from three Tours including the PGA and European Tours, and 2) my Z-Score data for all rounds on those three Tours so far in 2013. For all of these regressions I’ve limited my analysis to golfers with more than 25 rounds in both 2012 and 2013. I’ve run regressions using 1 Year (2012), 2 Years (2011-12), 3 Years (2010-12), and 4 Years (2009-12) of data, each against 2013. I’ve also weighted the 1 Year and 2 Year samples by recency and regressed those on 2013.


Prior work I’ve done suggests that simply adding 25 rounds of PGA Tour average (0.00) to a golfer’s sample gives a good prediction of what they will do going forward. That means a golfer with 50-100 rounds in a season (pretty much all Tour regulars) will be regressed anywhere from 20% to 33% to the mean, since 25/(100+25) = 20% and 25/(50+25) = 33%. The regression results are completely in line with that prior finding, showing regression of anywhere from 28% to 31%.
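To make the arithmetic behind the 25-round rule concrete, here is a minimal Python sketch. The function names and the example golfer (a -0.30 Z-Score over 75 rounds) are made up for illustration; only the 25 rounds of Tour average (0.00) come from the rule described here.

```python
def regression_fraction(rounds, prior_rounds=25):
    """Share of the blended estimate that comes from the padded Tour-average rounds."""
    return prior_rounds / (rounds + prior_rounds)


def padded_estimate(observed_z, rounds, prior_rounds=25, prior_mean=0.0):
    """Blend an observed Z-Score with prior_rounds rounds of Tour average."""
    total = rounds + prior_rounds
    return (observed_z * rounds + prior_mean * prior_rounds) / total


# How much a golfer is regressed at different sample sizes.
for n in (50, 75, 100):
    print(f"{n} rounds -> regressed {regression_fraction(n):.1%} to the mean")

# Hypothetical golfer: -0.30 Z-Score over 75 rounds (illustrative numbers only).
print(round(padded_estimate(-0.30, 75), 3))
```

The more rounds a golfer has, the less the 25 padded rounds matter, which is why Tour regulars land in the 20% to 33% range.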

What surprised me, though, was the similarity of the results across samples. This suggests that if you’re not going to weight data, it doesn’t actually help to include more years of data. You’re pretty much stuck with ~30% regression no matter what. Even weighting the data barely improves the overall accuracy of the predictions.

Now, this is a fairly basic regression of data. It includes golfers with very few rounds in both seasons (though increasing the rounds requirement to >50 with the weighted 2 Year data produced no improvement in accuracy), it includes golfers who were playing regularly on all three Tours, and it ignores a host of other information like age, status on the Tours, injuries, etc. But the results are pretty compelling – predicting golfer performance is difficult even over a full season, so much so that we should regress observed performance about 30% to the mean to best predict future performance.
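For readers who want to try this kind of analysis themselves, the following Python sketch shows one way it could be set up. It is not the code behind this post, and it rests on several assumptions: prior and following-season Z-Scores are fit with an ordinary least-squares line, the "regression to the mean" percentage is read as one minus the slope, and recency weighting multiplies each season's rounds by a hypothetical weight. The toy numbers are invented so that the slope comes out to exactly 0.7.

```python
def ols_slope(x, y):
    """Slope of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def recency_weighted_z(season_z, season_rounds, recency_weights):
    """Average Z-Score across seasons, weighting each season by rounds * recency.

    The recency weights are an assumption; the post does not state its scheme.
    """
    weights = [r * k for r, k in zip(season_rounds, recency_weights)]
    return sum(z * w for z, w in zip(season_z, weights)) / sum(weights)


# Toy inputs, invented purely for illustration (not the post's data).
prior = [-0.50, -0.20, 0.00, 0.10, 0.40]
following = [0.7 * z for z in prior]

slope = ols_slope(prior, following)
print(f"slope = {slope:.2f}, implied regression to the mean = {1 - slope:.0%}")

# Two prior seasons of 80 rounds each, with the newer season counted double.
print(round(recency_weighted_z([-0.40, -0.20], [80, 80], [1.0, 2.0]), 3))
```

Under this reading, a fitted slope near 0.7 corresponds to regressing observed performance about 30% toward the Tour average.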