This weekend marks the eighth weekend of professional golf in 2014 and the beginning of the Florida Swing of the PGA Tour schedule. So far this season, guys like Jimmy Walker, Patrick Reed, and Harris English have started off playing like top 20 players in the world. With almost two months of the year (not to mention five months of the Tour season) in the books, it feels like we’re reaching the point where we can start to tell who’s struggling, who’s excelling, who’s going to contend for a Major, who’s the next Big Thing, etc. This post is designed to throw some cold water on those ideas. Two months doesn’t tell us very much about how the rest of the season will play out, at least not compared to what a much larger sample of past tournaments tells us.
To test how predictive the first two months of the golf season are of the rest of the season, I gathered every player-season from 2010 to 2013 in which the player had logged at least 50 rounds over the two prior seasons (i.e., 2008-09 for 2010, 2009-10 for 2011, etc.) and had played at least one round both in the first two months of the season in question and in the remainder of that calendar year. I kept these requirements fairly loose here, but tested stricter combinations as well. In total, I found 1,984 such seasons from players on the PGA Tour, European Tour, Web.com Tour, and Challenge Tour. For each player-season I computed the average performance in z-score over three windows: the two prior years, the first two months of the season in question, and the remainder of that season.
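To make the sample construction concrete, here's a minimal sketch in Python. The table layout (columns `player`, `date`, and `z` for a field-adjusted round score), the function name, and the exact window boundaries are my assumptions for illustration, not the actual code behind this study:

```python
import pandas as pd

def build_sample(rounds: pd.DataFrame, season: int,
                 min_prior_rounds: int = 50) -> pd.DataFrame:
    """Split each player's rounds into the three windows for one season
    and keep only players meeting the round-count requirements."""
    d = pd.to_datetime(rounds["date"])

    # Two prior seasons, first two months, and the rest of the calendar year.
    prior = rounds[(d >= f"{season - 2}-01-01") & (d < f"{season}-01-01")]
    early = rounds[(d >= f"{season}-01-01") & (d < f"{season}-03-01")]
    rest = rounds[(d >= f"{season}-03-01") & (d < f"{season + 1}-01-01")]

    # Average z-score per player in each window (indexes align on player).
    out = pd.DataFrame({
        "prior_n": prior.groupby("player")["z"].size(),
        "prior_avg": prior.groupby("player")["z"].mean(),
        "early_avg": early.groupby("player")["z"].mean(),
        "rest_avg": rest.groupby("player")["z"].mean(),
    })

    # Keep players with enough prior rounds and at least one round in
    # both in-season windows (a missing average means no rounds played).
    return out[(out["prior_n"] >= min_prior_rounds)
               & out["early_avg"].notna()
               & out["rest_avg"].notna()]
```

Each row of the result is one player-season with the three averages the regressions below would use.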
To test whether the first two months were predictive of the remainder of the season, I first simply found the correlation between the two-month average and the rest-of-season average. The correlation was strong (R=0.57), indicating that the first two months do carry real information about the rest of the season. However, when I instead correlated the prior two seasons' average with the rest-of-season average (ignoring the most recent two months entirely), the correlation was even larger (R=0.68). That is, if you lock me in a room from New Year's until March without access to any information about professional golf, I will do a better job predicting the rest of the season than someone who relies only on who's playing well in January and February.
Now, obviously you don’t have to ignore one set of data (the two year average) in favor of the other (the two month average). I ran a simple linear regression of both variables on the rest-of-season average to see if the accuracy would improve. Indeed, including both variables increased the correlation slightly to R=0.72, meaning the model explains just over half of the variance in the remainder of the season (R² ≈ 0.52 – which again shows how random golf is). More interesting are the coefficients the regression spits out: Y = (0.72*Two Year) + (0.20*Two Month) + 0.04. That is, the two year average is weighted roughly 3.5 times as heavily as the two month average.
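The regression itself is straightforward to reproduce in miniature. The sketch below simulates player-season data with the fitted coefficients above baked in as the true values – everything else (noise levels, the correlation between early-season and prior form) is made up for illustration – and then recovers them with ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1984  # number of player-seasons in the sample

# Simulated averages: prior-two-year form, plus early-season form that is
# correlated with it but noisy. The 0.6 and the noise scales are invented.
two_year = rng.normal(0.0, 0.5, n)
two_month = 0.6 * two_year + rng.normal(0.0, 0.4, n)

# Rest-of-season average generated from the article's fitted model plus noise.
rest = 0.72 * two_year + 0.20 * two_month + 0.04 + rng.normal(0.0, 0.3, n)

# Simple correlations, as in the first test above.
r_two_month = np.corrcoef(two_month, rest)[0, 1]
r_two_year = np.corrcoef(two_year, rest)[0, 1]

# Two-variable OLS: rest ~ two_year + two_month + intercept.
X = np.column_stack([two_year, two_month, np.ones(n)])
coefs, *_ = np.linalg.lstsq(X, rest, rcond=None)
b_year, b_month, intercept = coefs

# Multiple correlation R = correlation of fitted values with actuals.
r_multiple = np.corrcoef(X @ coefs, rest)[0, 1]
```

With this much data the OLS estimates land very close to the true (0.72, 0.20, 0.04); the fit statistics, though, are artifacts of the invented noise levels rather than the real R=0.72.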
I followed this up with a second study that repeated the methodology but winnowed the sample down to players with more than 100 rounds in the previous two seasons, more than 5 rounds in the first two months, and more than 19 rounds in the remainder of the season. The results were consistent with the above. The smaller sample (now N=1,300) slightly improved the predictive strength and also slightly increased the importance of the two year average relative to the two month average.
However, a further study showed that drastic improvements in z-score over the first two months were much less predictive of the remainder of the season than in the general sample. Using the stricter sample above, I split the seasons into three groups: those where the two month average was at least 0.30 standard deviations better than the two year average (roughly the sixth of the sample that improved the most), those where it was at least 0.30 standard deviations worse (roughly the sixth that declined the most), and the remaining two-thirds in between. I then ran the regression separately on the +0.30 group, the -0.30 group, and the middle group.
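The split-and-refit procedure looks roughly like this in code. Again the data is simulated – and since the simulation uses the same true coefficients in every group, it can only illustrate the mechanics of the split, not any real divergence between groups:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1300  # size of the stricter sample

# Simulated averages: early-season form as prior form plus noise
# (scales are invented for illustration).
two_year = rng.normal(0.0, 0.5, n)
two_month = two_year + rng.normal(0.0, 0.4, n)
rest = 0.72 * two_year + 0.20 * two_month + rng.normal(0.0, 0.3, n)

# Classify each season by how much early form beat or trailed prior form.
delta = two_month - two_year
groups = {
    "improved": delta > 0.30,
    "declined": delta < -0.30,
    "middle": (delta >= -0.30) & (delta <= 0.30),
}

def fit_ols(mask):
    """Fit rest ~ two_year + two_month + intercept within one group."""
    X = np.column_stack([two_year[mask], two_month[mask],
                         np.ones(int(mask.sum()))])
    coefs, *_ = np.linalg.lstsq(X, rest[mask], rcond=None)
    return coefs  # (two-year coef, two-month coef, intercept)

results = {name: fit_ols(mask) for name, mask in groups.items()}
```

The real version of this would simply swap the simulated arrays for the actual player-season averages; the group coefficients are then compared against one another as in the discussion that follows.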
The results showed that when considering only those who improved the most, you should almost completely ignore what happened in the first two months and rely on the two year average to predict going forward. For the other two groups, the results were largely consistent with what was observed in the previous studies – two year average is roughly 3 times more important than two month average.
Now, I have to stress that the sample of those who improved the most is only 192 seasons and that the standard errors of the coefficients are large (0.11). The confidence interval for the two year coefficient is 0.66 – 1.13, centered on 0.90 while the confidence interval for the two month coefficient is -0.19 to 0.24, centered on 0.03. The standard errors for the previous studies were much smaller (0.02 to 0.03). The finding that two month average should be largely ignored for those who showed the most improvement certainly needs to be tested further with more data.
I am much more confident in the main conclusions, however. When attempting to predict performance over the rest of the season – who will contend for Majors, Ryder Cup berths, and the FedEx Cup – weigh how a golfer has played over the prior few seasons more heavily than how they’ve started off the calendar year. If that means we pump the brakes a little on Walker, Reed, and English, so be it. And don’t write off Tiger, Kuchar, Poulter, and Luke Donald after a poor couple of months. Those guys have shown for years that they belong among the world’s elite; that’s worth more than a cold start.