Golf Analytics

How Golfers Win

Monthly Archives: October 2013

The Aging Curve for PGA Tour Golfers (Part III) – Using Bayesian Prior

Several weeks ago I posted two studies on aging among PGA Tour golfers, the most recent of which compared sequential seasons, regressing both seasons to PGA Tour average based on the number of rounds a golfer had played in each season. DSMok1 suggested modifying the amount and degree of regression by including a better prior, which makes more sense than regressing every golfer to the same mean. Instead of simply adding 25.5 rounds of average play to each golfer’s season, I found a Bayesian prior based on play in the prior season and measured the change in performance from that prior in the following season.

Sample and Design:

I included every player with >20 PGA Tour rounds in a season for 2010, 2011, and 2012. This limited my sample to 703 seasons. I then gathered data for YR N-1, YR N, and YR N+1 (ie, 2009, 2010, and 2011 for golfers with >20 rounds in 2010) on all major Tours (PGA, European, Web.com, and Challenge).

Using the equation ((prior mean/prior variance)+(observed mean/observed variance))/((1/prior variance)+(1/observed variance)), I found my prior expectation of performance, inputting data from YR N-1 for the prior mean and variance and from YR N for the observed mean and variance. That equation adjusts the observed performance based on what we observed in the prior season to generate a true-talent level (True YR N) that serves as the prior for YR N+1. I used the same equation to find the true-talent level for YR N+1: I inputted the estimate generated from YR N-1 and YR N as the prior mean and the data for YR N+1 as the observed mean. This produced True YR N+1. I then compared True YR N and True YR N+1 to find the change in true talent for each age group.
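For anyone who wants to play with the equation, here is a minimal sketch of the two-step update. It is an illustration, not the spreadsheet behind the study. The numbers are made up, "variance" is assumed to mean the variance of a season's mean performance, and carrying the posterior variance forward as the next prior variance is the standard normal-normal update rather than something spelled out above.

```python
def bayes_update(prior_mean, prior_var, obs_mean, obs_var):
    """Precision-weighted average of a prior and an observation.

    Returns (posterior mean, posterior variance). The mean is exactly the
    equation in the text; the variance is the textbook normal-normal result
    and is an assumption about how the prior is carried forward.
    """
    if prior_var <= 0 or obs_var <= 0:
        raise ValueError("variances must be positive")
    precision = 1.0 / prior_var + 1.0 / obs_var
    mean = (prior_mean / prior_var + obs_mean / obs_var) / precision
    return mean, 1.0 / precision


# Hypothetical golfer (all values invented for illustration).
# Step 1: YR N-1 is the prior, YR N is the observation -> True YR N.
true_yr_n, var_n = bayes_update(prior_mean=-0.20, prior_var=0.04,
                                obs_mean=-0.40, obs_var=0.04)

# Step 2: True YR N is the prior, YR N+1 is the observation -> True YR N+1.
true_yr_n1, var_n1 = bayes_update(prior_mean=true_yr_n, prior_var=var_n,
                                  obs_mean=-0.10, obs_var=0.04)

# Positive change = worse relative to the field, as in the tables.
change = true_yr_n1 - true_yr_n
print(round(true_yr_n, 3), round(true_yr_n1, 3), round(change, 3))
```

With equal variances the first step lands halfway between the prior and the observation; a noisier observation (larger observed variance) would pull the estimate less.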

I weighted the results using the harmonic mean rounds played in YR N and YR N+1. For example, there were 18 golfers for age 26, so I took the sum of each harmonic mean of rounds and divided each golfer’s change in true talent by their share of the total rounds. This produced my total change in true-talent due to age for each age-group.
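A rough sketch of that weighting, for one age group. I'm reading the step above as a weighted average: each golfer's change counts in proportion to his share of the group's total harmonic-mean rounds. The golfers and numbers below are hypothetical.

```python
def harmonic_mean(a, b):
    """Harmonic mean of two round counts; 0 if either season is empty."""
    if a <= 0 or b <= 0:
        return 0.0
    return 2.0 * a * b / (a + b)


def weighted_change(golfers):
    """golfers: list of (rounds in YR N, rounds in YR N+1, change in true talent).

    Returns the age group's change, weighting each golfer by his share of
    the group's total harmonic-mean rounds.
    """
    weights = [harmonic_mean(rn, rn1) for rn, rn1, _ in golfers]
    total = sum(weights)
    if total == 0:
        raise ValueError("no rounds to weight by")
    return sum(w * change for w, (_, _, change) in zip(weights, golfers)) / total


# Two hypothetical golfers of the same age.
group = [
    (80, 80, -0.10),  # full seasons both years: weight 80
    (20, 80, 0.20),   # short YR N season: weight 32
]
print(round(weighted_change(group), 4))
```

The harmonic mean is dominated by the smaller of the two seasons, so a golfer with one thin season counts for little even if the other season was full.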

If a golfer had no performance in YR N-1 I used +0.08 (slightly below PGA Tour average) as their YR N-1 prior. In most cases, these players qualified via Qualifying School and +0.08 is the observed true-talent for Q-School golfers for 2009-2013. However, only 8 golfers had 0 rounds in YR N-1.

Results:

20    -0.05    2
21    -0.06    3
22    -0.01    6
23    -0.05    8
24    -0.07    9
25    -0.11    11
26    -0.13    18
27    -0.13    23
28    -0.14    29
29    -0.12    36
30    -0.13    34
31    -0.11    39
32    -0.12    36
33    -0.11    34
34    -0.13    34
35    -0.12    36
36    -0.11    37
37    -0.10    42
38    -0.08    26
39    -0.05    30
40    -0.01    21
41    0.03    35
42    0.07    28
43    0.12    19
44    0.13    17
45    0.15    13
46    0.21    17
47    0.25    19
48    0.31    13
49    0.36    12
50    0.35    9
51    0.45    4
52    0.47    2

bayesian aging

Discussion:

The curve generated is very similar to that of the prior study regressing to a mean of +0.00. The peak is slightly lower and the decline is deeper in the late 40s, but otherwise this study supports my prior conclusion of aging with a peak in the mid 30s and subsequent decline.

What Stats Don’t Suggest

Posts like this really agitate me as a golf fan interested in analytics. I’m not sure what annoys me more: the complete disregard for stats as more than trivia or the complete lack of insight it provides. Either way, this post is a perfect example of everything that’s wrong with how people talk about golf stats. Nothing in that post has even a whiff of predictive value; there’s no attempt to actually figure out what the stats suggest about which players should be more successful at Summerlin. Instead, we get pseudo-insight like “par 4 scoring demands our attention” and “all five winners ranked inside the top 20 in Strokes Gained Putting”. Well, of course par 4 scoring is important – over half the holes on every course on Tour are par 4s. Guys who can’t score on par 4s can’t be successful on Tour. And of course strokes gained putting is important. Gaining strokes on the field is a certain prerequisite for winning or contending in a tournament.

That’s enough picking on Rob Bolton, who might have a handle on fantasy golf, but is monumentally out of his league when forced to discuss stats. What this post is for is to discuss how successful a golfer has to be at putting to contend at or win a golf tournament. There’s nothing predictive here; everything I’m going to talk about is descriptive.  I downloaded the per tournament results for every 2013 PGA Tour player, including their finishing position and their average Strokes Gained Putting/round during the tournament. Using that data I found how successfully golfers who finished highly putted. Results below in bullet-form.

  • Tournament winners exceeded their season average for Strokes Gained Putting/round by 1.3 strokes/round – 5.2 strokes/tournament.
  • Those who finished T10 or better exceeded their season average for SGP by 0.9 strokes/round – 3.6 strokes/tournament.
  • Tournament winners averaged 1.44 strokes gained/round, while those T10 or better averaged 0.92 strokes gained/round.
  • Tournament winners averaged finishing 13th in Strokes Gained Putting for the week, while those T10 or better averaged finishing 27th for the week.
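The bullets above boil down to one comparison: a golfer's SGP/round in a given week against his own season average. A minimal sketch, assuming per-tournament rows with hypothetical field names (`player`, `finish`, `sgp`), and taking the season average as a simple mean over tournaments rather than weighting by rounds:

```python
from collections import defaultdict
from statistics import mean


def season_averages(results):
    """Mean SGP/round per player across his tournaments (unweighted)."""
    by_player = defaultdict(list)
    for row in results:
        by_player[row["player"]].append(row["sgp"])
    return {player: mean(values) for player, values in by_player.items()}


def mean_excess(results, max_finish):
    """Average (weekly SGP - season SGP) for finishes of max_finish or better."""
    averages = season_averages(results)
    excess = [row["sgp"] - averages[row["player"]]
              for row in results if row["finish"] <= max_finish]
    return mean(excess)


# Made-up rows, not real 2013 data.
results = [
    {"player": "A", "finish": 1, "sgp": 1.5},
    {"player": "A", "finish": 40, "sgp": -0.5},
    {"player": "B", "finish": 5, "sgp": 1.0},
    {"player": "B", "finish": 1, "sgp": 2.0},
    {"player": "B", "finish": 60, "sgp": 0.0},
]
print(mean_excess(results, max_finish=1))   # winners only
print(mean_excess(results, max_finish=10))  # T10 or better
```

Multiplying the per-round figure by four rounds gives the per-tournament numbers quoted in the bullets (1.3 x 4 = 5.2, 0.9 x 4 = 3.6).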

Clearly, putting very well is necessary to contend for or win a tournament. Nothing in that is novel in the least. Nothing about that is predictive in the least. It just indicates that guys who win do so because they’re playing and putting better than they normally do. Claiming that SGP is important this week ignores completely that it’s important every single week. Moreover, it’s not more important in birdie-fests like this week. SGP counts strokes gained on the field. Golfers this week are going to hole a lot more putts than normal on Tour both because they’ll have disproportionately closer putts and putts of a certain length will be disproportionately easier than normal. However, that just means the threshold for gaining strokes on the field is higher.

The Aging Curve for PGA Tour Golfers (Part II)

Yesterday I posted the results of my study on aging among PGA Tour members. You can read the methodology at the link, but basically it compared pairs of seasons by age to find how much a player should be expected to improve or decline based solely on age (I included a mechanism to regress performance in an attempt to find “true talent”).  At the end I said I’d like to try a different regression mechanism that I hoped would produce a more accurate representation of true talent.

I’ve found before that it’s correct to regress PGA Tour performance around 30% to the mean to find true talent. However, that’s most accurate for golfers who play something like a full season (ie, 50-100 rounds worldwide/season). For regular Tour members, regressing 30% is correct, but for golfers playing only partial seasons it’s likely not regressing enough. A performance over 20 rounds is more likely to be the product of luck than a performance over 60 rounds. That’s problematic for this study because it doesn’t regress more extreme good or bad performances enough to the mean. You’ll see the errors that result when I compare the two studies below.

In prior research comparing sets of rounds [1], I’ve found that adding 25.5 rounds of average (0.00) performance properly regresses a performance to the mean. This means for a player with around 60 rounds, the 30% figure quoted above is accurate. For those playing more, like Brendon de Jonge’s 118 rounds in 2012, regressing 30% is way too much. We know a lot more about de Jonge’s true talent in 118 rounds than we do about, say, Jason Day’s 60 round sample in 2012, enough to regress de Jonge only 18%. Similarly, Hideki Matsuyama’s 26 major tour rounds in 2013 tell us much less about his true talent, and by adding 25.5 rounds of average he gets regressed 50% to the mean.
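The percentages in that paragraph follow directly from the 25.5-round figure: the share of regression is 25.5/(N + 25.5). A quick check, using the round counts quoted above:

```python
PRIOR_ROUNDS = 25.5  # rounds of average (0.00) play added to every golfer


def regression_share(rounds, prior_rounds=PRIOR_ROUNDS):
    """Fraction of the way an observed performance is pulled toward the mean."""
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    return prior_rounds / (rounds + prior_rounds)


for name, rounds in [("Matsuyama 2013", 26), ("Day 2012", 60), ("de Jonge 2012", 118)]:
    print(f"{name}: {rounds} rounds -> regress {regression_share(rounds):.0%}")
```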

Sample & Design:

The same sample and methodology as the above quoted study were used, except instead of regressing using the equation True Talent=(.6944*Observed)+0.01, I simply added 25.5 rounds of average performance to every observed performance: True Talent=((Observed Z*Observed N)/(Observed N + 25.5)).
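Side by side, the two regression mechanisms look like this. The -0.50 performance is a made-up example; the point is that the fixed equation ignores sample size while the 25.5-round version does not.

```python
def fixed_regression(observed):
    """Part I mechanism: the same ~30% regression for everyone."""
    return 0.6944 * observed + 0.01


def padded_regression(observed, rounds, prior_rounds=25.5):
    """Part II mechanism: add prior_rounds of 0.00 play to the observed sample."""
    return observed * rounds / (rounds + prior_rounds)


observed = -0.50  # hypothetical observed Z, better than average
for rounds in (26, 60, 118):
    print(rounds,
          round(fixed_regression(observed), 3),
          round(padded_regression(observed, rounds), 3))
```

Over 26 rounds the padded estimate is about -0.25, over 118 rounds about -0.41, while the fixed equation returns about -0.34 regardless.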

I still did not weight my data.

Results:
age         delta      N
19           0.02        3
20           -0.02      2
21           -0.03      4
22           0.01        8
23           -0.03      8
24           -0.01      11
25           -0.06      16
26           -0.02      23
27           -0.01      30
28           -0.01      39
29           -0.03      46
30           0.04        45
31           0.00        49
32           -0.01      44
33           -0.02      43
34           0.04        46
35           0.01        46
36           -0.02      49
37           0.01        51
38           0.04        38
39           0.03        34
40           0.03        38
41           0.05        40
42           0.03        28
43           0.01        27
44           0.04        21
45           0.10        18
46           0.00        28
47           0.03        22
48           0.06        15
49           0.03        16
50           0.02        10
51           0.00        6
52           0.07        2

aging w25.5regression

The smoothed curve averages the improvement of year N-1, N, and N+1.
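A sketch of that smoothing, assuming it is a centered three-point average over adjacent ages and that the endpoints simply use whatever neighbors exist (the endpoint handling is my assumption). The inputs are the first four deltas from the table above.

```python
def smooth3(values):
    """Centered three-point moving average; endpoints use available neighbors."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - 1): i + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed


deltas_19_to_22 = [0.02, -0.02, -0.03, 0.01]
print([round(v, 3) for v in smooth3(deltas_19_to_22)])
```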

The results here were much different using a more accurate regression mechanism. There is an observed slow increase in true talent of around -0.02/season from 19 to 29. Between 30 and 37 the curve is more or less flat, declining almost imperceptibly. Beginning in the late 30s is the steady decline of around 0.04/season that was also observed (though to a greater extent) in the previous study.

Discussion:
With this more accurate methodology, I think the previous study can be discarded. There IS age related improvement in a golfer’s twenties. Golfers tend to peak between 29 and 34, with a sharp decline around 38 onwards. This study does not necessarily disprove my prior hypothesis that there is a decline based on lessened commitment to practice/preparation among the more transient PGA Tour members, but it certainly means there is a larger improvement in the 20s being observed among the more permanent members.

[1] This study ordered PGA Tour rounds for a large group of golfers over a full season from oldest to newest. I then selected two samples – one comprising the even-numbered rounds and one the odd-numbered rounds – and compared them to see how predictive one half was of the other. I expect to reproduce that study with a larger sample of seasons and golfers soon.
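The split-half idea in the footnote can be sketched in a few lines. This only shows the odd/even split and the correlation between halves; it does not reproduce how the 25.5-round figure was derived, and the three golfers are invented.

```python
from math import sqrt
from statistics import mean


def split_halves(rounds):
    """Mean of odd-numbered rounds (1st, 3rd, ...) and even-numbered rounds."""
    return mean(rounds[0::2]), mean(rounds[1::2])


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical round-by-round Z-scores, ordered oldest to newest.
golfers = {
    "Golfer A": [-0.6, -0.2, -0.5, -0.3],
    "Golfer B": [0.1, 0.3, -0.1, 0.1],
    "Golfer C": [0.6, 0.2, 0.7, 0.5],
}
halves = [split_halves(rounds) for rounds in golfers.values()]
odd_means = [h[0] for h in halves]
even_means = [h[1] for h in halves]
r = pearson_r(odd_means, even_means)
print(round(r, 3))
```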

Predicting the Professional Performance of Collegiate Golfers (Part IV)

Earlier this week I posted the latest version of a study measuring how collegiate golfers perform in their first professional season compared to their average Sagarin rating in college. That study used every season of collegiate data, but considering that golfers typically improve from freshman to senior year, do the final two seasons of college predict pro performance better than using up to four seasons?

My sample was the same as the previous study linked above, except I used only the final two seasons of collegiate competition. For golfers like Rickie Fowler, who played only two seasons, the observed college performance didn’t change. For others it did.

N=52. Average Sagarin rating=70.47. Average pro performance=+0.15.

college golf regression 4

The results were slightly less predictive (R^2=0.294, R=0.54) than using all four seasons of data (R^2=0.356, R=0.59), suggesting that including the earlier data provides some value in predicting later results. I would guess this is because the college season is so short (around 40 rounds); using four seasons provides twice the sample size and a more reliable observation of performance, even if the overall performance was worse. For the record, using only the final season gives R^2=0.205, R=0.45.

The Aging Curve for PGA Tour Golfers

This is a short study I conducted on the typical aging curve for PGA Tour golfers. I stress again, this is the typical aging curve for the average PGA Tour member. As I discuss below, it is not likely to reflect the aging curves of the most elite golfers.

Sample & Design:
The sample was all PGA Tour golfers who in Year 1 played in >20 PGA Tour [1] rounds and who in Year 2 played at least 1 round of golf worldwide. I studied 2009-2010, 2010-2011, 2011-2012, and 2012-2013. My sample included 916 pairs of seasons.

I then compared these golfers in all worldwide rounds in Year 1 and in Year 2. I regressed each Year 1 and Year 2 to PGA Tour Average (0.00) using the equation Y=(.6944*X)+0.01. I regressed because I want the best estimate of a golfer’s “true talent”. Golf performance is heavily influenced by luck; over a normal 85 round season, a golfer’s displayed performance represents approximately 70% skill and 30% luck.

The delta of Year 2 – Year 1 provided my comparison point. I did not weight my data.
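In code, the design amounts to regressing both seasons, differencing them, and taking an unweighted mean per age. A minimal sketch with invented season pairs; which season the age is attached to is an assumption here, since it isn't specified above.

```python
from collections import defaultdict


def regress(z):
    """Fixed regression toward PGA Tour average from the text."""
    return 0.6944 * z + 0.01


def delta_by_age(pairs):
    """pairs: iterable of (age, Year 1 Z, Year 2 Z).

    Returns {age: (unweighted mean delta, N)}. Positive delta = decline.
    """
    deltas = defaultdict(list)
    for age, year1, year2 in pairs:
        deltas[age].append(regress(year2) - regress(year1))
    return {age: (sum(d) / len(d), len(d)) for age, d in sorted(deltas.items())}


# Hypothetical pairs of seasons.
pairs = [
    (30, -0.10, 0.00),
    (30, 0.20, 0.10),
    (31, 0.00, 0.10),
]
for age, (delta, n) in delta_by_age(pairs).items():
    print(age, round(delta, 3), n)
```

Because every golfer counts once regardless of rounds played, a 21-round season moves the age-group mean as much as a 100-round season, which is the limitation flagged under Future Work.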

I included only golfers who appeared in >20 PGA Tour rounds in Year 1 because, given the structure of the international golf tours, it is rare for a golfer to accumulate >20 PGA Tour rounds and then fail to record a single round worldwide the following year. Golfers who fail to re-qualify for the PGA Tour almost always are able to play on the Web.com Tour the following season. If I had used all golfers with >20 rounds on any tour in Year 1, many golfers who performed poorly on the Web.com Tour would’ve fallen completely out of my sample because they would have played on minor tours for which I do not gather data. By measuring only PGA Tour players I ensure that no matter how lucky or unlucky, good or bad a player was in Year 1, it’s very likely they will be included in the data for Year 2.

AGE       delta      N
19           0.02        3
20           -0.02      2
21           -0.01      4
22           0.00        8
23           -0.02      8
24           0.02        11
25           -0.04      16
26           0.00        23
27           0.01        30
28           0.00        42
29           -0.01      47
30           0.04        45
31           0.01        49
32           0.00        45
33           -0.01      44
34           0.04        46
35           0.00        47
36           0.00        51
37           0.06        51
38           0.04        38
39           0.05        35
40           0.02        39
41           0.07        41
42           0.04        29
43           0.00        27
44           0.07        22
45           0.13        18
46           0.07        28
47           0.09        22
48           0.08        15
49           0.08        16
50           0.13        11
51           0.01        6
52           0.04        2

aging2

Discussion:
The aging curve for this sample is basically flat from age 21 to age 34, with a significant year-by-year decline beginning in the late 30s. This indicates that the golfers in this sample did not generally improve or decline due to age until the mid-30s. The sample is small until age 26, but it’s possible to observe a slight improvement of -0.01/season. From age 26-36 the decline is less than 0.01/season. From 37-47 the decline accelerates to 0.06/season. After 47, the sample is relatively small, but shows continued significant decline.

Obviously this is surprising, as I anticipated finding a normal aging curve where an athlete reaches peak performance in the late 20s before declining beginning in the mid-30s. Instead, the sample hardly improved through the late 20s and even slightly declined by the mid-30s. After that, the sample followed the sharp decline in the late 30s and 40s which is anticipated from other athletics-focused aging studies.

My main hypothesis about why golfers show no age related improvement relates to the sample I chose to work with. This study measures the typical PGA Tour professional. Most of the public is familiar with golfers who have remained on Tour for many years, decades even, like Tiger Woods, Phil Mickelson, and Ernie Els. However, the PGA Tour is a very transitory competition. Around 225 golfers play more than 20 rounds in a season, but only 125 retain full playing privileges the following season; the rest attempt to qualify via Q-School or, failing that, play with reduced status or on the minor league Web.com Tour. Playing on the PGA Tour is very lucrative – purses are on average ten times larger than Web.com Tour purses, meaning players earn approximately ten times more money on the PGA Tour. The Web.com Tour qualifies only the best 25 golfers to the PGA Tour every season, meaning not even 10% of the Web.com golfers receive promotion to the PGA Tour.

Because of this financial disparity, only a third of golfers who competed regularly on the Web.com Tour in 2013 earned more than the US median household income for 2013 (~$51,000). Professional golf requires endless hours of practice, separation from family/friends, and constant travel between tournament venues that regularly cover at least three or four continents. It may be that the average PGA Tour golfer just cannot handle the constant grind of professional golf and his skills slowly deteriorate from very early in his career. Because it’s unlikely that the average PGA Tour pro will even maintain their membership from year to year, most professional golfers face years of yo-yoing up and down between the lucrative PGA Tour and the relative penury of the Web.com Tour. Viewed like that, it’s understandable why the typical player does not improve.

Understand that there are many forces at work to produce the small improvements or declines due to age. Golfers certainly become more experienced at reading greens, making club decisions, and choosing how to play shots as they play deeper into their careers. At the same time, the athletic decline observed in other sports affects a golfer’s ability to generate club head speed or repeat their swing. Many commentators talk about how older players get the “yips” and putt worse than they did when younger. At the same time, golf requires constant dedication to practice and preparation. A golfer that isn’t prepared to commit hours to practice each day is going to watch his skills erode. It is likely that the aging curve observed above is a combination of all these factors.

Again, I have to stress how I looked at typical PGA Tour professionals. There are likely many different aging curves based on ability. I would be stunned if the aging curve for elite golfers resembled this slow decline. Golfers who are elite can expect significant and sustained rewards for high levels of performance. Elite golfers are unlikely to lose their playing privileges on the PGA Tour, so they know that by maintaining their practice and preparation they can expect to earn more than a million dollars in prize money per season plus endorsements and appearance fees. That is what fuels golfers like Mickelson and Vijay Singh to take care of their bodies, to practice, to prepare for each tournament, and to withstand the weekly grind of playing in different tournaments.

Future Work:
I’d like to follow this study up with one that does weight the data by rounds played. I’m also less comfortable with my regression technique than I would like. Instead of regressing every observed value by a fixed ~30% to the mean, I’ll regress the observed by adding a certain number of rounds of average play. For example, past work I’ve done estimates that adding 25.5 rounds of 0.00 properly regresses the observed data.

[1] – I defined PGA Tour rounds as any PGA Tour (co-)sponsored tournament plus the World Golf Championships and Majors.

Predicting Professional Performance of Collegiate Golfers (Part III)

Last month I posted several studies which measured how well collegiate golfers performed once they reached the professional level, compared to their Sagarin Rating during college. I updated my database with Challenge Tour results from 2011-2013 so this post is an update of those prior studies with slightly larger samples. Later this week I’ll post the results of a study using only the final two years of college performance to see if that predicts professional performance better.

This study uses the same methodology as the study linked above in Part II. The sample size is 52, average # of college seasons was 3.4, average college performance was 70.8, average professional performance in Z-Score was +0.15.

college golf regression 3

 

The results were less predictive with the larger sample, but still R=0.59 stands as fairly predictive of professional performance. The equation to use is Pro Performance = (0.2113*Avg Sagarin) – 14.796.

Using this predictor my projections for several golfers who recently turned pro follow:
Justin Thomas -0.17
Chris Williams -0.03
T.J. Vogel +0.14
Cody Gribble +0.17
Pedro Figueiredo +0.18
Max Homa +0.21
Kevin Phelan +0.27
Jace Long +0.29
Steven Fox +0.43

The Intersection of Driving Distance & Accuracy

If you have ever watched televised golf I’m sure you have heard an announcer bemoan the wildness of a golfer’s drive. Tiger Woods and Phil Mickelson in particular seem to be dogged by comments about how often they end up in the rough compared to the field.  However, I cannot recall hearing much talk at all about the distance golfers are hitting the ball. Now, a lot of that is due to it being easy to convey the advantage of hitting an approach shot from the fairway rather than the rough. We see the thick rough and remember the times golfers have been forced to pitch out into the fairway when they are behind obstructions. On the other hand, it’s difficult to convey the advantage that hitting an approach shot from 20 yards closer provides to a golfer. However, that advantage is very real.

The 2013 ShotLink data shows that, on average, PGA golfers hit the green on 71% of their shots from 125-150 yards, but on only 64% of their shots from 150-175 yards. In his seminal Assessing Golfer Performance on the PGA Tour, Mark Broadie shows that, on average, a golfer will take 2.89 shots to finish a hole from 137.5 yards, but 3.00 shots to finish from 162.5 yards. In other words, driving the ball 25 yards further provides a substantial advantage in hitting greens and scoring low. There is certainly an advantage to avoiding the rough also. According to ShotLink data, golfers hit the green nearly 76% of the time from the fairway, but only 51% of the time when they missed the fairway. Birdies are 50% more likely when you hit the fairway versus the rough (21% to 14% of holes).

However, almost every golfer is forced to choose which skill – distance or accuracy – they want to attempt to excel at. Driving Distance and Driving Accuracy are strongly negatively correlated (R = -0.51), meaning that very few players perform well in both categories. For example, of the 216 golfers who exceeded 10 tournaments played or finished in the FedEx Cup top 200, Dustin Johnson ranked 1st in driving distance and 195th in driving accuracy. Rory McIlroy followed at 2nd in distance, but 181st in accuracy. Opposite those two, Russell Knox finished 1st in accuracy, but only 135th in distance, while Chez Reavie was 5th in accuracy, but only 159th in distance. As the following graph shows, only one of six PGA golfers exceeds the mean for both distance and accuracy (shown in red) and no one is +1 standard deviation from the mean in both distance and accuracy (shown in yellow).

2013 Driving Distance Accuracy Correlation

However, knowing that it is important to do both well, but difficult to do both well, is there one skill that predominates? To determine just how important each factor was to analyzing driving skill, I set up a regression of driving distance and driving accuracy on a golfer’s greens in regulation (GIR). Because the courses played can vary in difficulty, I used my course adjusted stats, which determine how much better or worse than field average a golfer performed each week in each stat. For most golfers the adjustment is slight, but for golfers like Tiger Woods who typically play tougher courses than average the adjustment can be significant. I’ve attached a Google Doc of every PGA player to finish in the FedEx Cup top 200 plus anyone else with >10 tournaments entered showing these adjusted stats.

The results show that combining distance and accuracy predicts 50% of the variance in GIR (R^2=0.494). The p-values are highly significant and indistinguishable from zero, which certainly squares with the empirical stats provided in the second paragraph. To predict GIR, the equation is Y=(.00283*Distance in yards)+(.4418*Accuracy in %)-(.4429). Basically, hitting the ball an extra three yards is worth around 2% in driving accuracy, meaning a golfer should be indifferent to adding three yards of distance if it means giving up 2% in accuracy.
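Here is the equation as a small function, with accuracy and GIR expressed as fractions (0.58 = 58%), which is the scale the three-yards-for-2% trade-off implies. The inputs in the example are the raw Mickelson figures quoted later in this post (288 yards, 58%); the regression was fit on course-adjusted stats, so treat the output as illustrative rather than his exact predicted GIR.

```python
def predicted_gir(distance_yards, accuracy):
    """Predicted greens in regulation (as a fraction) from driving stats.

    accuracy is a fraction of fairways hit, e.g. 0.58 for 58%.
    """
    return 0.00283 * distance_yards + 0.4418 * accuracy - 0.4429


baseline = predicted_gir(288, 0.58)
# Trade three extra yards for two points of accuracy.
longer_but_wilder = predicted_gir(291, 0.56)

print(f"{baseline:.3f}")
print(f"{longer_but_wilder - baseline:+.4f}")  # close to zero: roughly indifferent
```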

If a golfer was provided with the choice of being one standard deviation better than average in one skill and one standard deviation below average in the other skill there is almost no difference between being good at driving distance and bad at accuracy or vice-versa (63.9% for good at distance and 63.6% for good at accuracy). This shows that performing well at either skill is a legitimate path to success on Tour.

Using this equation, we can also calculate a Total Driving skill stat. The PGA Tour has such a stat, which they calculate solely by adding together a golfer’s rank in distance and accuracy. Mine simply ranks golfers based on their predicted GIR based on their driving distance and accuracy. The leader, Henrik Stenson, finished 8th in accuracy and 55th in distance, with a predicted GIR of 69.2%, meaning a golfer with average approach shot ability would’ve hit the green 69% of the time shooting from his average location. The worst golfer by this metric, Mike Weir, finished 213th in distance and 196th in accuracy, with a predicted GIR of 56.2%.

Tiger Woods, who is regularly criticized for his wayward drives, actually finishes 20th in Total Driving on the strength of his 34th ranked distance and 78th (above average!) accuracy. His predicted GIR was 66.6%. On the other hand, Phil Mickelson is also criticized for being wild with the driver, and he has been wild this season (58% accuracy; 163rd on Tour), but his distance has killed him nearly as much. He’s only driven it 288 yards on average (98th on Tour). As a result, he was the 149th best driver on Tour last year.

I’ve attached the predicted GIR/Total Driving stats in this Google Doc.