Golf Analytics

How Golfers Win


Predicting Putting Performance by Distance

Mark Broadie's research using the ShotLink data established a clear relationship between putt distance and the percentage of putts made. PGA Tour pros make a very high percentage of their close putts, but only about half of their putts around 10 feet and only around one in six from around 20 feet. Pros hole very few (~5%) of their longest efforts from 25 feet and beyond. That data on the percentage of putts made at each distance now forms the backbone of the PGA Tour's Strokes Gained Putting statistic, in which players are credited or debited for making or missing every putt from every distance. Over a single season, Strokes Gained Putting is often an unreliable indicator of putting performance, particularly at the extremes and for players who have putted much worse or much better than in previous seasons.
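To make the mechanics concrete, here is a minimal sketch of the per-putt calculation. The baseline table is an illustrative stand-in, not the Tour's actual data.

```python
# Sketch of the Strokes Gained Putting calculation for a single putt.
# Baseline values are hypothetical: average putts needed to hole out
# from each distance (feet) for a PGA Tour pro.
BASELINE_EXPECTED_PUTTS = {2: 1.01, 5: 1.20, 10: 1.56, 15: 1.78, 20: 1.87, 25: 1.94}

def strokes_gained_putting(distance_ft: int, putts_taken: int) -> float:
    """Strokes gained (+) or lost (-) versus the baseline on one hole."""
    return BASELINE_EXPECTED_PUTTS[distance_ft] - putts_taken

# Holing a 20-footer gains 0.87 strokes on the field;
# three-putting from 10 feet loses 1.44.
print(f"{strokes_gained_putting(20, 1):+.2f}")  # +0.87
print(f"{strokes_gained_putting(10, 3):+.2f}")  # -1.44
```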

Putting performance is polluted by randomness; Tour players just don't attempt enough putts over the course of a season to get an accurate picture of their underlying putting ability. To make accurate projections of putting ability, though, you need to know whether Graeme McDowell's 0.9 putts gained this season represents talent or luck. I've broken putting performance down into four distance buckets from the PGA Tour data: putts inside 5 feet, 5-15 footers, 15-25 footers, and putts outside 25 feet. The results show that putting performance is far more predictable and consistent at the short distances. Long putting is so noisy that it's difficult to say anyone gains much of an advantage from it over the long term.

Inside 5 Feet:

These putts are almost always converted (average 96%). The spread in performance from 2011-14 was 93% to 99%. The spread in expected performance derived from weighting the previous four seasons is 94.3% to 97.8%. This indicates that we should expect every regular Tour player's true talent from inside 5 feet to fall somewhere inside that 3.5% range. Based on an average of over 900 putts attempted inside 5 feet over a season, we should expect every regular Tour player's talent in terms of putts gained or lost to fall between +0.2/round and -0.3/round.

The graph below shows the correlation between a three year average (2011-13) and 2014 performance for all players with qualifying rounds in all four seasons. The correlation (R=0.56) between prior performance and 2014 performance is strongest in this distance range.

[Graph: putts made inside 5 feet, 2011-13 average vs. 2014]

5-15 foot Putts:

Putts from this range are either short birdie putts or par putts after a scrambling shot; they are converted approximately half the time. The spread in performance from 2011-14 was 36% to 54%. The spread in expected performance derived from weighting the previous four seasons is 40% to 52%. Based on around 450 putts attempted from 5-15 feet over a season, we should expect every regular Tour player's talent in terms of putts gained or lost to fall between +0.4/round and -0.5/round. Compare that to the best putters on Tour gaining about 0.75 putts/round.

The correlation between three year average and 2014 performance is below. The correlation (R=0.53) is similar to that for the short <5 foot putts.

[Graph: putts made from 5-15 feet, 2011-13 average vs. 2014]

15-25 foot Putts:

Putts from this range are normally longer birdie putts and are converted about 16% of the time. The spread in performance from 2011-14 was 8% to 26%. The spread in expected performance derived from weighting the previous four seasons is 12% to 20%. Based on around 225 putts attempted from 15-25 feet over a season, we should expect every regular Tour player's talent in terms of putts gained or lost to fall between +0.15/round and -0.15/round. There's much less at stake from this range than the previous two, simply because so few putts are attempted from 15-25 feet.

The correlation between three year average and 2014 performance is below. There’s not much of a relationship (R=0.28), showing that putting performance from this range is much more affected by random chance over a full season than the shorter length putts.

[Graph: putts made from 15-25 feet, 2011-13 average vs. 2014]

Putts outside 25 feet:

These are the longest putts, mainly long birdie attempts and lag putts hit just to get close and save par. The spread in performance from 2011-14 was 2% to 13%. The spread in expected performance derived from weighting the previous four seasons is 4% to 9%. Based on around 300 putts attempted from beyond 25 feet over a season, we should expect every regular Tour player's talent in terms of putts gained or lost to fall between +0.1/round and -0.1/round. Again, there's very little difference in expected performance from this distance. Even the very best long putter on Tour will gain little from these putts over the long term.

The correlation between three year average and 2014 performance is below. There's almost no relationship (R=0.10), which means it's almost impossible to predict how well a player will putt on these long putts. The top ten long putters from 2011-13 averaged making 7.6% of these putts (versus a 5.5% average), but made only 6.7% of them in 2014 – a regression of almost 50% to the mean.

[Graph: putts made outside 25 feet, 2011-13 average vs. 2014]

The Big Picture:

This graph shows performance in all four ranges. The longer putts show little relationship to future performance, while the shorter putts show a more consistent relationship. This means that players who gained a lot of putts last season on their longer putts will start making putts at a lower rate, while those who gained a lot of putts on shorter putts are better bets to retain that putting ability.

[Graph: prior performance vs. 2014 performance for all four distance ranges]

Most Improved Putters from 5-15 feet in 2014:

1. Graeme McDowell

2. Charley Hoffman

3. Billy Horschel

4. Justin Leonard

5. Michael Thompson

These guys have a better chance of retaining their putting performance into 2015.

Most Improved Putters from > 25 feet in 2014:

1. Rory McIlroy

2. Y.E. Yang

3. David Toms

4. Brendan Steele

5. Brian Gay

These guys look likely to regress in terms of putting performance, especially McIlroy, who performed at his career average on all other putts but made 8% more of his long putts – gaining almost a third of a putt per round over his career average.


Putting Driven Performance Changes are Illusory

Last week I posted about how repeatable performance on different shot types is from season to season. Tee to green play is more repeatable than putting, which is more repeatable than scrambling. That makes sense once you realize that golfers play 2-3 times more tee to green shots than meaningful putts in a round; there's just more inherent randomness in a season's worth of putts than in a season's worth of tee to green shots. Golfers play even fewer scrambling shots, resulting in even more randomness in a season's worth of scrambling.

Last month I also examined how repeatable small samples (4-8 tournaments) of putting performance are, in the context of discussing why I expected Jimmy Walker's performance to regress to the mean. That micro-study indicated that there was very little correlation between a golfer's performance in a 4-8 tournament sample of putts and the following 4-8 tournament sample. On the whole, performances in such short samples regress almost entirely to the mean.

Those two lines of inquiry led me to examine whether putting is more random than tee to green performance. I have always believed that improvements or declines driven by over-performance in putting were less real than those driven by tee to green over-performance, but I had never actually tested that hypothesis. The key question is whether changes in performance driven by putting are less persistent than those driven by tee to green play. That is, when a golfer performs better over the first half of a season and much of the improvement can be traced back to an improvement in his putting stats, will that golfer continue to perform better in the second half of the season? The evidence says changes in performance driven by putting are more illusory than changes driven by tee to green play.

Design:

I gathered the tournament by tournament overall, tee to green, and putting performances of all PGA Tour golfers in rounds measured by the ShotLink system for 2011-Present. I divided those rounds into roughly half-season chunks (January-May 2011, May-November 2011, January-May 2012, May-November 2012, January-May 2013, May-September 2013, October 2013-Present). Each chunk included around 15-18 tournaments. I considered all golfers who recorded at least 20 rounds in consecutive half-season chunks.

To measure putting performance I used the PGA Tour's Strokes Gained Putting stat, and to measure tee to green performance I used my own overall ratings with putting performance subtracted out. This methodology is consistent with how I've measured tee to green performance in numerous recent posts.

Half-Season Correlations by Shot Type:

First, I measured how repeatable putting and tee to green performance were between half-season samples, much like the full-season samples used in this study. I included all golfers with at least 20 rounds in consecutive half-season samples and compared each half-season to the half-season that directly followed, including 2nd halves to 1st halves of following calendar years. This yielded samples of ~800 golfers for both tee to green and putting. Graphs are below.

[Graph: consecutive half-season tee to green performance]

[Graph: consecutive half-season putting performance]

Tee to green performance was again more repeatable than putting performance. In the study linked above, consecutive full seasons of tee to green performance were correlated at R=0.69. I found a correlation of R=0.62 between consecutive half-seasons, understandably less given the smaller number of rounds/shots played. The full-season correlation for putting was R=0.55; half-season putting performances were likewise less correlated, at R=0.40. Both findings are consistent with the understanding that randomness between samples increases when fewer rounds/shots are compared. Most importantly, putting is less repeatable than tee to green play.
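For reference, the half-season correlations above are just Pearson correlations over paired samples; here is a minimal sketch with made-up numbers standing in for the ~800 paired golfers.

```python
import numpy as np

# Hypothetical data: element i holds golfer i's performance relative to
# Tour average in two consecutive half-season samples.
first_half = np.array([-0.31, 0.12, -0.05, 0.40, -0.22])   # e.g. Jan-May
second_half = np.array([-0.18, 0.05, -0.11, 0.25, -0.02])  # e.g. May-Nov

# Pearson correlation between consecutive half-seasons.
r = np.corrcoef(first_half, second_half)[0, 1]
print(f"half-season correlation R = {r:.2f}")
```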

Persistence of Changes in Performance by Shot Type:

Next, I measured how persistent changes in performance are when considering putting and tee to green play. That is, when a golfer improves their putting over a half-season sample, how much of that performance is retained in the following half-season? If 100% of the performance is retained, changes in putting performance over a half-season entirely represent a change in true talent. If 0% of the performance is retained, changes in putting performance over a half-season entirely represent randomness. The same for tee to green play. My assumption was that a larger percent of performance would be retained for tee to green play than putting, meaning that half-season samples of putting are more affected by randomness than half-seasons of tee to green play.

To measure the effect, I first established prior expectations of performance for every golfer in my sample. I simply averaged performance in tee to green play and putting for the three years prior to the beginning of each half-season sample. For example, for the May-November 2011 sample, I averaged play between May 2008 and May 2011. This is not an ideal measure of performance, but it provides a consistent baseline for comparisons to be made.

I removed all golfers from the sample who had no prior performances. This reduced my sample to around 750 consecutive half-seasons.

The values I compared were the initial delta (Prior minus 1st Half-season) and the subsequent delta (Prior minus 2nd Half-season). Using this method I can find how persistent a change in performance is between two half-seasons. I did this for both putting and tee to green play. Graphs are below.

[Graph: persistence of tee to green changes]

[Graph: persistence of putting changes]

Changes in tee to green play were twice as persistent as changes in putting, meaning golfers who improved their tee to green play retained twice as much of those improvements as golfers who improved a similar amount with the putter. Golfers maintained around 60% of their tee to green improvements, but only 30% of their putting improvements. This indicates that putting performances regress more sharply to prior expectations than tee to green performances.
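A minimal sketch of that persistence measurement, with hypothetical numbers: the slope of the subsequent delta on the initial delta estimates the share of a change that persists.

```python
import numpy as np

# Hypothetical data: prior three-year baseline and two consecutive
# half-season performances for each golfer (strokes vs. Tour average).
prior = np.array([0.10, -0.20, 0.00, 0.35, -0.40])
half1 = np.array([-0.15, -0.35, 0.20, 0.10, -0.30])
half2 = np.array([0.02, -0.28, 0.08, 0.25, -0.38])

initial_delta = prior - half1      # change shown in the first half-season
subsequent_delta = prior - half2   # change shown in the second half-season

# Slope of subsequent delta on initial delta = share of the change retained.
slope, intercept = np.polyfit(initial_delta, subsequent_delta, 1)
print(f"retention: {slope:.0%} of the first-half change persists")
```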

Are Putting Performances More Illusory?

Finally, I used the data from above to measure whether changes in performance driven by putting are less real than changes driven by tee to green play. I ran a linear regression using the initial delta for overall performance and the initial delta for putting performance as independent variables and the subsequent delta for overall performance as the dependent variable. In short, given a certain overall change in performance and a certain change in putting performance over the first half-season, how much of that overall change is retained over the second half-season?

As the following table shows, golfers retain much more of their improvement or decline when it occurred in tee to green shots than when it occurred in putting. The columns show improvements/declines in overall play (considering all shots) and the rows show improvements/declines solely in putting. The table shows that a golfer who improves overall by 0.50 strokes will retain only a quarter of that improvement if all of it was due to putting (0.50), but over half if none of it was due to putting (0.00). The equation used to produce this chart is Subsequent Delta = (0.56 * Initial Overall Delta) – (0.28 * Initial Putting Delta).
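Plugging the fitted coefficients into code reproduces the example above:

```python
def retained_improvement(overall_delta: float, putting_delta: float) -> float:
    """Predicted subsequent delta from the fitted equation:
    0.56 * initial overall delta - 0.28 * initial putting delta."""
    return 0.56 * overall_delta - 0.28 * putting_delta

# A 0.50-stroke overall improvement that came entirely from putting
# retains 0.14 strokes (about a quarter)...
print(f"{retained_improvement(0.50, 0.50):.2f}")  # 0.14
# ...while the same improvement with no putting component retains
# 0.28 strokes (over half).
print(f"{retained_improvement(0.50, 0.00):.2f}")  # 0.28
```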

[Table: retained change by initial overall delta (columns) and initial putting delta (rows)]

Discussion:

These findings should fundamentally alter how we discuss short-term changes in performance. I've already shown repeatedly that performances better than prior expectation will regress to the mean over larger samples; that idea is consistent across sports analytics. However, these findings indicate that the amount of regression depends on which part of a golfer's game is improving or declining. Golfers who improve on the basis of putting are largely getting lucky and will regress more strongly to the mean than golfers who improve on the basis of the tee to green game. Those who improve through the tee to green game are showing more robust improvements which should be expected to be more strongly retained.

The golfers who represent either side of this for the 2014 season are Jimmy Walker and Patrick Reed. I’ve discussed both in the past month, alluding to how Walker’s improvements were almost entirely driven by putting and how Reed’s were mostly driven by tee to green play. Based off these findings, Reed is more likely to retain his improvements over the rest of the season, all else being equal, than Walker.

 

All graphs/charts are denominated in strokes better or worse than PGA Tour average. Negative numbers indicate performances better than PGA Tour average.

Regression Rules Everything

This post will be number/graph heavy, but it explains perhaps the most important concept in predicting golf performance – everyone regresses to the mean, no matter their performance. Below are two charts that show this effect in action. The first uses large buckets and compares all players' performance in seasons with N > 50 rounds with their performance (regardless of N) in the subsequent season. The second shows similar data at a more granular level and also includes the percentage of seasons that fall in each bucket. Read the buckets as seasons within 0.05 standard deviations.

[Graph: initial-season performance vs. subsequent-season performance]

[Table: subsequent-season performance and share of seasons by 0.05-SD bucket]

In the first graph, all golfers better than +0.30 (approximately Web.com Tour average) in year 1 declined in year 2. Those worse (think Challenge Tour average) did not improve or decline, on average. Only those who performed very poorly in year 1 actually improved. For those better than PGA Tour average, the decline was fairly uniform (~0.05 to ~0.10 standard deviations). Remember, these are aggregations of huge samples; many players improved at all skill levels, but on average regression/decline ruled everything.

In the second graph, the most important lesson is how rare truly elite seasons are. Only roughly 1/4 of seasons came in below -0.15 (which is roughly the talent level of the average PGA Tour card holder). The cut-off for the top 5% of seasons (2010-2012) came in at -0.45. Also, the regression of almost all players is evident; no bucket better than +0.35 improved in the subsequent season.

This data is fairly strong evidence that we should expect decline from most performances, on average. In fact, based on the rarity of rounds and the demonstrated regression, we should be skeptical about predicting any elite performance to be repeated the following season.

The Aging Curve for PGA Tour Golfers (Part III) – Using a Bayesian Prior

Several weeks ago I posted two studies on aging among PGA Tour golfers, the most recent of which compared sequential seasons, regressing both seasons to PGA Tour average based on the number of rounds a golfer had played. DSMok1 suggested modifying the amount and degree of regression by including a better prior, which makes more sense than regressing every golfer to the same mean. Instead of simply adding 25.5 rounds of average play to each golfer's season, I found a Bayesian prior based on play in the prior season and measured the change in performance from that prior in the following season.

Sample and Design:

I included every player with >20 PGA Tour rounds in a season for 2010, 2011, and 2012. This limited my sample to 703 seasons. I then gathered data for YR N-1, YR N, and YR N+1 (ie, 2009, 2010, and 2011 for golfers with >20 rounds in 2010) on all major Tours (PGA, European, Web.com, and Challenge).

Using the equation ((prior mean / prior variance) + (observed mean / observed variance)) / ((1 / prior variance) + (1 / observed variance)), I found my expectation of performance, inputting data from YR N-1 as the prior mean and variance and from YR N as the observed mean and variance. That equation adjusts the observed performance based on what we observed in the prior season, generating a true-talent level for YR N (True YR N). I used the same equation to find the true-talent level for YR N+1, inputting the prior generated from YR N-1 and YR N as the prior mean and the data from YR N+1 as the observed mean. This produced True YR N+1. I then compared True YR N and True YR N+1 to find the change in true talent for each age group.
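In code, that precision-weighted update looks like the sketch below; the golfer and the variance values are hypothetical stand-ins.

```python
def bayes_update(prior_mean: float, prior_var: float,
                 obs_mean: float, obs_var: float) -> float:
    """Precision-weighted average of prior and observed performance,
    per the equation above; the lower-variance input gets more weight."""
    return ((prior_mean / prior_var) + (obs_mean / obs_var)) / \
           ((1 / prior_var) + (1 / obs_var))

# Hypothetical golfer: a vague prior of +0.05 (YR N-1) combined with a
# lower-variance observed -0.20 (YR N) pulls the estimate toward -0.20.
true_yr_n = bayes_update(0.05, 0.04, -0.20, 0.02)
print(f"True YR N talent: {true_yr_n:+.3f}")  # -0.117
```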

I weighted the results using the harmonic mean of rounds played in YR N and YR N+1. For example, there were 18 golfers at age 26, so I summed the harmonic means of rounds and weighted each golfer's change in true talent by his share of that total. This produced the total change in true talent due to age for each age group.
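A sketch of that weighting scheme, with made-up golfers:

```python
def harmonic_mean(n1: float, n2: float) -> float:
    """Harmonic mean of rounds played in YR N and YR N+1."""
    return 2.0 / (1.0 / n1 + 1.0 / n2)

# Hypothetical age-26 group: (rounds YR N, rounds YR N+1, change in true talent).
golfers = [(80, 90, -0.20), (40, 95, -0.10), (60, 30, -0.05)]

weights = [harmonic_mean(n1, n2) for n1, n2, _ in golfers]
total = sum(weights)
# Weighted average change in true talent for the age group.
aging_effect = sum(w / total * delta for w, (_, _, delta) in zip(weights, golfers))
print(f"age-26 change in true talent: {aging_effect:+.3f}")
```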

If a golfer had no performance in YR N-1, I used +0.08 (slightly below PGA Tour average) as their YR N-1 prior. In most cases these players qualified via Qualifying School, and +0.08 is the observed true talent for Q-School golfers from 2009-2013. Only 8 golfers had 0 rounds in YR N-1, however.

Results:

AGE    delta    N
20    -0.05    2
21    -0.06    3
22    -0.01    6
23    -0.05    8
24    -0.07    9
25    -0.11    11
26    -0.13    18
27    -0.13    23
28    -0.14    29
29    -0.12    36
30    -0.13    34
31    -0.11    39
32    -0.12    36
33    -0.11    34
34    -0.13    34
35    -0.12    36
36    -0.11    37
37    -0.10    42
38    -0.08    26
39    -0.05    30
40    -0.01    21
41    0.03    35
42    0.07    28
43    0.12    19
44    0.13    17
45    0.15    13
46    0.21    17
47    0.25    19
48    0.31    13
49    0.36    12
50    0.35    9
51    0.45    4
52    0.47    2

[Graph: aging curve using the Bayesian prior]

Discussion:

The curve generated is very similar to that of the prior study regressing to a mean of +0.00. The peak is slightly lower and the decline is deeper in the late 40s, but otherwise this study supports my prior conclusion of aging with a peak in the mid 30s and subsequent decline.

The Aging Curve for PGA Tour Golfers (Part II)

Yesterday I posted the results of my study on aging among PGA Tour members. You can read the methodology at the link, but basically it compared pairs of seasons by age to find how much a player should be expected to improve or decline based solely on age (I included a mechanism to regress performance in an attempt to find “true talent”).  At the end I said I’d like to try a different regression mechanism that I hoped would produce a more accurate representation of true talent.

I've found before that it's correct to regress PGA Tour performance around 30% to the mean to find true talent. However, that's most accurate for golfers who play something like a full season (ie, 50-100 rounds worldwide/season). For regular Tour members, regressing 30% is correct, but for golfers playing only partial seasons it likely isn't regressing enough. A performance over 20 rounds is more likely to be the product of luck than a performance over 60 rounds. That's problematic for this study because it doesn't regress the more extreme good or bad performances enough to the mean. You'll see the errors that result when I compare the two studies below.

In prior research comparing sets of rounds [1], I’ve found that adding 25.5 rounds of average (0.00) performance properly regresses a performance to the mean. This means for a player with around 60 rounds, the 30% figure quoted above is accurate. For those playing more, like Brendon de Jonge’s 118 rounds in 2012, regressing 30% is way too much. We know a lot more about de Jonge’s true talent in 118 rounds than we do about, say, Jason Day’s 60 round sample in 2012, enough to regress de Jonge only 18%. Similarly, Hideki Matsuyama’s 26 major tour rounds in 2013 tell us much less about his true talent, and by adding 25.5 rounds of average he gets regressed 50% to the mean.

Sample & Design:

The same sample and methodology as the above quoted study were used, except instead of regressing using the equation True Talent = (.6944 * Observed) + 0.01, I simply added 25.5 rounds of average performance to every observed performance: True Talent = (Observed Z * Observed N) / (Observed N + 25.5).
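The fractions quoted above fall straight out of that equation, since the share regressed to the mean is 25.5 / (N + 25.5):

```python
def regressed_performance(observed_z: float, rounds: int) -> float:
    """True Talent = (Observed Z * Observed N) / (Observed N + 25.5)."""
    return observed_z * rounds / (rounds + 25.5)

def regression_fraction(rounds: int) -> float:
    """Share of the observed performance regressed to the mean (0.00)."""
    return 25.5 / (rounds + 25.5)

print(f"{regression_fraction(60):.0%}")   # ~30% for a typical 60-round season
print(f"{regression_fraction(118):.0%}")  # ~18% for de Jonge's 118 rounds
print(f"{regression_fraction(26):.0%}")   # ~50% for Matsuyama's 26 rounds
```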

I still did not weight my data.

Results:
age         delta      N
19           0.02        3
20           -0.02      2
21           -0.03      4
22           0.01        8
23           -0.03      8
24           -0.01      11
25           -0.06      16
26           -0.02      23
27           -0.01      30
28           -0.01      39
29           -0.03      46
30           0.04        45
31           0.00        49
32           -0.01      44
33           -0.02      43
34           0.04        46
35           0.01        46
36           -0.02      49
37           0.01        51
38           0.04        38
39           0.03        34
40           0.03        38
41           0.05        40
42           0.03        28
43           0.01        27
44           0.04        21
45           0.10        18
46           0.00        28
47           0.03        22
48           0.06        15
49           0.03        16
50           0.02        10
51           0.00        6
52           0.07        2

[Graph: aging curve with 25.5 rounds of regression]

The smoothed curve averages the improvement of year N-1, N, and N+1.

The results here were much different using a more accurate regression mechanism. There is an observed slow increase in true talent of around -0.02/season from 19 to 29. Between 30 and 37 the curve is more or less flat, declining almost imperceptibly. Beginning in the late 30s is the steady decline of around 0.04/season that was also observed (though to a greater extent) in the previous study.

Discussion:
With this more accurate methodology, I think the previous study can be discarded. There IS age-related improvement in a golfer's twenties. Golfers tend to peak between 29 and 34, with a sharp decline from around 38 onwards. This study does not necessarily disprove my prior hypothesis that there is a decline based on lessened commitment to practice/preparation among the more transient PGA Tour members, but it certainly means there is a larger improvement in the 20s among the more permanent members.

[1] This study ordered PGA Tour rounds for a large group of golfers over a full-season from oldest to newest. I then selected two samples – one comprised of the even number rounds and one of odd number rounds – and compared them to see how predictive one half was of the other. I expect to reproduce that study with a larger sample of seasons and golfers soon.

The Aging Curve for PGA Tour Golfers

This is a short study I conducted on the typical aging curve for PGA Tour golfers. I stress again, this is the typical aging curve for the average PGA Tour member. As I discuss below, it is not likely to reflect the aging curves of the most elite golfers.

Sample & Design:
All PGA Tour golfers who in Year 1 played in >20 PGA Tour [1] rounds and who in Year 2 played at least 1 round of golf worldwide. I studied 2009-2010, 2010-2011, 2011-2012, and 2012-2013. My sample included 916 pairs of seasons.

I then compared these golfers in all worldwide rounds in Year 1 and in Year 2. I regressed each Year 1 and Year 2 to PGA Tour Average (0.00) using the equation Y=(.6944*X)+0.01. I regressed because I want the best estimate of a golfer’s “true talent”. Golf performance is heavily influenced by luck; over a normal 85 round season, a golfer’s displayed performance represents approximately 70% skill and 30% luck.

The delta of Year 2 – Year 1 provided my comparison point. I did not weight my data.
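A minimal sketch of the delta computation for one hypothetical golfer, using the regression equation above (remember that negative numbers are better):

```python
def regress(observed: float) -> float:
    """True-talent estimate: Y = 0.6944 * X + 0.01."""
    return 0.6944 * observed + 0.01

# Hypothetical golfer: -0.45 observed in Year 1, -0.20 in Year 2.
year1_observed, year2_observed = -0.45, -0.20
delta = regress(year2_observed) - regress(year1_observed)
print(f"year-over-year change: {delta:+.3f}")  # +0.174, i.e. a decline
```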

I included only golfers who appeared in >20 PGA Tour rounds in Year 1 because, given the structure of the international tours, it is rare for such a golfer to subsequently fail to record a single round worldwide. Golfers who fail to re-qualify for the PGA Tour almost always are able to play on the Web.com Tour the following season. If I had instead used all golfers with >20 rounds on any tour in Year 1, many golfers who performed poorly on the Web.com Tour would have fallen completely out of my sample because they would have played on minor tours for which I do not gather data. By measuring only PGA Tour players I ensure that no matter how lucky or unlucky, good or bad a player was in Year 1, it's very likely they will be included in the data for Year 2.

AGE       delta      N
19           0.02        3
20           -0.02      2
21           -0.01      4
22           0.00        8
23           -0.02      8
24           0.02        11
25           -0.04      16
26           0.00        23
27           0.01        30
28           0.00        42
29           -0.01      47
30           0.04        45
31           0.01        49
32           0.00        45
33           -0.01      44
34           0.04        46
35           0.00        47
36           0.00        51
37           0.06        51
38           0.04        38
39           0.05        35
40           0.02        39
41           0.07        41
42           0.04        29
43           0.00        27
44           0.07        22
45           0.13        18
46           0.07        28
47           0.09        22
48           0.08        15
49           0.08        16
50           0.13        11
51           0.01        6
52           0.04        2

[Graph: aging curve, Part I]

Discussion:
The aging curve for this sample is basically flat from age 21 to age 34, with a significant year-by-year decline beginning in the late 30s. This indicates that the golfers in this sample did not generally improve or decline due to age until the mid-30s. The sample is small until age 26, but it's possible to observe a slight improvement of -0.01/season. From age 26-36 the decline is less than 0.01/season. From 37-47 the decline accelerates to 0.06/season. After 47 the sample is relatively small, but shows continued significant decline.

Obviously this is surprising, as I anticipated finding a normal aging curve where an athlete reaches peak performance in the late 20s before declining beginning in the mid-30s. Instead, the sample hardly improved through the late 20s and even slightly declined by the mid-30s. After that, the sample followed the sharp decline in the late 30s and 40s which is anticipated from other athletics-focused aging studies.

My main hypothesis about why golfers show no age-related improvement relates to the sample I chose to work with. This study measures the typical PGA Tour professional. Most of the public is familiar with golfers who have remained on Tour for many years, decades even, like Tiger Woods, Phil Mickelson, and Ernie Els. However, the PGA Tour is a very transitory competition. Around 225 golfers play more than 20 rounds in a season, but only 125 retain full playing privileges the following season; the rest attempt to qualify via Q-School or, failing that, play with reduced status or on the minor league Web.com Tour. Playing on the PGA Tour is very lucrative – purses are on average ten times larger than Web.com Tour purses, meaning players earn approximately ten times more money on the PGA Tour. And the Web.com Tour promotes only its best 25 golfers to the PGA Tour every season, meaning fewer than 10% of Web.com golfers receive promotion to the PGA Tour.

Because of this financial disparity, only a third of golfers who competed regularly on the Web.com Tour in 2013 earned more than the US median household income for 2013 (~$51,000). Professional golf requires endless hours of practice, separation from family and friends, and constant travel between tournament venues that regularly span three or four continents. It may be that the average PGA Tour golfer just cannot handle the constant grind of professional golf, and his skills slowly deteriorate from very early in his career. Because it's unlikely that the average PGA Tour pro will even maintain his membership from year to year, most professional golfers face years of yo-yoing between the lucrative PGA Tour and the relative penury of the Web.com Tour. Viewed like that, it's understandable why the typical player does not improve.

Understand that there are many forces at work to produce the small improvements or declines due to age. Golfers certainly become more experienced at reading greens, making club decisions, and choosing how to play shots as they play deeper into their careers. At the same time, the athletic decline observed in other sports affects a golfer’s ability to generate club head speed or repeat their swing. Many commentators talk about how older players get the “yips” and putt worse than they did when younger. At the same time, golf requires constant dedication to practice and preparation. A golfer that isn’t prepared to commit hours to practice each day is going to watch his skills erode. It is likely that the aging curve observed above is a combination of all these factors.

Again, I have to stress how I looked at typical PGA Tour professionals. There are likely many different aging curves based on ability. I would be stunned if the aging curve for elite golfers resembled this slow decline. Golfers who are elite can expect significant and sustained rewards for high levels of performance. Elite golfers are unlikely to lose their playing privileges on the PGA Tour, so they know that by maintaining their practice and preparation they can expect to earn more than a million dollars in prize money per season plus endorsements and appearance fees. That is what fuels golfers like Mickelson and Vijay Singh to take care of their bodies, to practice, to prepare for each tournament, and to withstand the weekly grind of playing in different tournaments.

Future Work:
I’d like to follow this study up with one that does weight the data by rounds played. I’m also less comfortable with my regression technique than I would like. Instead of regressing every observed value by a fixed ~30% to the mean, I’ll regress the observed by adding a certain number of rounds of average play. For example, past work I’ve done estimates that adding 25.5 rounds of 0.00 properly regresses the observed data.

[1] – I defined PGA Tour rounds as any PGA Tour (co-)sponsored tournament plus the World Golf Championships and Majors.

The Intersection of Driving Distance & Accuracy

If you have ever watched televised golf, I'm sure you have heard an announcer bemoan the wildness of a golfer's drives. Tiger Woods and Phil Mickelson in particular seem to be dogged by comments about how often they end up in the rough compared to the field. However, I cannot recall hearing much talk at all about the distance golfers are hitting the ball. A lot of that is because it's easy to convey the advantage of hitting an approach shot from the fairway rather than the rough: we see the thick rough and remember the times golfers have been forced to pitch out into the fairway from behind obstructions. On the other hand, it's difficult to convey the advantage that hitting an approach shot from 20 yards closer provides to a golfer. That advantage, however, is very real.

The 2013 ShotLink data shows that, on average, PGA golfers hit the green on 71% of their shots from 125-150 yards, but on only 64% of their shots from 150-175 yards. In his seminal paper Assessing Golfer Performance on the PGA Tour, Mark Broadie shows that, on average, a golfer will take 2.89 shots to finish a hole from 137.5 yards, but 3.00 shots to finish from 162.5 yards. In other words, driving the ball 25 yards further provides a substantial advantage in hitting greens and scoring low. There is certainly an advantage to avoiding the rough as well: according to ShotLink data, golfers hit the green nearly 76% of the time from the fairway, but only 51% of the time when they missed the fairway. Birdies are 50% more likely when you hit the fairway versus the rough (21% of holes to 14%).

However, almost every golfer is forced to choose which skill – distance or accuracy – to excel at. Driving Distance and Driving Accuracy are strongly negatively correlated (R = -0.51), meaning that very few players perform well in both categories. For example, of the 216 golfers who exceeded 10 tournaments played or finished in the FedEx Cup top 200, Dustin Johnson ranked 1st in driving distance but 195th in driving accuracy. Rory McIlroy followed at 2nd in distance, but 181st in accuracy. Opposite those two, Russell Knox finished 1st in accuracy, but only 135th in distance, while Chez Reavie was 5th in accuracy, but only 159th in distance. As the following graph shows, only one in six PGA golfers exceeds the mean in both distance and accuracy (shown in red), and no one is +1 standard deviation above the mean in both (shown in yellow).

[Graph: 2013 driving distance vs. driving accuracy]

Knowing that it is important, but difficult, to do both well, is there one skill that predominates? To determine how important each factor is to driving skill, I set up a regression of driving distance and driving accuracy on a golfer's greens in regulation (GIR). Because the courses played can vary in difficulty, I used my course-adjusted stats, which determine how much better or worse than field average a golfer performed each week in each stat. These adjustments are slight for most golfers, but for golfers like Tiger Woods who typically play tougher courses than average the adjustment can be significant. I've attached a Google Doc of every PGA player to finish in the FedEx Cup top 200, plus anyone else with >10 tournaments entered, showing these adjusted stats.

The results show that combining distance and accuracy predicts about 50% of the variance in GIR (R^2=0.494). The p-values are highly significant and indistinguishable from zero, which certainly squares with the empirical stats provided in the second paragraph. To predict GIR, the equation is Y = (.00283 * Distance in yards) + (.4418 * Accuracy in %) - .4429. Basically, hitting the ball an extra three yards is worth around 2% in driving accuracy, meaning a golfer should be indifferent to adding three yards of distance if it means giving up 2% in accuracy.
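A quick sketch verifying that indifference point; the 290-yard/61% baseline golfer is hypothetical, and accuracy and GIR are expressed as fractions so the coefficients match the trade-off quoted above.

```python
def predicted_gir(distance_yards: float, accuracy: float) -> float:
    """Predicted greens in regulation from the fitted equation:
    Y = 0.00283 * distance + 0.4418 * accuracy - 0.4429.
    Accuracy and GIR are fractions (0.61 = 61%)."""
    return 0.00283 * distance_yards + 0.4418 * accuracy - 0.4429

base = predicted_gir(290, 0.61)   # hypothetical average driver
trade = predicted_gir(293, 0.59)  # 3 more yards, 2% less accuracy
# Both print 0.647: the trade leaves predicted GIR essentially unchanged.
print(f"{base:.3f} vs {trade:.3f}")
```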

If a golfer were given the choice of being one standard deviation better than average in one skill and one standard deviation below average in the other, there is almost no difference between being good at distance and bad at accuracy or vice-versa (predicted GIR of 63.9% for good at distance and 63.6% for good at accuracy). This shows that performing well at either skill is a legitimate path to success on Tour.

Using this equation, we can also calculate a Total Driving skill stat. The PGA Tour has such a stat, which it calculates simply by adding together a golfer's rank in distance and accuracy. Mine instead ranks golfers by the GIR predicted from their driving distance and accuracy. The leader, Henrik Stenson, finished 8th in accuracy and 55th in distance, with a predicted GIR of 69.2%, meaning a golfer with average approach-shot ability would've hit the green 69% of the time shooting from his average location. The worst golfer by this metric, Mike Weir, finished 213th in distance and 196th in accuracy, with a predicted GIR of 56.2%.

Tiger Woods, who is regularly criticized for his wayward drives, actually finishes 20th in Total Driving on the strength of his 34th-ranked distance and 78th-ranked (above average!) accuracy. His predicted GIR was 66.6%. Phil Mickelson, on the other hand, is also criticized for being wild with the driver, and he has been wild this season (58% accuracy; 163rd on Tour), but his lack of distance has hurt him nearly as much: he's only driven it 288 yards on average (98th on Tour). As a result, he was only the 149th best driver on Tour last year.

I’ve attached the predicted GIR/Total Driving stats in this Google Doc.

Predicting Performance based on the Previous Weekend is Hard

Possibly the most overused idea among those involved in predicting golf is that a player's performance in the previous week's tournament holds a lot of predictive power over their performance in the current tournament. Golfers who win are said to be "in form", while those who struggle and miss the cut are said to be struggling with their game. However, no one has ever provided any evidence that the prior week should be factored into a prediction about the following week.

To examine whether this is actually a thing, I gathered performance data from the last ten weeks of PGA Tour tournaments (AT&T to Deutsche Bank) for a sample of over 300 golfers. If a golfer played in consecutive weeks, I added his rounds to the sample. In the end I had 497 pairs of prior-week/following-week data.

The results weren't exactly promising for "in-form" advocates. The correlation between week 1 and week 2 was only R=0.19. This means that if you only knew a golfer's performance in the prior week, you would predict the following week by regressing that performance over 80% towards the mean (which was -0.07 for the sample).

Now, perhaps there is a prior week effect, but it only appears when you factor underlying skill into the regression. To test this, I took each golfer’s weighted two year Z-Score from the week following the Travelers Championship and added that variable to the prior week variable to predict the following week. The results follow:

R=0.32, y=(0.94*SKILL)+(.10*PRIOR)+0.16, both variables were statistically significant at the 95% level

However, this indicates that underlying skill is over nine times more important than the prior week in predicting the following week. In other words, to predict how a golfer will perform the following week, look at how he’s performed in the last two years rather than just how he played the week earlier.
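A sketch of both predictions; treating the R=0.19 correlation as the regression slope assumes the two weeks have similar spreads, so this is illustrative only.

```python
SAMPLE_MEAN = -0.07  # mean performance of the week-to-week sample

def predict_from_prior_week(prior_week: float, r: float = 0.19) -> float:
    """Prior week alone: regress the deviation ~80% back to the mean."""
    return SAMPLE_MEAN + r * (prior_week - SAMPLE_MEAN)

def predict_with_skill(skill: float, prior_week: float) -> float:
    """Two-variable model from above: y = 0.94*SKILL + 0.10*PRIOR + 0.16."""
    return 0.94 * skill + 0.10 * prior_week + 0.16

# A golfer a full stroke better than the sample mean last week projects
# only ~0.19 strokes better than the mean this week...
print(f"{predict_from_prior_week(-1.07):+.2f}")  # -0.26
# ...while two-year skill dominates the two-variable prediction.
print(f"{predict_with_skill(-0.50, -1.07):+.2f}")  # -0.42
```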

Predicting Professional Performance of Collegiate Golfers (Part II)

Yesterday, I posted a comprehensive look at the performance of collegiate golfers during their first season in Major Tour (PGA/Web.com/European) professional golf and examined how correlated those results were to Jeff Sagarin’s Rankings at Golfweek. However, I was concerned about that study largely because I did not remove golfers who took several years to actually record >20 Major Tour rounds. It can often be very difficult for the non-elite college golfers to play regularly on the Major Tours right after graduation. Many play on the minor league tours (eGolf/NGA) or one of the international tours (Challenge/Asian/etc.). Obviously, this introduces bias into the study if we’re comparing, for example, Jordan Spieth’s season right after leaving college and Chesson Hadley’s season three years after he graduated. Part of success in pro golf is learning how to endure the grind of a season – securing entrance into tournaments, sponsors, and performing well enough to earn a living. Add to that that golfers almost always become better players from their early 20s to mid 20s, and I’m not sure I trust the reliability of yesterday’s study.

To correct for that bias, I removed all seasons from the sample that occurred more than one year after the golfer's last in collegiate golf. For example, Keegan Bradley last competed in college in 2008, but did not record Major Tour rounds until 2010. He's dropped from my sample, along with roughly half of the seasons. I followed the same methodology as yesterday using only the seasons that met this new criterion.

N=35, average college seasons = 3.4, average Sagarin = 70.7, average pro performance in Z-Score = 0.10

[Graph: college Sagarin rating vs. first-season pro performance (restricted sample)]

The results showed a much stronger correlation than yesterday (R=0.70). In fact, this correlation is almost exactly equal to what I found earlier this week when examining the correlation between sets of professional seasons. This indicates that Sagarin’s Rankings are an extremely valuable predictor of professional success, even more so than what I found yesterday.

Predicting Professional Performance of Collegiate Players (REDUX)

Let’s start with some tough love. This was a laughably lazy attempt at performing this study. Hopefully this effort will be less awful, considering I think I have a somewhat proper sample.

Quickly: I'm looking for the correlation between Jeff Sagarin's college golf rankings and my own Z-Score Model Ratings. Sagarin publishes the best (only?) math-based ranking of college golfers. There are obviously issues of sample size in the college game (most teams play <15 tournaments of normally three rounds each), but the Rankings are fairly strongly correlated between seasons (R=.62) even though players are in a volatile period of their golf development. To combat concerns about sample size, I've averaged each golfer's Ranking over his college career. This isn't ideal either, but, again, a max of 45 rounds isn't something I'm comfortable using.

Once I had those, I looked in my Z-Score database for the first instance of each player recording >20 rounds in one season and took their Z-Score from that season. A few concerns about this method of finding seasons: 1) if a golfer has fewer than 20 rounds in every season, they won't show up at all, 2) if a golfer has fewer than 20 rounds before getting more than 20 rounds in a subsequent season, that first season will be ignored, and 3) it can often be several years before a golfer accumulates >20 rounds on the PGA/Web.com/European Tours (I do not have eGolf/NGA/Challenge/Asian/etc. Tour Ratings). Of these concerns, #1 isn't that big of a deal; plenty of collegians don't have the game for high-level pro golf – there are fewer than 1,000 guys who play regularly on the three Tours I track and I've gathered data on the top 500 golfers from each season. #2 isn't very concerning; the sample has to be set somewhere. #3 concerns me the most because comparing a 26 year old with three seasons of minor tour golf to a 21 year old right out of college is kind of apples and oranges, but perhaps I'll run another study in the future that excludes those data points.

First, some information about my sample: N=80; all but three golfers had at least 2 seasons of college golf (the average was 3.2 seasons) – Spieth, Todd Baek, and Roger Sloan had the single seasons; the average performance in college was 71.1; and the average performance in pro golf was a +0.15 Z-Score (below average). I've chosen to display the pro results in terms of strokes better than/worse than average. Divide by 3 to get the corresponding Z-Score.

The results were much less irrelevant than that turd I linked above:

[Graph: college Sagarin rating vs. first-season pro performance]

The correlation was R=.49, which indicates that we can predict pro performance on a roughly 50% Sagarin/50% mean basis. The equation to use is y = 0.47x - 32.9, where y is pro performance in strokes relative to average (divide by 3 to get the Z-Score) and x is Sagarin Rating. For comparison, I've found that the correlation between back-to-back professional seasons is about 70% (70% Year 1 + 30% mean) and the correlation between back-to-back college seasons is about 63% (63% Year 1 + 37% mean). Based on the concerns I laid out above, I think that's not terrible.
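A sketch of that prediction equation; note that a 70.0 Sagarin average projects to almost exactly Tour average, while the sample-average 71.1 projects close to the sample's +0.15 mean Z-Score.

```python
def pro_strokes_vs_average(sagarin: float) -> float:
    """First-season pro performance in strokes vs. average: y = 0.47x - 32.9."""
    return 0.47 * sagarin - 32.9

def pro_z_score(sagarin: float) -> float:
    """Divide strokes by 3 for the corresponding Z-Score, per the text."""
    return pro_strokes_vs_average(sagarin) / 3.0

print(f"{pro_z_score(70.0):+.3f}")  # +0.000: Tour average
print(f"{pro_z_score(71.1):+.3f}")  # +0.172: near the sample mean of +0.15
```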

Unsurprisingly, this method of predicting has its misses. Sagarin did not think highly of Keegan Bradley coming out of St. John's. Whether it was poor play or an awful schedule, Keegan averaged a 72.5 over three years in school. Keegan was also one of those guys who turned pro and took several seasons to record Major Tour rounds: he graduated in 2008 and didn't record a Major Tour round until 2010.

I do take solace in the fact that no golfer who averaged better than a 70.0 (basically in the top 15 each season) failed to perform better than the sample average in his first season. This indicates that success in college is correlated with success in professional golf. In fact, only a single player with a Sagarin below 70.0 in the entire 2005-2013 sample (among those who have graduated) has failed to record 20 or more Major Tour rounds: Arnond Vongvanij, who has exclusively played on the Asian Tour, has a professional win, and is ranked 218th in the Official World Golf Ranking.

This method predicts success for the best collegiate golfer not currently in pro golf, Justin Thomas (69.2), who plans to turn pro after the Walker Cup. He's recorded a -0.06 Z-Score in 12 rounds dating back to 2012.