Many of you have probably seen that there was a 24-man playoff for the final (64th) spot in match play at the US Amateur. The USGA sent 6 foursomes off the 17th hole, planning to alternate the 17th and 18th holes until one player prevailed. In the end, two of the 24 made birdie at the 17th and advanced to the 18th, where one made a triple bogey and the other won with a bogey. In total, the playoff took 90 minutes to complete (26 player-holes were played).

During the playoff, I suggested it would be an interesting exercise to find the optimal starting hole. In the end, 26 player-holes were played in 90 minutes, but as I found later, starting at the 17th the median expected number of holes played was 31, and about 35% of the time the playoff would be expected to extend past the 18th hole.

To determine the optimal hole, I looked at three different outcome statistics:

- median holes played (each time a player plays a hole counts as one hole played, so the minimum is 24 if there is a single winner after the first playoff hole)
- not returning to the starting hole (the playoff ends within two holes)
- staying at or under 50 total holes played
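One way to compute these three statistics for a single simulated playoff is to record how many players tee off on each playoff hole. The sketch below assumes a hypothetical vector `left` of those counts (so `c(24, 2)` is the real playoff: 24 players on #17, then 2 on #18) and a hypothetical helper `playoff_metrics()`; it counts a hole only when it is actually played, matching the first definition above.

```r
# Hypothetical helper: summarise one simulated playoff.
# left: number of players teeing off on each playoff hole, in order.
playoff_metrics <- function(left, cap = 50) {
  total <- sum(left)                 # each player on each hole counts once
  list(
    total      = total,
    within_two = length(left) <= 2,  # never returns to the starting hole
    under_cap  = total <= cap        # 50 or fewer player-holes
  )
}

# The actual US Amateur playoff: 24 players on #17, 2 on #18
m <- playoff_metrics(c(24, 2))
m$total       # 26
m$within_two  # TRUE

# Over many runs, take the median and the proportions (made-up runs here)
runs <- list(c(24), c(24, 2), c(24, 3, 2))
totals <- sapply(runs, function(x) playoff_metrics(x)$total)
median(totals)                                                  # 26
mean(sapply(runs, function(x) playoff_metrics(x)$within_two))
```

In a full simulation, `left` would come from one run of the elimination logic, and the median and proportions are taken across all runs for a given starting hole.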

These are arbitrary, and I’m sure the USGA’s decision was based mainly on proximity to the warm-up areas and the clubhouse.

I used the scoring stats from the 2010 US Open at Pebble Beach, since a summer USGA event better represents the conditions than a PGA Tour event in February. The US Am scoring is available here for anyone who wants to replicate this analysis. Just organize your CSV as **hole_no, score_to_par, count**, where score_to_par is -2 for eagle, -1 for birdie, and so on.
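To make the format concrete, here is a sketch with made-up counts for two holes (illustrative only, not the actual Pebble Beach data). It converts each hole's counts into cumulative cut-points on the 0 to 1 scale, which is the idea behind the eagle/birdie/par/bogey thresholds in the code below, and then draws a score with `runif()`. It uses base R instead of the dplyr/tidyr pipeline; `cut_points` and `draw_score()` are hypothetical names.

```r
# Made-up counts in the hole_no, score_to_par, count layout (illustrative only)
scoring <- read.csv(text = "hole_no,score_to_par,count
17,-1,20
17,0,90
17,1,40
17,2,10
18,-2,2
18,-1,30
18,0,80
18,1,35
18,2,13")

# Per hole: sort best to worst, then turn counts into cumulative cut-points
cut_points <- lapply(split(scoring, scoring$hole_no), function(h) {
  h <- h[order(h$score_to_par), ]
  data.frame(score_to_par = h$score_to_par,
             upper = cumsum(h$count) / sum(h$count))
})

# A uniform draw falls into exactly one band; that band's score is the result
draw_score <- function(cp) {
  i <- findInterval(runif(1), cp$upper) + 1   # number of cut-points passed, + 1
  cp$score_to_par[min(i, nrow(cp))]
}

cut_points[["17"]]$upper          # cut-points 0.125, 0.6875, 0.9375, 1
draw_score(cut_points[["17"]])    # one random score for hole 17
```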

**Results**

Based on these results, in terms of limiting holes played (lowest median), the 5th, 12th, and 17th (all par 3s) stand out. For the 5th and 17th, the par 3 is followed by a par 5. The 6th hole, with a median of 38 holes, is the least optimal.

In terms of reducing the chance of returning to the starting hole (finishing within two playoff holes), the 5th, 12th, and 17th again come out on top, with about a 64-65% chance of lasting only two holes. The #18-#1 combination comes in last, with only 41% ending within two playoff holes.
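The two-hole numbers can also be cross-checked without simulation. Assuming, as the simulation does, that players score independently, the chance that exactly k of n players share the best score on a hole is the sum over score levels s of choose(n, k) * p_s^k * w_s^(n - k), where p_s is the probability of score s and w_s the probability of doing strictly worse. The sketch below applies that to a start hole followed by the alternate hole; `survivor_dist()` and `ends_within_two()` are hypothetical names, and the two probability vectors are made up for illustration.

```r
# Probability that exactly k of n players tie for the best score on one hole.
# p: score probabilities ordered best to worst (e.g. birdie, par, bogey, worse).
survivor_dist <- function(p, n) {
  worse <- pmax(0, 1 - cumsum(p))   # P(strictly worse than each score level)
  sapply(1:n, function(k) sum(choose(n, k) * p^k * worse^(n - k)))
}

# Probability the playoff is settled within two holes (one player left):
# hole A is played by the full field, hole B by whoever tied for best on A.
# Assumes n >= 2.
ends_within_two <- function(pA, pB, n) {
  kA <- survivor_dist(pA, n)
  p_done <- kA[1]                                         # settled on hole A
  for (k in 2:n) {
    p_done <- p_done + kA[k] * survivor_dist(pB, k)[1]    # settled on hole B
  }
  p_done
}

# Made-up probabilities (birdie, par, bogey, worse), illustration only
par3 <- c(0.10, 0.60, 0.25, 0.05)
par5 <- c(0.30, 0.55, 0.12, 0.03)
ends_within_two(par3, par5, 24)
```

Feeding in each hole's real probabilities from the CSV should give values close to the simulated two-hole percentages, up to Monte Carlo noise.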

The least interesting outcome is avoiding playing more than 50 holes. If 26 holes actually took 1.5 hours, a reasonable guess is that 50 holes would have taken around 3 hours, finishing as the first round-of-64 matches were nearing the end of the front 9.
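The 3-hour figure is a straight-line extrapolation from the actual pace. A quick check, assuming time scales linearly with player-holes (a rough assumption, not a measured pace):

```r
# Pace of the real playoff: 90 minutes for 26 player-holes
minutes_per_player_hole <- 90 / 26
# Linear extrapolation to 50 player-holes (assumption)
minutes_for_50 <- 50 * minutes_per_player_hole
round(minutes_for_50)            # 173
round(minutes_for_50 / 60, 1)    # 2.9
```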

In those terms, starting at the 5th or 6th hole gives only a 90% chance of finishing in 50 or fewer holes played, while starting at the 17th or 18th yields a 99.5% chance.

Based on that, starting at the 17th hole ranks as one of the clearly optimal options, if not the optimal one! Kudos, USGA. And kudos to you too if you can keep your solution under my absurd 313 lines.

**Code**

I’ve posted my *very* for-loop-heavy code below for anyone who wants to replicate this.

```r
library(dplyr)
library(tidyr)
library(readr)

# bring in prepared CSV showing data in form hole_no, score_to_par, count
scoring_data <- read_csv("pebble-beach-scoring-2010.csv")

# players qualifying (just one player advancing here) and field size (# in playoff)
PQ <- 1
FIELD <- 24

# calculates percentages of birdie, par, bogey, etc.
setup_scoring <- scoring_data %>%
  group_by(hole_no) %>%
  mutate(perc = count / sum(count)) %>%
  ungroup() %>%
  select(-count) %>%
  spread(score_to_par, perc) %>%
  gather(score_to_par, perc, -hole_no) %>%
  mutate(perc = ifelse(is.na(perc), 0, perc),
         score_to_par = as.numeric(score_to_par)) %>%
  spread(score_to_par, perc) %>%
  # we'll use a random draw later on, so define which parts of the 0 to 1
  # continuum reflect the probability of eagle, birdie, par, etc.
  mutate(eagle = `-2`,
         birdie = `-1` + eagle,
         par = `0` + birdie,
         bogey = `1` + par,
         worse = 1 - bogey) %>%
  select(hole_no, eagle:worse)

# we now enter the for loop hacks zone
# create a data frame with a row for each hole_no and competitor (24 players)
competitors <- vector("list", FIELD)
for (p in 1:FIELD) {
  d <- setup_scoring %>%
    mutate(comp = p)
  competitors[[p]] <- d
}
competitors <- bind_rows(competitors)

# link consecutive holes including #18 to #1
# with some knowledge of realistic back-to-back holes you could expand to cover
# all options (for example #3 to #17 or #16 to #4)
two_holes <- vector("list", 18)
for (h in 1:18) {
  d <- competitors %>%
    filter((hole_no == h | hole_no == h + 1) | (h == 18 & hole_no %in% c(1, 18)))

  two_holes[[h]] <- d %>%
    mutate(start_hole = h)
}
two_holes <- bind_rows(two_holes)

# run the main simulation for loop
tictoc::tic()

it <- 1000
holes_data <- vector("list", 18)

for (h in 1:18) {
  data <- two_holes %>%
    filter(start_hole == h)
  sim_data <- vector("list", it)
  for (i in 1:it) {
    # the logic here is that we're just simulating a single run of the first hole
    # & removing anyone who does not earn the best score
    # we then filter the data for the next hole and continue on
    # this can be for looped as well
    first_hole <- data %>%
      filter(hole_no == h) %>%
      mutate(s = runif(n(), min = 0, max = 1),
             s = ifelse(s < eagle, -2,
                 ifelse(s < birdie, -1,
                 ifelse(s < par, 0,
                 ifelse(s < bogey, 1, 2))))) %>%
      mutate(rk = rank(s, ties.method = "min")) %>%
      filter(rk == 1) %>%
      select(comp) %>%
      as.list() %>%
      .[[1]]

    left_after_1 <- length(first_hole)

    second_hole <- data %>%
      filter(hole_no != h & comp %in% first_hole) %>%
      mutate(s = runif(n(), min = 0, max = 1),
             s = ifelse(s < eagle, -2,
                 ifelse(s < birdie, -1,
                 ifelse(s < par, 0,
                 ifelse(s < bogey, 1, 2))))) %>%
      mutate(rk = rank(s, ties.method = "min")) %>%
      filter(rk == 1) %>%
      select(comp) %>%
      as.list() %>%
      .[[1]]

    left_after_2 <- length(second_hole)

    third_hole <- data %>%
      filter(hole_no == h & comp %in% second_hole) %>%
      mutate(s = runif(n(), min = 0, max = 1),
             s = ifelse(s < eagle, -2,
                 ifelse(s < birdie, -1,
                 ifelse(s < par, 0,
                 ifelse(s < bogey, 1, 2))))) %>%
      mutate(rk = rank(s, ties.method = "min")) %>%
      filter(rk == 1) %>%
      select(comp) %>%
      as.list() %>%
      .[[1]]

    left_after_3 <- length(third_hole)

    fourth_hole <- data %>%
      filter(hole_no != h & comp %in% third_hole) %>%
      mutate(s = runif(n(), min = 0, max = 1),
             s = ifelse(s < eagle, -2,
                 ifelse(s < birdie, -1,
                 ifelse(s < par, 0,
                 ifelse(s < bogey, 1, 2))))) %>%
      mutate(rk = rank(s, ties.method = "min")) %>%
      filter(rk == 1) %>%
      select(comp) %>%
      as.list() %>%
      .[[1]]

    left_after_4 <- length(fourth_hole)

    fifth_hole <- data %>%
      filter(hole_no == h & comp %in% fourth_hole) %>%
      mutate(s = runif(n(), min = 0, max = 1),
             s = ifelse(s < eagle, -2,
                 ifelse(s < birdie, -1,
                 ifelse(s < par, 0,
                 ifelse(s < bogey, 1, 2))))) %>%
      mutate(rk = rank(s, ties.method = "min")) %>%
      filter(rk == 1) %>%
      select(comp) %>%
      as.list() %>%
      .[[1]]

    left_after_5 <- length(fifth_hole)

    sixth_hole <- data %>%
      filter(hole_no != h & comp %in% fifth_hole) %>%
      mutate(s = runif(n(), min = 0, max = 1),
             s = ifelse(s < eagle, -2,
                 ifelse(s < birdie, -1,
                 ifelse(s < par, 0,
                 ifelse(s < bogey, 1, 2))))) %>%
      mutate(rk = rank(s, ties.method = "min")) %>%
      filter(rk == 1) %>%
      select(comp) %>%
      as.list() %>%
      .[[1]]

    left_after_6 <- length(sixth_hole)

    # total = starting field plus the players left after each simulated hole
    # note: a1-a6 never drop below 1, so they are added to total even after
    # the playoff has been decided
    results <- tibble::tibble(a1 = left_after_1,
                              a2 = left_after_2,
                              a3 = left_after_3,
                              a4 = left_after_4,
                              a5 = left_after_5,
                              a6 = left_after_6,
                              total = (FIELD + a1 + a2 + a3 + a4 + a5 + a6),
                              ends_by = ifelse(a1 == PQ, 1,
                                        ifelse(a2 == PQ, 2,
                                        ifelse(a3 == PQ, 3,
                                        ifelse(a4 == PQ, 4,
                                        ifelse(a5 == PQ, 5,
                                        ifelse(a6 == PQ, 6, 7)))))),
                              start_hole = h)
    sim_data[[i]] <- results
  }
  holes_data[[h]] <- bind_rows(sim_data)
}

results <- bind_rows(holes_data)
tictoc::toc()

# calculate results based on the starting hole
hole_results <- results %>%
  group_by(start_hole) %>%
  summarize(median_holes = median(total),
            mean_holes = mean(total),
            ends_in_1 = mean(ends_by < 2),
            ends_in_2 = mean(ends_by < 3),
            ends_in_3 = mean(ends_by < 4),
            ends_in_4 = mean(ends_by < 5),
            fewer_31 = mean(total < 31),
            fewer_41 = mean(total < 41),
            fewer_51 = mean(total < 51)) %>%
  ungroup()
```
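If you want to take up the challenge of a shorter solution, here is one sketch (not the code behind the results above) that swaps the six copied hole blocks for a `while` loop. Each playoff hole samples a score band for every remaining player, keeps only those tied for the best band, and alternates between the two holes until one player is left. It assumes a hypothetical `probs` list holding two vectors of score probabilities ordered best to worst (which you would build from the CSV), counts only holes actually played, and stops at a `max_holes` safety cap; `simulate_playoff()` and `summarise_start()` are hypothetical names.

```r
# Simulate one single-winner playoff alternating between two holes.
# probs: list of two probability vectors, each ordered best score to worst.
simulate_playoff <- function(probs, field = 24, max_holes = 100) {
  remaining <- field
  total <- 0
  holes <- 0
  while (remaining > 1 && holes < max_holes) {
    p <- probs[[holes %% 2 + 1]]                  # start hole, other hole, start...
    band <- sample.int(length(p), remaining, replace = TRUE, prob = p)
    total <- total + remaining                    # everyone left plays this hole
    holes <- holes + 1
    remaining <- sum(band == min(band))           # lower band = better score
  }
  c(total = total, holes = holes)
}

# Repeat the playoff and report the three statistics for one starting hole
summarise_start <- function(probs, it = 1000, field = 24) {
  runs <- t(replicate(it, simulate_playoff(probs, field)))
  c(median_holes = median(runs[, "total"]),
    ends_in_2    = mean(runs[, "holes"] <= 2),
    at_most_50   = mean(runs[, "total"] <= 50))
}

# Made-up probabilities for a par 3 followed by a par 5 (illustration only)
set.seed(1)
par3 <- c(0.10, 0.60, 0.25, 0.05)         # birdie, par, bogey, worse
par5 <- c(0.02, 0.30, 0.53, 0.12, 0.03)   # eagle, birdie, par, bogey, worse
summarise_start(list(par3, par5))
```

Looping `summarise_start()` over the 18 start-hole pairings would give a table comparable to `hole_results`, except that `median_holes` here starts at 24 because finished playoffs stop adding holes.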
