Add new function calculate_stats() #470

Merged
merged 38 commits into from
Nov 10, 2024
Changes from 9 commits
d89f2f8
update stat_ids
mrcaseb Jul 2, 2024
6cea069
News and documentation
mrcaseb Jul 2, 2024
0a01a16
start `calculate_stats()`
mrcaseb Jul 2, 2024
1b1f831
rushing/receiving yards and tds should include the stat ids for laterals
mrcaseb Jul 2, 2024
6f614c2
include half sacks and corresponding yards
mrcaseb Jul 3, 2024
58d13d0
add play information to stats df
mrcaseb Jul 3, 2024
7f01b16
implement special teams tds and defensive tds
mrcaseb Jul 3, 2024
59b2e1b
try to avoid non matches because of different team abbrs
mrcaseb Jul 3, 2024
f59b2e6
implement fumble recoveries
mrcaseb Jul 3, 2024
72884c3
more stuff (I forgot what I did here tbh)
mrcaseb Aug 2, 2024
b367fb4
add pbp stats, add documentation, combine all
mrcaseb Oct 15, 2024
7942ae7
Merge branch 'master' into new-stats-approach
mrcaseb Oct 15, 2024
d3075b1
News update
mrcaseb Oct 15, 2024
d807a30
remove irrelevant comments
mrcaseb Oct 15, 2024
a679cd3
fix pkgdown
mrcaseb Oct 16, 2024
49aefe8
add season_type and team info and fix yac stats
mrcaseb Oct 16, 2024
1d99b23
add cpoe and number of games
mrcaseb Oct 16, 2024
5c07c0a
users should be able to do the rounding themselves
mrcaseb Oct 16, 2024
bb8e20c
deleted the dot
mrcaseb Oct 16, 2024
fba04c0
implement game winning field goals
mrcaseb Oct 16, 2024
627d6a6
team variable naming consistency
mrcaseb Oct 16, 2024
28db290
add variable description
mrcaseb Oct 16, 2024
1fef4d3
fix pkgdown
mrcaseb Oct 16, 2024
5e61d0d
test column names and row numbers
mrcaseb Oct 16, 2024
34bd7ea
fix check notes
mrcaseb Oct 16, 2024
60e1e4b
not all of these are def stats actually
mrcaseb Nov 3, 2024
5401559
adjust variable explainer after def modifications
mrcaseb Nov 3, 2024
b0b9dd6
Merge branch 'master' into new-stats-approach
mrcaseb Nov 3, 2024
d0f1485
falsely counted team air yards of all game instead of play
mrcaseb Nov 3, 2024
5aa1d80
Merge branch 'new-stats-approach' of https://github.com/nflverse/nflf…
mrcaseb Nov 3, 2024
8d45b29
Add punt/kickoff returns and yardage
mrcaseb Nov 3, 2024
fd648dc
add timeouts
mrcaseb Nov 3, 2024
74094b2
Version bump and news bullet for nfl_stats_variables
mrcaseb Nov 3, 2024
cb83260
document differences to old stats
mrcaseb Nov 9, 2024
2d7a0e7
use player short name from playstats
mrcaseb Nov 9, 2024
457e5d6
deprecate old stats functions
mrcaseb Nov 10, 2024
ea1fa2c
Describe variable name differences
mrcaseb Nov 10, 2024
e1f357f
snapshot test variable types
mrcaseb Nov 10, 2024
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Type: Package
Package: nflfastR
Title: Functions to Efficiently Access NFL Play by Play Data
-Version: 4.6.1.9010
+Version: 4.6.1.9011
Authors@R:
c(person(given = "Sebastian",
family = "Carl",
@@ -71,6 +71,6 @@ Suggests:
testthat (>= 3.0.0)
Encoding: UTF-8
LazyData: true
-RoxygenNote: 7.3.1
+RoxygenNote: 7.3.2
Roxygen: list(markdown = TRUE)
Config/testthat/edition: 3
1 change: 1 addition & 0 deletions NAMESPACE
@@ -27,6 +27,7 @@ import(dplyr)
import(fastrmodels)
importFrom(cli,rule)
importFrom(curl,curl_fetch_memory)
+importFrom(data.table,"%between%")
importFrom(data.table,setDT)
importFrom(furrr,future_map)
importFrom(furrr,future_map_chr)
2 changes: 2 additions & 0 deletions NEWS.md
@@ -12,6 +12,8 @@
- `punter_player_id`, and `punter_player_name` are filled for blocked punt attempts. (#463)
- Fixed an issue affecting scores of 2022 games involving a return touchdown (#466)
- Added identification of scrambles from 1999 through 2004 with thanks to Aaron Schatz (#468)
+- Added new function `calculate_stats()` that combines the output of all `calculate_player_stats*()` functions with a more robust and faster approach. The `calculate_player_stats*()` functions will be deprecated.
+- Updated the dataframe `stat_ids` with some IDs that were previously missing.
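
The approach behind `calculate_stats()` in `R/calculate_stats.R` amounts to summing over NFL stat IDs per group. A rough base-R sketch of that idea on hypothetical toy rows follows; the data and the helper name `summarise_passing` are assumptions for illustration, and the real function works on play stats loaded from nflverse with dplyr.

```r
# Hypothetical toy rows, one row per (player, stat ID) on a play.
# Stat IDs as used in the diff: 14:16 and 19 count as attempts,
# 15:16 as completions, 16 as a passing touchdown.
playstats <- data.frame(
  player_id = c("A", "A", "A", "A", "B", "B"),
  stat_id   = c(15L, 16L, 14L, 19L, 15L, 14L),
  yards     = c(12L, 30L, 0L, 0L, 7L, 0L),
  stringsAsFactors = FALSE
)

# Summarise one player's rows by testing stat IDs and summing the matches.
summarise_passing <- function(df) {
  data.frame(
    player_id     = df$player_id[1],
    completions   = sum(df$stat_id %in% 15:16),
    attempts      = sum(df$stat_id %in% c(14:16, 19)),
    passing_yards = sum((df$stat_id %in% 15:16) * df$yards),
    passing_tds   = sum(df$stat_id == 16),
    stringsAsFactors = FALSE
  )
}

# Split by player, summarise each piece, and bind the results together.
passing <- do.call(
  rbind,
  lapply(split(playstats, playstats$player_id), summarise_passing)
)
rownames(passing) <- NULL
print(passing)
```

Multiplying a logical vector by `yards` keeps the yardage only on matching rows, which is why one pass over the rows can produce both counts and yardage totals.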

# nflfastR 4.6.1

263 changes: 263 additions & 0 deletions R/calculate_stats.R
@@ -0,0 +1,263 @@
calculate_stats <- function(seasons = nflreadr::most_recent_season(),
                            summary_level = c("season", "week"),
                            stat_type = c("player", "team")){

  # testing
  # seasons = 2023
  # summary_level = "week"
  # stat_type = "player"

  summary_level <- rlang::arg_match(summary_level)
  stat_type <- rlang::arg_match(stat_type)

  pbp <- nflreadr::load_pbp(seasons = seasons)

  playinfo <- pbp %>%
    dplyr::group_by(.data$game_id, .data$play_id) %>%
    dplyr::summarise(
      off = nflreadr::clean_team_abbrs(posteam),
      def = nflreadr::clean_team_abbrs(defteam),
      special = as.integer(special == 1)
    ) %>%
    dplyr::ungroup()

  # Function defined below
  # more_stats = all stat IDs of one player in a single play
  # team_stats = all stat IDs of one team in a single play
  # we need those to identify things like fumbles depending on playtype or
  # first downs depending on playtype
  playstats <- load_playstats(seasons = seasons) %>%
    dplyr::group_by(.data$season, .data$week, .data$play_id, .data$gsis_player_id) %>%
    dplyr::mutate(
      # we append a collapse separator to the string in order to search for matches
      # including the separator to avoid 1 matching 10
      more_stats = paste0(paste(stat_id, collapse = ";"), ";")
    ) %>%
    dplyr::group_by(.data$season, .data$week, .data$play_id, .data$team_abbr) %>%
    dplyr::mutate(
      # we append a collapse separator to the string in order to search for matches
      # including the separator to avoid 1 matching 10
      team_stats = paste0(paste(stat_id, collapse = ";"), ";")
    ) %>%
    dplyr::ungroup() %>%
    dplyr::left_join(
      playinfo, by = c("game_id", "play_id")
    )

  if (stat_type == "player"){
    # need newer version of nflreadr to use load_players
    rlang::check_installed("nflreadr (>= 1.3.0)", "to join player information.")

    player_info <- nflreadr::load_players() %>%
      dplyr::select(
        "player_id" = "gsis_id",
        "player_display_name" = "display_name",
        "player_name" = "short_name",
        "position",
        "position_group",
        "headshot_url" = "headshot"
      )
  }

  # Check combination of summary_level and stat_type to set a helper that is
  # used to create the grouping variables
  grp_id <- data.table::fcase(
    summary_level == "season" && stat_type == "player", "10",
    summary_level == "season" && stat_type == "team",   "20",
    summary_level == "week"   && stat_type == "player", "30",
    summary_level == "week"   && stat_type == "team",   "40"
  )

  grp_vars <- switch(grp_id,
    "10" = rlang::data_syms(c("season", "player_id" = "gsis_player_id")),
    "20" = rlang::data_syms(c("season", "team_abbr")),
    "30" = rlang::data_syms(c("season", "week", "player_id" = "gsis_player_id")),
    "40" = rlang::data_syms(c("season", "week", "team_abbr"))
  )

  # Silence global vars NOTE
  # We do this differently here because only stat_id and yards are affected
  # and it keeps the code more readable
  utils::globalVariables(c("stat_id", "yards"))

  stats <- playstats %>%
    dplyr::group_by(!!!grp_vars) %>%
    dplyr::summarise(

      # Offense #####################
      completions = sum(stat_id %in% 15:16),
      attempts = sum(stat_id %in% c(14:16, 19)),
      passing_yards = sum((stat_id %in% 15:16) * yards),
      passing_tds = sum(stat_id == 16),
      passing_interceptions = sum(stat_id == 19),
      sacks_suffered = sum(stat_id == 20),
      sack_yards_lost = sum((stat_id == 20) * yards),
      # combine the fumble checks row-wise with `|`; any() would collapse them
      # to a single value for the whole group
      sack_fumbles = sum(
        stat_id == 20 &
          (has_id(52, more_stats) | has_id(53, more_stats) | has_id(54, more_stats))
      ),
      sack_fumbles_lost = sum(stat_id == 20 & has_id(106, more_stats)),
      passing_air_yards = sum((stat_id %in% 111:112) * yards),
      # passing_yards_after_catch = 15:16 - 111,
      passing_first_downs = sum((stat_id %in% 15:16) & has_id(4, team_stats)),
      # passing_epa = ,
      passing_2pt_conversions = sum(stat_id == 77),
      pacr = .data$passing_yards / .data$passing_air_yards,
      # dakota = ,

      carries = sum(stat_id %in% 10:11),
      rushing_yards = sum((stat_id %in% 10:13) * yards),
      rushing_tds = sum(stat_id %in% c(11,13)),
      # row-wise `|` instead of any(), see sack_fumbles
      rushing_fumbles = sum(
        (stat_id %in% 10:11) &
          (has_id(52, more_stats) | has_id(53, more_stats) | has_id(54, more_stats))
      ),
      rushing_fumbles_lost = sum((stat_id %in% 10:11) & has_id(106, more_stats)),
      rushing_first_downs = sum((stat_id %in% 10:11) & has_id(3, team_stats)),
      # rushing_epa = ,
      rushing_2pt_conversions = sum(stat_id == 75),

      receptions = sum(stat_id %in% 21:22),
      targets = sum(stat_id == 115),
      receiving_yards = sum((stat_id %in% 21:24) * yards),
      receiving_tds = sum(stat_id %in% c(22,24)),
      # row-wise `|` instead of any(), see sack_fumbles
      receiving_fumbles = sum(
        (stat_id %in% 21:22) &
          (has_id(52, more_stats) | has_id(53, more_stats) | has_id(54, more_stats))
      ),
      receiving_fumbles_lost = sum((stat_id %in% 21:22) & has_id(106, more_stats)),
      # receiving_air_yards = that's in 111:112 but it is a passer stat not a receiver stat,
      receiving_yards_after_catch = sum((stat_id == 113) * yards),
      receiving_first_downs = sum((stat_id %in% 21:22) & has_id(4, team_stats)),
      # receiving_epa = ,
      receiving_2pt_conversions = sum(stat_id == 104),
      # racr = ,
      # target_share = ,
      # air_yards_share = ,
      # wopr = ,
      special_teams_tds = sum((special == 1) & stat_id %in% td_ids()),
      # fantasy_points = ,
      # fantasy_points_ppr = ,

      # Defense #####################
      # def_tackles = ,
      def_tackles_solo = sum(stat_id == 79),
      def_tackles_with_assist = sum(stat_id == 80),
      def_tackle_assists = sum(stat_id == 82),
      def_tackles_for_loss = sum(stat_id == 402),
      def_tackles_for_loss_yards = sum((stat_id == 402) * yards),
      def_fumbles_forced = sum(stat_id == 91),
      def_sacks = sum(stat_id == 83) + 1 / 2 * sum(stat_id == 84),
      def_sack_yards = sum((stat_id == 83) * -yards) + 1 / 2 * sum((stat_id == 84) * -yards),
      def_qb_hits = sum(stat_id == 110),
      def_interceptions = sum(stat_id %in% 25:26),
      def_interception_yards = sum((stat_id %in% 25:28) * yards),
      def_pass_defended = sum(stat_id == 85),
      def_tds = sum((team_abbr == .data$def) & stat_id %in% td_ids()),
      def_fumbles = sum((team_abbr == .data$def) & stat_id %in% 52:54),
      def_fumble_recovery_own = sum((team_abbr == .data$def) & stat_id %in% 55:56),
      # yardage stats need to sum yards, not count matching rows
      def_fumble_recovery_yards_own = sum(((team_abbr == .data$def) & stat_id %in% 55:58) * yards),
      def_fumble_recovery_opp = sum((team_abbr == .data$def) & stat_id %in% 59:60),
      def_fumble_recovery_yards_opp = sum(((team_abbr == .data$def) & stat_id %in% 59:62) * yards),
      # def_safety = ,
      # def_penalty = ,
      # def_penalty_yards = ,

      # Kicking #####################
      fg_made = sum(stat_id == 70),
      fg_att = sum(stat_id %in% 69:71),
      fg_missed = sum(stat_id == 69),
      fg_blocked = sum(stat_id == 71),
      fg_long = max((stat_id == 70) * yards) %0% NA_integer_,
      fg_pct = round(.data$fg_made / .data$fg_att, 3L),
      fg_made_0_19 = sum((stat_id == 70) * (yards %between% c(0, 19))),
      fg_made_20_29 = sum((stat_id == 70) * (yards %between% c(20, 29))),
      fg_made_30_39 = sum((stat_id == 70) * (yards %between% c(30, 39))),
      fg_made_40_49 = sum((stat_id == 70) * (yards %between% c(40, 49))),
      fg_made_50_59 = sum((stat_id == 70) * (yards %between% c(50, 59))),
      # >= so that kicks of exactly 60 yards land in the 60+ bucket
      fg_made_60_ = sum((stat_id == 70) * (yards >= 60)),
      fg_missed_0_19 = sum((stat_id == 69) * (yards %between% c(0, 19))),
      fg_missed_20_29 = sum((stat_id == 69) * (yards %between% c(20, 29))),
      fg_missed_30_39 = sum((stat_id == 69) * (yards %between% c(30, 39))),
      fg_missed_40_49 = sum((stat_id == 69) * (yards %between% c(40, 49))),
      fg_missed_50_59 = sum((stat_id == 69) * (yards %between% c(50, 59))),
      fg_missed_60_ = sum((stat_id == 69) * (yards >= 60)),
      fg_made_list = fg_list(stat_id, yards, collapse_id = 70),
      fg_missed_list = fg_list(stat_id, yards, collapse_id = 69),
      fg_blocked_list = fg_list(stat_id, yards, collapse_id = 71),
      fg_made_distance = sum((stat_id == 70) * yards),
      fg_missed_distance = sum((stat_id == 69) * yards),
      fg_blocked_distance = sum((stat_id == 71) * yards),
      pat_made = sum(stat_id == 72),
      pat_att = sum(stat_id %in% 72:74),
      pat_missed = sum(stat_id == 73),
      pat_blocked = sum(stat_id == 74),
      pat_pct = round(.data$pat_made / .data$pat_att, 3L),
      # gwfg_att = ,
      # gwfg_distance = ,
      # gwfg_made = ,
      # gwfg_missed = ,
      # gwfg_blocked =
    ) %>%
    dplyr::ungroup() %>%
    dplyr::mutate(
      pacr = dplyr::case_when(
        is.nan(.data$pacr) ~ NA_real_,
        .data$passing_air_yards <= 0 ~ 0,
        TRUE ~ .data$pacr
      )
    ) %>%
    dplyr::mutate_if(
      .predicate = is.character,
      .funs = ~ dplyr::na_if(.x, "")
    )

  # player_info is only loaded for player-level stats, so only join it there
  if (stat_type == "player"){
    stats <- stats %>%
      dplyr::left_join(player_info, by = "player_id") %>%
      dplyr::select(
        "player_id",
        "player_name",
        "player_display_name",
        "position",
        "position_group",
        "headshot_url",
        dplyr::everything()
      )
  }

  # set grouping variables based off summary_level and stat_type
  #
  # summarise epa stats and dakota using pbp
  #
  # summarise all other stats using playstats. That's a big call to summarise
  # where we create all sorts of stats with the various stat IDs
  #
  # load player data if stat_type is player to join player info
  #
  # join everything

  stats
}
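
When rows are counted inside `sum()` under two conditions, `any()` reduces its arguments to one `TRUE`/`FALSE` for the whole group, while `|` keeps one value per row. A minimal base-R sketch on hypothetical toy rows shows the difference; `has_id_demo` is a local stand-in written for this sketch, not a package function.

```r
# Local stand-in helper for this sketch: does the ID string contain "<id>;"?
has_id_demo <- function(id, all_ids) grepl(paste0(id, ";"), all_ids, fixed = TRUE)

# Hypothetical toy rows: two sacks (stat ID 20) and one rush (stat ID 10).
# Only the first sack comes with a fumble ID (52) on the same play.
stat_id    <- c(20L, 20L, 10L)
more_stats <- c("20;52;", "20;", "10;")

# any() yields a single TRUE for the whole vector, so both sacks are counted.
collapsed <- sum(stat_id == 20 & any(has_id_demo(52, more_stats), has_id_demo(53, more_stats)))

# `|` stays row-wise, so only the sack that has a fumble ID is counted.
row_wise <- sum(stat_id == 20 & (has_id_demo(52, more_stats) | has_id_demo(53, more_stats)))

print(c(collapsed = collapsed, row_wise = row_wise))
```

For per-play flags such as fumbles, the row-wise form is the one that matches the intent of counting individual plays.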

load_playstats <- function(seasons = nflreadr::most_recent_season()) {

  if(isTRUE(seasons)) seasons <- seq(1999, nflreadr::most_recent_season())

  stopifnot(is.numeric(seasons),
            seasons >= 1999,
            seasons <= nflreadr::most_recent_season())

  urls <- paste0("https://github.com/nflverse/nflverse-pbp/releases/download/playstats/play_stats_",
                 seasons, ".rds")

  out <- nflreadr::load_from_url(urls, seasons = TRUE, nflverse = FALSE)

  out
}

fg_list <- function(stat_ids, yards, collapse_id){
  paste(
    yards[stat_ids == collapse_id],
    collapse = ";"
  )
}

`%0%` <- function(lhs, rhs) if (lhs != 0) lhs else rhs

has_id <- function(id, all_ids){
  # anchor the ID with a separator on both sides so that e.g. 4 does not match 14
  grepl(paste0(";", id, ";"), paste0(";", all_ids), fixed = TRUE, useBytes = TRUE)
}

td_ids <- function(){
  c(
    11, 13, 16, 18, 22, 24, 26, 28, 34,
    36, 46, 48, 56, 58, 60, 62, 64, 108
  )
}
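
The separator trick described in the comments ("avoid 1 matching 10") protects the right-hand edge of an ID. Protecting the left-hand edge as well needs a separator in front of the ID too. A small base-R sketch of the difference, where `has_id_delimited` is a hypothetical name used only here:

```r
# IDs of one play collapsed into a single string with a trailing separator.
ids     <- c(14L, 15L)
all_ids <- paste0(paste(ids, collapse = ";"), ";")  # "14;15;"

# Matching on "<id>;" alone still finds 4 inside 14.
suffix_only <- grepl(paste0(4, ";"), all_ids, fixed = TRUE)

# Matching on ";<id>;" against a string with a leading separator does not.
has_id_delimited <- function(id, all_ids) {
  grepl(paste0(";", id, ";"), paste0(";", all_ids), fixed = TRUE)
}

print(c(
  suffix_only  = suffix_only,
  delimited_4  = has_id_delimited(4, all_ids),
  delimited_14 = has_id_delimited(14, all_ids),
  delimited_15 = has_id_delimited(15, all_ids)
))
```

Keeping `fixed = TRUE` avoids regular-expression overhead; a word-boundary regex would be an alternative at some cost in speed.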
2 changes: 1 addition & 1 deletion R/nflfastR-package.R
@@ -106,7 +106,7 @@
#' @import dplyr
#' @importFrom cli rule
#' @importFrom curl curl_fetch_memory
-#' @importFrom data.table setDT
+#' @importFrom data.table setDT %between%
#' @import fastrmodels
#' @importFrom furrr future_map_chr future_map_dfr future_map
#' @importFrom future plan
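
The `%between%` operator imported from data.table tests range membership with both bounds included by default, which is what distance buckets such as 20 to 29 yards rely on. A base-R sketch of the same bucketing on hypothetical kick distances, with `between_incl` as a stand-in helper so that the example needs no packages:

```r
# Base-R stand-in for an inclusive range check (both bounds included).
between_incl <- function(x, range) x >= range[1] & x <= range[2]

# Hypothetical made field goal distances, chosen to sit on bucket edges.
distances <- c(19L, 20L, 49L, 50L, 60L, 61L)

buckets <- c(
  made_0_19  = sum(between_incl(distances, c(0, 19))),
  made_20_29 = sum(between_incl(distances, c(20, 29))),
  made_30_39 = sum(between_incl(distances, c(30, 39))),
  made_40_49 = sum(between_incl(distances, c(40, 49))),
  made_50_59 = sum(between_incl(distances, c(50, 59))),
  made_60_   = sum(distances >= 60)  # open-ended top bucket includes 60 itself
)

print(buckets)
```

A quick sanity check for any bucketing scheme is that the bucket counts add up to the number of kicks, which only holds here because the top bucket uses `>=`.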
28 changes: 12 additions & 16 deletions data-raw/build_stat_id_df.R
@@ -1,19 +1,15 @@
-library(magrittr)
-
-stat_ids <- "http://www.nflgsis.com/gsis/Documentation/Partners/StatIDs_files/sheet001.html" %>%
-  xml2::read_html() %>%
-  rvest::html_table(fill = TRUE) %>%
-  as.data.frame() %>%
-  dplyr::rename("stat_id" = X1, "name" = X2, "comment" = X3) %>%
-  dplyr::select(1:3) %>%
-  dplyr::slice(-1) %>%
-  dplyr::na_if("") %>%
-  dplyr::filter(!is.na(comment)) %>%
-  dplyr::mutate(stat_id = as.integer(stat_id)) %>%
-  dplyr::group_by(stat_id, name) %>%
-  dplyr::summarise(comment = paste0(comment, collapse = " ")) %>%
-  dplyr::ungroup() %>%
+stat_ids <- "https://www.nflgsis.com/gsis/Documentation/Partners/StatIDs_files/sheet001.html" |>
+  xml2::read_html() |>
+  rvest::html_table(fill = TRUE) |>
+  as.data.frame() |>
+  dplyr::rename("stat_id" = X1, "name" = X2, "comment" = X3) |>
+  dplyr::select(1:3) |>
+  dplyr::slice(-1) |>
+  dplyr::mutate(stat_id = as.integer(stat_id)) |>
+  dplyr::filter(!is.na(stat_id)) |>
+  dplyr::group_by(stat_id, name) |>
+  dplyr::summarise(comment = paste0(comment, collapse = " ")) |>
+  dplyr::ungroup() |>
   dplyr::mutate(comment = stringr::str_squish(comment))

-# save(stat_ids, file = "data-raw/stat_ids.rda")
 usethis::use_data(stat_ids, overwrite = TRUE)
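
The updated build script keeps a row only if its `stat_id` converts to an integer, then collapses repeated comments per ID and squishes whitespace. A base-R sketch of those steps on a hypothetical scraped table; the rows and names below are invented for illustration, and `aggregate()` and `gsub()` stand in for the dplyr and stringr calls.

```r
# Hypothetical raw table: IDs arrive as text, with empty and non-numeric rows.
raw <- data.frame(
  stat_id = c("1", "1", "", "2", "n/a"),
  name    = c("alpha", "alpha", "", "beta", ""),
  comment = c("First  part.", "Second part.", "", " Only part. ", "ignored"),
  stringsAsFactors = FALSE
)

# Coerce to integer; anything that is not a number becomes NA and is dropped.
raw$stat_id <- suppressWarnings(as.integer(raw$stat_id))
clean <- raw[!is.na(raw$stat_id), ]

# Collapse multiple comment rows of the same (stat_id, name) pair into one.
clean <- aggregate(comment ~ stat_id + name, data = clean, FUN = paste, collapse = " ")

# Squish whitespace: collapse internal runs and trim both ends.
clean$comment <- trimws(gsub("\\s+", " ", clean$comment))
clean <- clean[order(clean$stat_id), ]
rownames(clean) <- NULL

print(clean)
```

Filtering on the coerced ID rather than on empty strings means that header fragments and stray notes are dropped by the same rule as blank rows.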
Binary file modified data/stat_ids.rda
Binary file not shown.