diff --git a/.github/workflows/R-CMD-check.yaml b/.github/workflows/R-CMD-check.yaml index af5291b4..217fc457 100644 --- a/.github/workflows/R-CMD-check.yaml +++ b/.github/workflows/R-CMD-check.yaml @@ -4,10 +4,11 @@ on: push: branches: [main, master] pull_request: - branches: [main, master] workflow_dispatch: -name: R-CMD-check +name: R-CMD-check.yaml + +permissions: read-all jobs: R-CMD-check: @@ -19,8 +20,8 @@ jobs: fail-fast: false matrix: config: - - {os: macOS-latest, r: 'release'} - - {os: windows-latest, r: 'release'} + - {os: macos-latest, r: 'release'} + - {os: windows-latest, r: 'release'} - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'} - {os: ubuntu-latest, r: 'release'} - {os: ubuntu-latest, r: 'oldrel-1'} @@ -30,7 +31,7 @@ jobs: R_KEEP_PKG_SOURCE: yes steps: - - uses: actions/checkout@v3 + - uses: actions/checkout@v4 - uses: r-lib/actions/setup-pandoc@v2 @@ -52,3 +53,4 @@ jobs: - uses: r-lib/actions/check-r-package@v2 with: upload-snapshots: true + build_args: 'c("--no-manual","--compact-vignettes=gs+qpdf")' diff --git a/vignettes/nflfastR.Rmd b/vignettes/nflfastR.Rmd index cbdebdd2..188dc221 100644 --- a/vignettes/nflfastR.Rmd +++ b/vignettes/nflfastR.Rmd @@ -24,18 +24,18 @@ nflfastR processes and cleans up play-by-play data and adds variables through [i ```{r} library(nflfastR) library(dplyr, warn.conflicts = FALSE) -ids <- nflfastR::fast_scraper_schedules(2017:2019) %>% - dplyr::filter(game_type == "SB") %>% +ids <- nflfastR::fast_scraper_schedules(2017:2019) |> + dplyr::filter(game_type == "SB") |> dplyr::pull(game_id) pbp <- nflfastR::build_nflfastR_pbp(ids) ``` -In most cases, however, it is not necessary to use this function for individual games, because nflfastR provides both a [data repository](https://github.com/nflverse/nflfastR-data) and two main play-by-play functions: `load_pbp()` and `update_db()`. 
We cover `load_pbp()` below, and please see [Example 8: Using the built-in database function] for how to work with the database function `update_db()`. +In most cases, however, it is not necessary to use this function for individual games, because nflfastR provides both a [data release](https://github.com/nflverse/nflverse-data/releases/tag/pbp) and two main play-by-play functions: `load_pbp()` and `update_db()`. We cover `load_pbp()` below, and please see [Example 8: Using the built-in database function] for how to work with the database function `update_db()`. -The easiest way to access the data in the data repository is the new function `load_pbp()`. It can load multiple seasons directly into memory and supports multiple data formats. Loading all play-by-play data of the 2018-2020 seasons is as easy as +The easiest way to access the data from the release is the function `load_pbp()`. It can load multiple seasons directly into memory and supports multiple data formats. Loading all play-by-play data of the 2022-2024 seasons is as easy as ```{r} -pbp <- nflfastR::load_pbp(2018:2020) +pbp <- nflfastR::load_pbp(2022:2024) ``` Joining roster data to the play-by-play data set is possible as well. The data can be accessed with the function `fast_scraper_roster()` and its application is demonstrated in [Example 10: Working with roster and position data]. @@ -49,82 +49,46 @@ library(nflfastR) library(tidyverse) ``` -## Example 1: replicate nflscrapR with fast_scraper - -The functionality of `nflscrapR` can be duplicated by using `fast_scraper()`. This obtains the same information contained in `nflscrapR` (plus some extra) but much more quickly. To compare to `nflscrapR`, we use their data repository as the program no longer functions now that the NFL has taken down the old Gamecenter feed. 
Note that EP differs from nflscrapR as we use a newer era-adjusted model (more on this in [this post on Open Source Football](https://www.opensourcefootball.com/posts/2020-09-28-nflfastr-ep-wp-and-cp-models/)). - -This example also uses the built-in function `clean_pbp()` to create a 'name' column for the primary player involved (the QB on pass play or ball-carrier on run play). - -``` {r ex1-nflscrapR, warning = FALSE, message = FALSE} -readr::read_csv("https://github.com/ryurko/nflscrapR-data/blob/master/play_by_play_data/regular_season/reg_pbp_2019.csv?raw=true") %>% - dplyr::filter(home_team == "SF" & away_team == "SEA") %>% - dplyr::select(desc, play_type, ep, epa, home_wp) %>% - utils::head(6) %>% - knitr::kable(digits = 3) -``` - -``` {r ex1-fs, warning = FALSE, message = FALSE} -nflfastR::fast_scraper("2019_10_SEA_SF") %>% - nflfastR::clean_pbp() %>% - dplyr::select(desc, play_type, ep, epa, home_wp, name) %>% - utils::head(6) %>% - knitr::kable(digits = 3) -``` - -## Example 2: scrape a batch of games very quickly with fast_scraper - -This is a demonstration of `nflfastR`'s capabilities. While `nflfastR` can scrape a batch of games very quickly, **please be respectful of Github's servers and use the [data repository](https://github.com/nflverse/nflfastR-data) which hosts all the scraped and cleaned data** whenever possible. The only reason to ever actually use the scraper is if it's in the middle of the season and we haven't updated the repository with recent games (but it is automatically updated overnight every day). 
- -``` {r ex2-bigscrape, warning = FALSE, message = FALSE} -# get list of some games from 2019 -games_2019 <- nflfastR::fast_scraper_schedules(2019) %>% - utils::head(10) %>% - dplyr::pull(game_id) -tictoc::tic(glue::glue("{length(games_2019)} games with nflfastR:")) -f <- nflfastR::fast_scraper(games_2019) -tictoc::toc() -``` - -## Example 3: Completion Percentage Over Expected (CPOE) +## Example 1: Completion Percentage Over Expected (CPOE) Let's look at CPOE leaders from the 2009 regular season. -As discussed above, `nflfastR` has a data repository for old seasons, so there's no need to actually scrape them. Let's use that here with the convenience function `load_pbp()` which fetches data from the repository (for non-R users, .csv and .parquet are also available in the [data repository](https://github.com/nflverse/nflfastR-data)). +As discussed above, `nflfastR` has a data release for all available seasons, so there's no need to actually scrape them. Let's use that here with the convenience function `load_pbp()` which fetches data from the release (for non-R users, .csv and .parquet are also available in the [data release](https://github.com/nflverse/nflverse-data/releases/tag/pbp)). 
``` {r ex3-cpoe, warning = FALSE, message = FALSE} -tictoc::tic("loading all games from 2009") -games_2009 <- nflfastR::load_pbp(2009) %>% dplyr::filter(season_type == "REG") -tictoc::toc() -games_2009 %>% - dplyr::filter(!is.na(cpoe)) %>% - dplyr::group_by(passer_player_name) %>% - dplyr::summarize(cpoe = mean(cpoe), Atts = n()) %>% - dplyr::filter(Atts > 200) %>% - dplyr::arrange(-cpoe) %>% - utils::head(5) %>% +games_2009 <- nflfastR::load_pbp(2009) |> dplyr::filter(season_type == "REG") +games_2009 |> + dplyr::filter(!is.na(cpoe)) |> + dplyr::summarize( + passer = nflreadr::stat_mode(passer_player_name), + cpoe = mean(cpoe), + Atts = n(), + .by = passer_player_id + ) |> + dplyr::filter(Atts > 200) |> + dplyr::slice_max(cpoe, n = 5) |> knitr::kable(digits = 1) ``` -## Example 4: Using Drive Information +## Example 2: Using Drive Information When working with `nflfastR`, drive results are automatically included. We use `fixed_drive` and `fixed_drive_result` since the NFL-provided information is a bit wonky. Let's look at how much more likely teams were to score starting from 1st & 10 at their own 20 yard line in 2015 (the last year before touchbacks on kickoffs changed to the 25) than in 2003. 
``` {r ex4, warning = FALSE, message = FALSE} pbp <- nflfastR::load_pbp(c(2003, 2015)) -out <- pbp %>% - dplyr::filter(season_type == "REG" & down == 1 & ydstogo == 10 & yardline_100 == 80) %>% - dplyr::mutate(drive_score = dplyr::if_else(fixed_drive_result %in% c("Touchdown", "Field goal"), 1, 0)) %>% - dplyr::group_by(season) %>% - dplyr::summarize(drive_score = mean(drive_score)) +out <- pbp |> + dplyr::filter(season_type == "REG" & down == 1 & ydstogo == 10 & yardline_100 == 80) |> + dplyr::mutate(drive_score = dplyr::if_else(fixed_drive_result %in% c("Touchdown", "Field goal"), 1, 0)) |> + dplyr::summarize(drive_score = mean(drive_score), .by = season) -out %>% +out |> knitr::kable(digits = 3) ``` So `r scales::percent(out$drive_score[1], accuracy = 0.1)` of 1st & 10 plays from teams' own 20 would see the drive end up in a score in 2003, compared to `r scales::percent(out$drive_score[2], accuracy = 0.1)` in 2015. This has implications for Expected Points models (see [this article](https://www.opensourcefootball.com/posts/2020-09-28-nflfastr-ep-wp-and-cp-models/)). -## Example 5: Plot offensive and defensive EPA per play for a given season +## Example 3: Plot offensive and defensive EPA per play for a given season Let's build the **[NFL team tiers](https://rbsdm.com/stats/stats/)** using offensive and defensive expected points added per play for the 2005 regular season. Creating data viz including NFL team logos (or wordmarks, or headshots), we recommend the nflverse R package [nflplotR](https://nflplotr.nflverse.com). 
@@ -132,17 +96,17 @@ When using `load_pbp()`, the helper function `clean_pbp()` has already been run, ```{r ex5, warning = FALSE, message = FALSE, results = 'hide', fig.keep = 'all', dpi = 600} library(nflplotR) -pbp <- nflfastR::load_pbp(2005) %>% - dplyr::filter(season_type == "REG") %>% +pbp <- nflfastR::load_pbp(2005) |> + dplyr::filter(season_type == "REG") |> dplyr::filter(!is.na(posteam) & (rush == 1 | pass == 1)) -offense <- pbp %>% - dplyr::group_by(team = posteam) %>% +offense <- pbp |> + dplyr::group_by(team = posteam) |> dplyr::summarise(off_epa = mean(epa, na.rm = TRUE)) -defense <- pbp %>% - dplyr::group_by(team = defteam) %>% +defense <- pbp |> + dplyr::group_by(team = defteam) |> dplyr::summarise(def_epa = mean(epa, na.rm = TRUE)) -offense %>% - dplyr::inner_join(defense, by = "team") %>% +offense |> + dplyr::inner_join(defense, by = "team") |> ggplot2::ggplot(aes(x = off_epa, y = def_epa)) + ggplot2::geom_abline(slope = -1.5, intercept = c(.4, .3, .2, .1, 0, -.1, -.2, -.3), alpha = .2) + nflplotR::geom_mean_lines(aes(y0 = off_epa, x0 = def_epa)) + @@ -160,7 +124,7 @@ offense %>% ggplot2::scale_y_reverse() ``` -## Example 6: Expected Points calculator +## Example 4: Expected Points calculator We have provided a calculator for working with the Expected Points model. Here is an example of how to use it, looking for how the Expected Points on a drive beginning following a touchback has changed over time. 
@@ -179,8 +143,8 @@ data <- tibble::tibble( "posteam_timeouts_remaining" = 3, "defteam_timeouts_remaining" = 3 ) -nflfastR::calculate_expected_points(data) %>% - dplyr::select(season, yardline_100, td_prob, ep) %>% +nflfastR::calculate_expected_points(data) |> + dplyr::select(season, yardline_100, td_prob, ep) |> knitr::kable(digits = 2) ``` @@ -202,14 +166,14 @@ data <- tibble::tibble( "posteam_timeouts_remaining" = 3, "defteam_timeouts_remaining" = 3 ) -nflfastR::calculate_expected_points(data) %>% - dplyr::select(season, yardline_100, td_prob, ep) %>% +nflfastR::calculate_expected_points(data) |> + dplyr::select(season, yardline_100, td_prob, ep) |> knitr::kable(digits = 2) ``` So for 2018 and 2019, 1st & 10 from a home team's own 25 yard line had higher EP in domes than at home, which is to be expected. -## Example 7: Win probability calculator +## Example 5: Win probability calculator We have also provided a calculator for working with the win probability models. Here is an example of how to use it, looking for how the win probability to begin the game depends on the pre-game spread. @@ -230,14 +194,14 @@ data <- tibble::tibble( "posteam_timeouts_remaining" = 3, "defteam_timeouts_remaining" = 3 ) -nflfastR::calculate_win_probability(data) %>% - dplyr::select(spread_line, wp, vegas_wp) %>% +nflfastR::calculate_win_probability(data) |> + dplyr::select(spread_line, wp, vegas_wp) |> knitr::kable(digits = 2) ``` Not surprisingly, `vegas_wp` increases with the amount a team was coming into the game favored by. -## Example 8: Using the built-in database function +## Example 6: Using the built-in database function If you're comfortable using `dplyr` functions to manipulate and tidy data, you're ready to use a database. Why should you use a database? @@ -311,7 +275,7 @@ DBI::dbListTables(connection) Since we went with the defaults, there's a table called `nflfastR_pbp`. 
Another useful function is to see the fields (i.e., columns) in a table: ``` {r} -DBI::dbListFields(connection, "nflfastR_pbp") %>% +DBI::dbListFields(connection, "nflfastR_pbp") |> utils::head(10) ``` @@ -326,21 +290,21 @@ pbp_db <- dplyr::tbl(connection, "nflfastR_pbp") And now, everything will magically just "work": you can forget you're even working with a database! ``` {r} -pbp_db %>% - dplyr::group_by(season) %>% +pbp_db |> + dplyr::group_by(season) |> dplyr::summarize(n = dplyr::n()) -pbp_db %>% - dplyr::filter(rush == 1 | pass == 1, down <= 2, !is.na(epa), !is.na(posteam)) %>% - dplyr::group_by(pass) %>% +pbp_db |> + dplyr::filter(rush == 1 | pass == 1, down <= 2, !is.na(epa), !is.na(posteam)) |> + dplyr::group_by(pass) |> dplyr::summarize(mean_epa = mean(epa, na.rm = TRUE)) ``` So far, everything has stayed in the database. If you want to bring a query into memory, just use `collect()` at the end: ``` {r} -russ <- pbp_db %>% - dplyr::filter(name == "R.Wilson" & posteam == "SEA") %>% - dplyr::select(desc, epa) %>% +russ <- pbp_db |> + dplyr::filter(name == "R.Wilson" & posteam == "SEA") |> + dplyr::select(desc, epa) |> dplyr::collect() russ ``` @@ -353,7 +317,7 @@ DBI::dbDisconnect(connection) For more details on using a database with `nflfastR`, see [Thomas Mock's life-changing post here](https://themockup.blog/posts/2019-04-28-nflfastr-dbplyr-rsqlite/). More detailed information on dbplyr (the dplyr database back-end) is given in the second edition of [Hadley Wickham's R for Data Science (2e)](https://r4ds.hadley.nz/import-databases.html). 
-## Example 9: working with the expected yards after catch model +## Example 7: working with the expected yards after catch model The variables in `xyac` are as follows: @@ -374,28 +338,27 @@ Some other notes: Let's create measures for EPA and first downs over expected in 2015: ``` {r ex9-xyac, warning = FALSE, message = FALSE} -nflfastR::load_pbp(2015) %>% - dplyr::group_by(receiver, receiver_id, posteam) %>% - dplyr::mutate(tgt = sum(complete_pass + incomplete_pass)) %>% - dplyr::filter(tgt >= 50) %>% - dplyr::filter(complete_pass == 1, air_yards < yardline_100, !is.na(xyac_epa)) %>% +nflfastR::load_pbp(2015) |> + dplyr::group_by(receiver, receiver_id, posteam) |> + dplyr::mutate(tgt = sum(complete_pass + incomplete_pass)) |> + dplyr::filter(tgt >= 50) |> + dplyr::filter(complete_pass == 1, air_yards < yardline_100, !is.na(xyac_epa)) |> dplyr::summarize( epa_oe = mean(yac_epa - xyac_epa), actual_fd = mean(first_down), expected_fd = mean(xyac_fd), fd_oe = mean(first_down - xyac_fd), rec = dplyr::n() - ) %>% - dplyr::ungroup() %>% - dplyr::select(receiver, posteam, actual_fd, expected_fd, fd_oe, epa_oe, rec) %>% - dplyr::arrange(-epa_oe) %>% - utils::head(10) %>% + ) |> + dplyr::ungroup() |> + dplyr::select(receiver, posteam, actual_fd, expected_fd, fd_oe, epa_oe, rec) |> + dplyr::slice_max(epa_oe, n = 10) |> knitr::kable(digits = 3) ``` The presence of so many running backs on this list suggests that even though it takes into account target depth and pass direction, the model doesn't do a great job capturing space. Alternatively, running backs might be better at generating yards after the catch since running with the football is their primary role. -## Example 10: Working with roster and position data +## Example 8: Working with roster and position data At long last, there's a way to merge the new play-by-play data with roster information. 
Use the function to get the rosters: @@ -411,209 +374,96 @@ games_2019 <- nflfastR::load_pbp(2019) Here is what the player IDs look like because `nflfastR` now automatically decodes IDs to look like the old format with GSIS IDs: ``` {r roster_pbp} -games_2019 %>% - dplyr::filter(rush == 1 | pass == 1, posteam == "SEA") %>% +games_2019 |> + dplyr::filter(rush == 1 | pass == 1, posteam == "SEA") |> dplyr::select(name, id) ``` Now we're ready to join to the roster data using these IDs: ``` {r decode_join} -joined <- games_2019 %>% - dplyr::filter(!is.na(receiver_id)) %>% - dplyr::select(posteam, season, desc, receiver, receiver_id, epa) %>% +joined <- games_2019 |> + dplyr::filter(!is.na(receiver_id)) |> + dplyr::select(posteam, season, desc, receiver, receiver_id, epa) |> dplyr::left_join(roster, by = c("receiver_id" = "gsis_id")) ``` ``` {r decode_table} # the real work is done, this just makes a table and has it look nice -joined %>% - dplyr::filter(position %in% c("WR", "TE", "RB")) %>% - dplyr::group_by(receiver_id, receiver, position) %>% - dplyr::summarize(tot_epa = sum(epa), n = n()) %>% - dplyr::arrange(-tot_epa) %>% - dplyr::ungroup() %>% - dplyr::group_by(position) %>% - dplyr::mutate(position_rank = 1:n()) %>% - dplyr::filter(position_rank <= 5) %>% - dplyr::rename(Pos_Rank = position_rank, Player = receiver, Pos = position, Tgt = n, EPA = tot_epa) %>% - dplyr::select(Player, Pos, Pos_Rank, Tgt, EPA) %>% +joined |> + dplyr::filter(position %in% c("WR", "TE", "RB")) |> + dplyr::group_by(receiver_id, receiver, position) |> + dplyr::summarize(tot_epa = sum(epa), n = n()) |> + dplyr::arrange(-tot_epa) |> + dplyr::ungroup() |> + dplyr::group_by(position) |> + dplyr::mutate(position_rank = 1:n()) |> + dplyr::filter(position_rank <= 5) |> + dplyr::rename(Pos_Rank = position_rank, Player = receiver, Pos = position, Tgt = n, EPA = tot_epa) |> + dplyr::select(Player, Pos, Pos_Rank, Tgt, EPA) |> knitr::kable(digits = 0) ``` Not surprisingly, all 5 of the top 5 WRs in 
terms of EPA added come in ahead of the top RB. Note that the number of targets won't match official stats because we're including plays with penalties. -## Example 11: Replicating official stats +## Example 9: Replicating official stats The columns like `name`, `passer`, `fantasy` etc are `nflfastR`-created columns that mimic "real" football: i.e., excluding plays with spikes, counting scrambles and sacks as pass plays, etc. But if you're trying to replicate official statistics -- perhaps for fantasy purposes -- use the `*_player_name` and `*_player_id` columns. -### Leaderboards - [Let's try to replicate this page of passing leaders](https://www.nfl.com/stats/player-stats/). ``` {r stats1} -nflfastR::load_pbp(2020) %>% - dplyr::filter(season_type == "REG", complete_pass == 1 | incomplete_pass == 1 | interception == 1, !is.na(down)) %>% - dplyr::group_by(passer_player_name, posteam) %>% +nflfastR::load_pbp(2020) |> + dplyr::filter(season_type == "REG", complete_pass == 1 | incomplete_pass == 1 | interception == 1, !is.na(down)) |> + dplyr::group_by(passer_player_name, posteam) |> dplyr::summarize( yards = sum(passing_yards, na.rm = T), tds = sum(touchdown == 1 & td_team == posteam), ints = sum(interception), att = dplyr::n() - ) %>% - dplyr::arrange(-yards) %>% - utils::head(10) %>% + ) |> + dplyr::arrange(-yards) |> + utils::head(10) |> knitr::kable(digits = 0) ``` These match the official stats on NFL.com (note the filter for `season_type == "REG"` since official stats only count regular season games). Note that we're using `passing_yards` here because `yards_gained` is not equal to passing yards on plays with laterals. -While the above works, we've also provided a function that does this all for you: `calculate_player_stats()`. This function takes an `nflfastR` play-by-play dataframe as an input along with one other argument, `weekly`, which defaults to `FALSE`. 
When `weekly` is true, a week-by-week dataframe is returned (rather than an aggregate over the whole provided dataframe). Let's again replicate the top 10 players in passing yards: - -``` {r stats2} -nflfastR::load_pbp(2020) %>% - dplyr::filter(season_type == "REG") %>% - nflfastR::calculate_player_stats() %>% - dplyr::arrange(-passing_yards) %>% - dplyr::select(player_name, recent_team, completions, attempts, passing_yards, passing_tds, interceptions) %>% - utils::head(10) %>% - knitr::kable(digits = 0) -``` - -We can do the same for rush attempts to replicate the [NFL leaderboard](https://www.nfl.com/stats/player-stats/category/rushing/2020/POST/all/rushingyards/desc): +While the above code works in this case, there are several special cases where it is nearly impossible to get official player stats from nflfastR play-by-play data. The reason for this is that the idea of nflfastR play-by-play data is a “tidy” data structure. In other words, the aim is to have one row per play in the data. This can lead to problems if, for example, there are several changes of possession per play (i.e. several fumbles) or if the ball is lateraled in a play. These are just two examples of “abnormal” plays that are not fully captured in a tidy data structure. +We have solved this problem with the function `calculate_stats()`. This function uses playstats of the raw play-by-play data before it is parsed into a tidy structure by nflfastR. -``` {r stats_rush} -nflfastR::load_pbp(2020) %>% - dplyr::filter(season_type == "REG") %>% - nflfastR::calculate_player_stats() %>% - dplyr::arrange(-rushing_yards) %>% - dplyr::select(player_name, recent_team, carries, rushing_yards, rushing_tds, rushing_fumbles_lost) %>% - utils::head(10) %>% - knitr::kable(digits = 0) -``` - -Again, this matches up exactly. +This function has the following features: -### Yards from scrimmage - -What if we want total yards from scrimmage? We'll demonstrate three methods here. 
The hardest way is to use the `fantasy_player_name` column, which is the rusher on rush plays and receiver on receiving plays: - -``` {r stats_fantasy} -nflfastR::load_pbp(2020) %>% - dplyr::filter(season_type == "REG", !is.na(down)) %>% - dplyr::group_by(fantasy_player_name, posteam) %>% - dplyr::summarize( - carries = sum(rush_attempt), - receptions = sum(complete_pass), - touches = sum(rush_attempt + complete_pass), - yards = sum(yards_gained), - tds = sum(touchdown == 1 & td_team == posteam) - ) %>% - dplyr::arrange(-yards) %>% - utils::head(10) %>% - knitr::kable(digits = 0) -``` - -Looking at the [PFR scrimmage stats](https://www.pro-football-reference.com/years/2020/scrimmage.htm), these columns are an exact match. - -But we could also just use `calculate_player_stats()` again: - -``` {r stats_fantasy2} -nflfastR::load_pbp(2020) %>% - dplyr::filter(season_type == "REG") %>% - nflfastR::calculate_player_stats() %>% - dplyr::mutate( - yards = rushing_yards + receiving_yards, - touches = carries + receptions, - tds = rushing_tds + receiving_tds - ) %>% - dplyr::arrange(-yards) %>% - dplyr::select(player_name, recent_team, carries, receptions, touches, yards, tds) %>% - utils::head(10) %>% - knitr::kable(digits = 0) -``` +- It determines stats in offense, defense, and special teams, +- either on player level or on team level, +- and can summarize them on season level (separately for regular season and post season) or on week level. -And we get the same thing. +For more information see the function documentation of `calculate_stats()`. Again, **don't try to get an exact match with official stats based on nflfastR play-by-play data**. It usually works, but fails because of details that are unsolvable. -The third way is to use the `load_player_stats()` function, which can load a data frame of player-level stats for every week since 1999. 
+Now let's replicate the above table using `calculate_stats()`: -``` {r stats_fantasy3} -nflfastR::load_player_stats(seasons = 2020) %>% - dplyr::filter(season_type == "REG") %>% - dplyr::group_by(player_id) %>% - dplyr::summarize( - player_name = dplyr::first(player_name), - recent_team = dplyr::first(recent_team), - yards = sum(rushing_yards + receiving_yards), - touches = sum(carries + receptions), - carries = sum(carries), - receptions = sum(receptions), - tds = sum(rushing_tds + receiving_tds) - ) %>% - dplyr::ungroup() %>% - dplyr::arrange(-yards) %>% - dplyr::select(player_name, recent_team, carries, receptions, touches, yards, tds) %>% - utils::head(10) %>% +``` {r stats2} +s <- nflfastR::calculate_stats( + seasons = 2020, + summary_level = "season", + stat_type = "player", + season_type = "REG" +) +s |> + dplyr::slice_max(passing_yards, n = 10) |> + dplyr::select(player_name, recent_team, completions, attempts, passing_yards, passing_tds, passing_interceptions, attempts) |> knitr::kable(digits = 0) ``` -And again the output is identical. - -### Fantasy points - -Let's calculate PPR fantasy points per game in the first 16 weeks of the season among wide receivers who appeared in more than 5 games. - -``` {r stats_fantasy4} -nflfastR::load_pbp(2020) %>% - dplyr::filter(week <= 16) %>% - nflfastR::calculate_player_stats() %>% - dplyr::mutate( - ppg = fantasy_points_ppr / games - ) %>% - dplyr::filter(games > 5) %>% - # only keep the WRs - dplyr::inner_join( - nflfastR::fast_scraper_roster(2020) %>% - dplyr::filter(position == "WR") %>% - dplyr::select(player_id = gsis_id), - by = "player_id" - ) %>% - dplyr::arrange(-ppg) %>% - dplyr::select(player_name, recent_team, games, fantasy_points_ppr, ppg) %>% - utils::head(10) %>% - knitr::kable(digits = 1) -``` - -Comparing to [the FantasyPros website](https://www.fantasypros.com/nfl/reports/leaders/ppr-wr.php?year=2020&start=1&end=16), this is an exact match. 
- # Frequent issues ## The `drive` column looks wacky -Use `fixed_drive` and `fixed_drive_result` instead. See [Example 4: Using Drive Information]. +Use `fixed_drive` and `fixed_drive_result` instead. See [Example 2: Using Drive Information]. ## Why are there so many win probability columns? `vegas_wp` and `vegas_home_wp` incorporate the pregame spread and are much better models. -## I'm trying to do X. Help! - -Please ask [in the Discord channel](https://discord.com/invite/5Er2FBnnQa). - -# Links - -This section is a helper that holds the hyperlinks to the above chapters. It's a workaround for the missing sections anchor bug in pkgdown which will hopefully be fixed with [this pull request](https://github.com/r-lib/pkgdown/pull/1536) at some point in the future. +## Need more help? -- [The Main Functions] -- [Example 1: replicate nflscrapR with fast_scraper] -- [Example 2: scrape a batch of games very quickly with fast_scraper] -- [Example 3: Completion Percentage Over Expected (CPOE)] -- [Example 4: Using Drive Information] -- [Example 5: Plot offensive and defensive EPA per play for a given season] -- [Example 6: Expected Points calculator] -- [Example 7: Win probability calculator] -- [Example 8: Using the built-in database function] -- [Example 9: working with the expected yards after catch model] -- [Example 10: Working with roster and position data] -- [Example 11: Replicating official stats] -- [Frequent issues] -- [Links] +Please ask [in the nflverse Discord server](https://discord.com/invite/5Er2FBnnQa).