-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new function calculate_stats()
#470
Conversation
the previous code unintentionally dropped some ids
It turned out that some stats are hard to compute using the playstats because some stats require information of other players, e.g. receiving_air_yards. The remaining stats probably need to be computed using pbp. I didn't start this yet |
also invert yards here as it's the defense player stat
this adds off, def, and special teams info that can be used to create some stats
(only defense)
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
I installed the development version, and the old function 'calculate_player_stats' still worked while I got the error with the new function. calculate_stats(pbp) |
Use pak::pkg_install("nflverse/nflfastR#470") to install the PR branch version. the development version refers to what is merged to main |
- def_tds shouldn't count special teams tds - fumble recovery stats can happen in any unit, not only defense - same is true for penalties - fumble recovery tds need to be counted separately also added a misc yards column to catch remaining yards
|
Changed variable names compared to the current stats approach are now documented here nflfastR/data-raw/nfl_stats_variables.R Lines 35 to 63 in 2d7a0e7
This is the final call for review |
Does this mean the new version is erasing the difference between solo tackles and tackles with assist? Or will it maintain the two categories and also offer a combined variable? |
It counts solo tackles, tackles with assist and tackle assists separately |
Got it! |
new_stats <- calculate_stats(2023, summary_level = "week", stat_type = "player")
old_stats <- dplyr::bind_rows(nflreadr::load_player_stats(2023,stat_type = "offense"),nflreadr::load_player_stats(2023,stat_type = "defense"),nflreadr::load_player_stats(2023,stat_type = "kicking"))
setdiff(names(old_stats), names(new_stats))
#> [1] "recent_team" "interceptions" "sacks"
#> [4] "sack_yards" "dakota" "def_tackles"
#> [7] "def_fumble_recovery_own" "def_fumble_recovery_yards_own" "def_fumble_recovery_opp"
#>[10] "def_fumble_recovery_yards_opp" "def_safety" "def_penalty"
#>[13] "def_penalty_yards"
setdiff(names(new_stats),names(old_stats))
#> [1] "passing_interceptions" "sacks_suffered" "sack_yards_lost" "passing_cpoe"
#> [5] "def_safeties" "misc_yards" "fumble_recovery_own" "fumble_recovery_yards_own"
#> [9] "fumble_recovery_opp" "fumble_recovery_yards_opp" "fumble_recovery_tds" "penalties"
#> [13] "penalty_yards" "punt_returns" "punt_return_yards" "kickoff_returns"
#> [17] "kickoff_return_yards" I see most things have equivalents, should we document these somewhere e.g. what new fields are avail + what fields are no longer avail? dakota and def_tackles for instance don't have direct equivs? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
helluva job tackling this! performance not superb but I fully support dplyr readability here over data.table performance/maintenance tradeoff since nflreadr will pull precalculated values 99% of the time. a few questions about field differences, I think.
Done in the article that hosts the variables |
Yeah it's not really worth caring too much about performance here. It's also quite slow because we have to download both pbp and playstats before we do some have grouped aggregation |
This implements #445