Add new function calculate_stats() #470

Merged
mrcaseb merged 38 commits into master from new-stats-approach
Nov 10, 2024

Conversation

mrcaseb
Member

@mrcaseb mrcaseb commented Jul 2, 2024

This implements #445

mrcaseb added 3 commits July 2, 2024 18:19
the previous code unintentionally dropped some ids
@mrcaseb
Member Author

mrcaseb commented Jul 2, 2024

It turns out that some stats are hard to compute from the playstats alone because they require information about other players, e.g. receiving_air_yards.

The remaining stats probably need to be computed from pbp. I haven't started on that yet.

github-actions bot commented Jul 2, 2024

mrcaseb added 5 commits July 3, 2024 17:33
also invert yards here as it's the defense player stat
this adds off, def, and special teams info that can be used to create some stats

@jacobakaye

I installed the development version. The old function calculate_player_stats() still works, but the new function fails with this error:

calculate_stats(pbp)
Error in calculate_stats(pbp) : could not find function "calculate_stats"

@tanho63
Member

tanho63 commented Jul 16, 2024

Use

pak::pkg_install("nflverse/nflfastR#470")

to install the version from this PR's branch. The "development version" refers to what has already been merged into the default branch.

@mrcaseb mrcaseb enabled auto-merge (rebase) October 22, 2024 12:20
mrcaseb and others added 3 commits November 3, 2024 12:22
- def_tds shouldn't count special teams tds
- fumble recovery stats can happen in any unit, not only defense
- same is true for penalties
- fumble recovery tds need to be counted separately
also added a misc yards column to catch remaining yards
@mrcaseb
Member Author

mrcaseb commented Nov 3, 2024

  • Fumble recovery stats now count all recoveries, not only those by the defense. This is necessary to count fumble recoveries by offensive players, especially when they result in a TD, e.g. Trey McBride 2024, wk2. cc: @alecglen
  • Added a new variable fumble_recovery_tds to account for the above
  • Changed the variable names of the fumble recovery and penalty stats to make clear they don't belong to the defense only
  • Documented the changes and improved the variable explainer in nfl_stats_variables
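
A minimal sketch of the counting change described above, in base R on a made-up recovery table. The column names `unit`, `own_team_fumble`, and `touchdown`, the player labels, and the helper `count_recoveries()` are assumptions for illustration only; they are not the playstats schema or the code in this PR.

```r
# Toy recovery-level data (hypothetical columns, one row per fumble recovery).
recoveries <- data.frame(
  player          = c("TE_A", "LB_B", "LB_B", "WR_C"),
  unit            = c("offense", "defense", "defense", "special_teams"),
  own_team_fumble = c(TRUE, FALSE, FALSE, TRUE),
  touchdown       = c(TRUE, FALSE, TRUE, FALSE)
)

# Hypothetical helper: sum recoveries per player without filtering on `unit`,
# so offensive and special teams players are counted as well.
count_recoveries <- function(df) {
  stats <- data.frame(
    fumble_recovery_own = as.integer(df$own_team_fumble),
    fumble_recovery_opp = as.integer(!df$own_team_fumble),
    fumble_recovery_tds = as.integer(df$touchdown)
  )
  aggregate(stats, by = list(player = df$player), FUN = sum)
}

all_units    <- count_recoveries(recoveries)
defense_only <- count_recoveries(recoveries[recoveries$unit == "defense", ])
```

In this toy data, `defense_only` has no row for the offensive player, so that player's recovery touchdown is missing from it, while `all_units` keeps it.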

@mrcaseb
Member Author

mrcaseb commented Nov 9, 2024

Changed variable names compared to the current stats approach are now documented here

# Differences to old offense stats ----------------------------------------
# recent_team -> team (recent team in weekly data never made sense)
# interceptions -> passing_interceptions (all passing stats have the passing prefix)
# sacks -> sacks_suffered (to make clear it's not on defensive side)
# sack_yards -> sack_yards_lost (to make clear it's not on defensive side)
# dakota -> not implemented at the moment
setdiff(n_old_1, n2)
setdiff(n2, n_old_1)
# Differences to old defense stats ----------------------------------------
# def_tackles -> there is def_tackles_solo and def_tackles_with_assist
# def_fumble_recovery_own -> fumble_recovery_own (it is not exclusive to defense)
# def_fumble_recovery_yards_own -> fumble_recovery_yards_own (it is not exclusive to defense)
# def_fumble_recovery_opp -> fumble_recovery_opp (it is not exclusive to defense)
# def_fumble_recovery_yards_opp -> fumble_recovery_yards_opp (it is not exclusive to defense)
# def_safety -> def_safeties (we use plural everywhere)
# def_penalty -> penalties (it is not exclusive to defense)
# def_penalty_yards -> penalty_yards (it is not exclusive to defense)
setdiff(n_old_2, n2)
setdiff(n2, n_old_2)
# Differences to old kicking stats ----------------------------------------
# No differences
setdiff(n_old_3, n2)
setdiff(n2, n_old_3)

This is the final call for review
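
For anyone with code written against the old column names, the one-to-one renames in the comment above could be applied with a small lookup vector. This is only a sketch: `renamed` and `rename_old_stats()` are hypothetical names, not part of nflfastR or nflreadr. dakota and def_tackles are left out on purpose because they have no one-to-one replacement.

```r
# Old name -> new name, transcribed from the comment above.
renamed <- c(
  recent_team                   = "team",
  interceptions                 = "passing_interceptions",
  sacks                         = "sacks_suffered",
  sack_yards                    = "sack_yards_lost",
  def_fumble_recovery_own       = "fumble_recovery_own",
  def_fumble_recovery_yards_own = "fumble_recovery_yards_own",
  def_fumble_recovery_opp       = "fumble_recovery_opp",
  def_fumble_recovery_yards_opp = "fumble_recovery_yards_opp",
  def_safety                    = "def_safeties",
  def_penalty                   = "penalties",
  def_penalty_yards             = "penalty_yards"
)

# Hypothetical helper: rename only the columns found in `map`,
# leaving every other column (and the column order) untouched.
rename_old_stats <- function(df, map = renamed) {
  hits <- names(df) %in% names(map)
  names(df)[hits] <- unname(map[names(df)[hits]])
  df
}

# Usage on a toy data frame that uses the old column names.
old <- data.frame(recent_team = "ARI", sacks = 2, dakota = 0.1, carries = 5)
names(rename_old_stats(old))
```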

@TheMathNinja
Contributor

Changed variable names compared to the current stats approach are now documented here

# Differences to old offense stats ----------------------------------------
# player_name -> player_display_name
# recent_team -> team (recent team in weekly data never made sense)
# interceptions -> passing_interceptions (all passing stats have the passing prefix)
# sacks -> sacks_suffered (to make clear it's not on defensive side)
# sack_yards -> sack_yards_lost (to make clear it's not on defensive side)
# dakota -> not implemented at the moment
setdiff(n_old_1, n2)
setdiff(n2, n_old_1)
# Differences to old defense stats ----------------------------------------
# player_name -> player_display_name
# def_tackles -> there is def_tackles_solo and def_tackles_with_assist
# def_fumble_recovery_own -> fumble_recovery_own (it is not exclusive to defense)
# def_fumble_recovery_yards_own -> fumble_recovery_yards_own (it is not exclusive to defense)
# def_fumble_recovery_opp -> fumble_recovery_opp (it is not exclusive to defense)
# def_fumble_recovery_yards_opp -> fumble_recovery_yards_opp (it is not exclusive to defense)
# def_safety -> def_safeties (we use plural everywhere)
# def_penalty -> penalties (it is not exclusive to defense)
# def_penalty_yards -> penalty_yards (it is not exclusive to defense)
setdiff(n_old_2, n2)
setdiff(n2, n_old_2)
# Differences to old kicking stats ----------------------------------------
# player_name -> player_display_name
setdiff(n_old_3, n2)
setdiff(n2, n_old_3)

This is the final call for review

Does this mean the new version is erasing the difference between solo tackles and tackles with assist? Or will it maintain the two categories and also offer a combined variable?

@mrcaseb
Member Author

mrcaseb commented Nov 9, 2024

It counts solo tackles, tackles with assist and tackle assists separately
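
If a combined column is still wanted downstream, it can be derived from the separate counts. The sketch below assumes that "combined" means solo tackles plus tackles with assist, and that tackle assists live in a column called `def_tackle_assists`. Both are assumptions, as are the names `add_combined_tackles()` and `def_tackles_combined`; none of this is confirmed in this thread.

```r
# Hypothetical helper: add a combined tackle column to a stats data frame.
# Assumes combined = def_tackles_solo + def_tackles_with_assist.
add_combined_tackles <- function(stats) {
  stats$def_tackles_combined <-
    stats$def_tackles_solo + stats$def_tackles_with_assist
  stats
}

# Toy data; `def_tackle_assists` is an assumed column name and is kept as-is.
toy <- data.frame(
  def_tackles_solo        = c(3, 0),
  def_tackles_with_assist = c(2, 1),
  def_tackle_assists      = c(1, 4)
)

toy_combined <- add_combined_tackles(toy)
```

Keeping the three source columns untouched means the separate categories remain available next to the derived total.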

@TheMathNinja
Contributor

Got it!

@tanho63
Member

tanho63 commented Nov 10, 2024

new_stats <- calculate_stats(2023, summary_level = "week", stat_type = "player")

old_stats <- dplyr::bind_rows(
  nflreadr::load_player_stats(2023, stat_type = "offense"),
  nflreadr::load_player_stats(2023, stat_type = "defense"),
  nflreadr::load_player_stats(2023, stat_type = "kicking")
)

setdiff(names(old_stats), names(new_stats))
#>  [1] "recent_team"                   "interceptions"                 "sacks"
#>  [4] "sack_yards"                    "dakota"                        "def_tackles"
#>  [7] "def_fumble_recovery_own"       "def_fumble_recovery_yards_own" "def_fumble_recovery_opp"
#> [10] "def_fumble_recovery_yards_opp" "def_safety"                    "def_penalty"
#> [13] "def_penalty_yards"

setdiff(names(new_stats), names(old_stats))
#>  [1] "passing_interceptions"     "sacks_suffered"            "sack_yards_lost"           "passing_cpoe"
#>  [5] "def_safeties"              "misc_yards"                "fumble_recovery_own"       "fumble_recovery_yards_own"
#>  [9] "fumble_recovery_opp"       "fumble_recovery_yards_opp" "fumble_recovery_tds"       "penalties"
#> [13] "penalty_yards"             "punt_returns"              "punt_return_yards"         "kickoff_returns"
#> [17] "kickoff_return_yards"

I see most things have equivalents. Should we document somewhere which new fields are available and which fields are no longer available? dakota and def_tackles, for instance, don't have direct equivalents.

Member

@tanho63 tanho63 left a comment

Helluva job tackling this! Performance is not superb, but I fully support dplyr readability here over the data.table performance/maintenance tradeoff, since nflreadr will pull precalculated values 99% of the time. A few questions about field differences, I think.

@mrcaseb
Member Author

mrcaseb commented Nov 10, 2024

Done in the article that hosts the variables

@mrcaseb
Member Author

mrcaseb commented Nov 10, 2024

Yeah, it's not really worth caring too much about performance here. It's also quite slow because we have to download both pbp and playstats before we do the grouped aggregation.

auto-merge was automatically disabled November 10, 2024 16:04

Rebase failed

@mrcaseb mrcaseb merged commit bd7ab61 into master Nov 10, 2024
7 checks passed
@mrcaseb mrcaseb deleted the new-stats-approach branch November 10, 2024 16:10