Even faster did #209
Merged
…into even-faster-did
This PR introduces a new argument in `did` called `faster_mode`, which improves the overall performance of `did` by optimizing both `pre_process_did` and `compute.att_gt`.

The main objective of these enhancements is to implement faster data management when setting up the 2x2 DiD cohort data before running `DRDID` estimation. This new approach avoids filtering data inside a g-t loop, which can be computationally expensive for large datasets. Instead, `faster_mode` introduces a new setup that:

… `tname`, `gname`, and `idname`.

This process allows us to construct a vector of indicators, `did_cohort_index`, containing values of 1, 0, or NA to mark treated, untreated, and non-participating units for each g-t cell, respectively. The construction of `did_cohort_index` considers the structure of the data (panel vs RCS), the base period (universal vs varying), and the control group (never treated vs not-yet-treated).

Below is an example illustrating how `did_cohort_index` can be manually constructed, considering 3 groups, 3 periods, a universal base period, and both types of control groups:

Changes:
- `pre_process_did2`, which processes arguments passed to the main methods in `did` and performs checks to ensure the data is in the correct format, providing helpful error messages when necessary. This function is analogous to `pre_process_did` but utilizes faster implementations, orders the data, and computes metadata that is used to populate `did_cohort_index`.
- `get_did_tensors`, a utility function used by `pre_process_did2`, which splits the data into a list of outcome tensors and a list of arguments. Tensors are objects with dimensions `id_count x 1 x time_periods_count` and are used for faster filtering in the computation of the DiD estimator. This is only applicable to panel data.
- `validate_args` and `did_standarization`, functions that validate arguments passed to `att_gt()` and standardize the data format, respectively.
- `compute.att_gt2`, which processes the (g,t) cell, sends it to estimation, and then handles all post-processing steps to recover the same outputs as `compute.att_gt`, ensuring that subsequent procedures remain unaffected.
- `att_gt` with `faster_mode = TRUE` and `faster_mode = FALSE` produces the same results.

Evidence:
Panel data; unique ids = 10^order, where order in {2,...,6}, time periods = 10, DR estimation.
RCS; unique ids = 10^order, where order in {3,...,6}, time periods = 8, DR estimation.
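The panel case relies on the tensor layout produced by `get_did_tensors`. As a rough sketch of the idea only, written in plain Python rather than the package's R code, outcomes can be arranged once as one slice per period, so that each (g,t) cell is handled by position instead of re-filtering the long data. The function names `build_outcome_slices` and `cell_differences` are hypothetical, a balanced panel is assumed, and a list of per-period lists stands in for the `id_count x 1 x time_periods_count` tensor.

```python
def build_outcome_slices(rows, n_ids, n_periods):
    """Arrange long panel data as one outcome list per period.

    rows: iterable of (period, unit_id, outcome) tuples.
    Assumption: the panel is balanced, so every period has n_ids rows.
    """
    ordered = sorted(rows)  # order by period first, then by unit id
    outcomes = [y for _, _, y in ordered]
    return [outcomes[k * n_ids:(k + 1) * n_ids] for k in range(n_periods)]


def cell_differences(slices, pre_pos, post_pos, cohort_index):
    """Return (indicator, post - pre) for units taking part in one cell.

    cohort_index holds 1 (treated), 0 (untreated) or None (not in the cell),
    in the same unit order as each slice.
    """
    pre, post = slices[pre_pos], slices[post_pos]
    return [
        (d, post[i] - pre[i])
        for i, d in enumerate(cohort_index)
        if d is not None
    ]


# Three units observed in two periods, supplied in arbitrary row order.
rows = [
    (2, "a", 5.0), (1, "a", 1.0),
    (1, "b", 2.0), (2, "b", 2.5),
    (1, "c", 3.0), (2, "c", 3.0),
]
slices = build_outcome_slices(rows, n_ids=3, n_periods=2)
diffs = cell_differences(slices, pre_pos=0, post_pos=1, cohort_index=[1, None, 0])
```

Because the data is ordered once up front, selecting the units for a cell reduces to indexing two slices with the same positions.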
🚨 This PR may affect workflows that use `did` under the hood. While all tests are passing, a careful review is recommended. To prevent disruptions in existing workflows, these changes are implemented under the argument `faster_mode = TRUE`, with the default set to `FALSE`. This default preserves the current procedures, which are already efficient for most datasets.
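To make the `did_cohort_index` idea from the summary more concrete, here is a minimal sketch in plain Python (not the package's R implementation) for panel data with a universal base period. Several things are assumptions made for illustration: the helper name `cohort_index`, coding never-treated units as group 0, zero anticipation (so the base period for group g is g - 1), and the rule that a not-yet-treated control must be first treated after both the base period and period t.

```python
NEVER_TREATED = 0  # assumption: never-treated units carry group value 0


def cohort_index(groups, g, t, control_group="nevertreated"):
    """Return 1 / 0 / None per unit for the (g, t) cell.

    1    -> unit belongs to the treated cohort g
    0    -> unit is a valid control for this cell
    None -> unit does not take part in this cell (NA in the PR's wording)
    """
    base_period = g - 1  # universal base period, assuming no anticipation
    latest = max(t, base_period)
    index = []
    for unit_group in groups:
        if unit_group == g:
            index.append(1)
        elif unit_group == NEVER_TREATED:
            index.append(0)
        elif control_group == "notyettreated" and unit_group > latest:
            index.append(0)
        else:
            index.append(None)
    return index


# Six units: two first treated in period 2, two in period 3, two never treated.
groups = [2, 2, 3, 3, 0, 0]
periods = (1, 2, 3)

never = {(g, t): cohort_index(groups, g, t) for g in (2, 3) for t in periods}
not_yet = {
    (g, t): cohort_index(groups, g, t, "notyettreated")
    for g in (2, 3)
    for t in periods
}
```

Under these assumptions, the group first treated in period 3 serves as a control for group 2 in period 2 only when the not-yet-treated control group is used, and it drops out of that comparison from period 3 onward.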