Inviting contributions to DoWhy #330
amit-sharma started this conversation in Ideas
-
Hey @amit-sharma - this is a really cool list of possible use cases. Do you still have interest in most of these? Both 1 and 6 are pretty relevant to what I'm doing in my current work, and I'd be happy to discuss a bit more about moving one of these forward. WDYT?
-
@emrekiciman and I have been discussing extensions to the DoWhy package that would make it more relevant for different kinds of causal inference problems. We've identified a few directions that we list below.
It's a big, ambitious list! Our team cannot work on all of these, so we'd love contributions from the community to accelerate progress. If any of these directions interest you, let us know here or on Discord.
We'd also like your feedback on these broad directions. What other directions would be useful? Let us know in the comments below.
1. Time-series causal inference
A lot of causal inference questions involve time-series data. The goal is to add support for time-series data and inference in DoWhy. This would involve:
a) a literature review of the different methods that work with time series (e.g., synthetic control, dose-response analysis, using historical variables as confounders in the backdoor criterion, etc.); a minimal synthetic-control sketch follows below;
b) figuring out the API and implementing 2-3 state-of-the-art methods.
This is a good survey to get started: https://arxiv.org/abs/2102.05829
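To make item (a) concrete, here is a minimal, illustrative synthetic-control sketch in plain numpy/scipy. It is not DoWhy code, and the array names are hypothetical: find non-negative weights over the control units that reproduce the treated unit's pre-treatment trajectory, then compare post-treatment outcomes against that weighted combination.

```python
# Minimal synthetic-control sketch (not part of DoWhy's current API).
# Y_pre: (n_periods_pre, n_controls) outcomes of control units before treatment
# y_pre_treated: (n_periods_pre,) outcomes of the treated unit before treatment
import numpy as np
from scipy.optimize import minimize

def synthetic_control_weights(Y_pre, y_pre_treated):
    """Find non-negative weights summing to 1 that best reproduce the
    treated unit's pre-treatment trajectory from the control units."""
    n_controls = Y_pre.shape[1]

    def loss(w):
        return np.sum((Y_pre @ w - y_pre_treated) ** 2)

    constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]
    bounds = [(0.0, 1.0)] * n_controls
    w0 = np.full(n_controls, 1.0 / n_controls)
    res = minimize(loss, w0, bounds=bounds, constraints=constraints)
    return res.x

# Post-treatment effect estimate: observed treated outcome minus the
# weighted combination of control outcomes, e.g.
# effect = y_post_treated - Y_post_controls @ w
```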
2. Interpretation and debugging of causal effects
This is often asked for as a key feature that would enable better decision-making. Each causal effect estimator comes with its own inspection plots/tables that can help a user understand whether the estimate is a good one (for instance, balance plots for propensity-score-based methods). The goal is to propose and add innovative visualizations that help people a) interpret the causal effect, and b) debug/inspect the quality of the different estimators.
To get started, you can check this webpage with visualizations: Causal Inference Animated Plots [not necessarily animations, even static visualizations will be helpful].
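As a starting point for the kind of diagnostic we have in mind, here is a small, hedged sketch of an overlap plot for propensity-score methods. It does not use DoWhy internals; the propensity model, the DataFrame, and the column names are placeholder assumptions.

```python
# Sketch of an overlap/balance diagnostic for a propensity-score estimate.
# Assumes a pandas DataFrame `df` with a binary treatment column "T" and
# covariate columns `covariates`; the propensity model here is illustrative.
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

def plot_propensity_overlap(df, treatment="T", covariates=("x1", "x2")):
    X = df[list(covariates)].values
    t = df[treatment].values
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]

    # Overlapping histograms of propensity scores by treatment group:
    # poor overlap warns that the estimate relies on extrapolation.
    plt.hist(ps[t == 1], bins=30, alpha=0.5, density=True, label="treated")
    plt.hist(ps[t == 0], bins=30, alpha=0.5, density=True, label="control")
    plt.xlabel("estimated propensity score")
    plt.ylabel("density")
    plt.legend()
    plt.show()
    return ps
```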
3. Non-linear Mediation analysis
DoWhy has a very basic linear mediation method. The goal is to implement a non-linear mediation method based on this paper: A General Approach to Causal Mediation Analysis. You can also check out this blog to get started: Nonlinear Mediation Analysis – Paul Hünermund
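To give a flavour of the simulation idea behind that paper, here is a rough plug-in sketch (no uncertainty quantification, and not DoWhy's implementation). The model classes, the Gaussian mediator assumption, and the variable names are illustrative choices.

```python
# Plug-in sketch of the simulation approach to the average causal mediation
# effect (ACME): fit a mediator model and an outcome model, simulate
# counterfactual mediators under treatment and control, and average.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

def acme_plugin(X, t, m, y, treat_value=1, n_sims=200, seed=0):
    rng = np.random.default_rng(seed)

    # Mediator model: M ~ T + X (linear with Gaussian errors, for simplicity).
    med_model = LinearRegression().fit(np.column_stack([t, X]), m)
    sigma = np.std(m - med_model.predict(np.column_stack([t, X])))

    # Outcome model: Y ~ T + M + X (can be non-linear).
    out_model = GradientBoostingRegressor().fit(np.column_stack([t, m, X]), y)

    ones, zeros = np.ones_like(t), np.zeros_like(t)
    effects = []
    for _ in range(n_sims):
        # Simulate counterfactual mediators under treatment and control.
        m1 = med_model.predict(np.column_stack([ones, X])) + rng.normal(0, sigma, len(t))
        m0 = med_model.predict(np.column_stack([zeros, X])) + rng.normal(0, sigma, len(t))
        tt = ones * treat_value
        y_m1 = out_model.predict(np.column_stack([tt, m1, X]))
        y_m0 = out_model.predict(np.column_stack([tt, m0, X]))
        effects.append(np.mean(y_m1 - y_m0))

    # Average causal mediation effect at the chosen treatment level.
    return float(np.mean(effects))
```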
4. Better Identification
a) Bhattacharya, Nabi, and Shpitser have recently proposed advanced algorithms for identification and estimation that are implemented in their package, ananke. The goal is to develop a common API so that the DoWhy and Ananke libraries can be used together, and to investigate the benefits of the new identification methods. To get started, you can check out Ananke: Semiparametric Inference For Causal Effects.
b) DoWhy has an implementation of the ID/IDC algorithm, but it is hopelessly slow for large graphs. The goal is to implement an optimized version of the ID/IDC algorithm from Shpitser and Pearl. This will require a deep dive into data structures and optimization, and perhaps also graph theory. There was an attempt to optimize ID in R: https://arxiv.org/abs/1806.07161
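One concrete building block for such an optimization is the c-component (district) decomposition, which the ID algorithm calls repeatedly. Below is a small illustration, assuming (purely for the example) that the ADMG is represented as a node set plus a set of bidirected edges.

```python
# Computing c-components (districts), a core subroutine of the ID algorithm.
# Here an ADMG is represented as a set of nodes plus a set of bidirected
# edges (pairs of nodes sharing a hidden common cause); this representation
# is just for illustration.
import networkx as nx

def c_components(nodes, bidirected_edges):
    """Return the c-components: connected components of the graph whose
    edges are only the bidirected edges."""
    g = nx.Graph()
    g.add_nodes_from(nodes)
    g.add_edges_from(bidirected_edges)
    return [set(c) for c in nx.connected_components(g)]

# Example: X <-> Z and Z <-> Y share hidden confounders, W does not.
print(c_components(["X", "Y", "Z", "W"], [("X", "Z"), ("Z", "Y")]))
# -> [{'X', 'Y', 'Z'}, {'W'}]
```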
5. More Refutations
a) Given a dataset and a causal graph, can we check whether the graph is consistent with the data? Infer conditional independencies from the graph and check whether they are satisfied in the data (a small sketch follows after this list).
b) Refutations for CATE estimators: can we extend the refuters to support conditional average treatment effects (separate effects for separate subpopulations)?
c) Cinelli and Hazlett have proposed a general way to implement a sensitivity-analysis/`add_unobserved_common_cause` refuter for linear models. It would be great to implement this refuter in DoWhy. While it is restricted to linear models, it can be useful for non-linear models from EconML that have a linear second stage, e.g., double machine learning. To get started, you can browse the paper or look at the R implementation, sensemakr.
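For item (a), here is a rough sketch of what such a consistency check could look like. It tests the local Markov property with a simple partial-correlation test, so it implicitly assumes roughly linear-Gaussian data; the function names and the DataFrame layout are placeholders, not a proposed DoWhy API.

```python
# Sketch of a graph-vs-data consistency check via the local Markov property:
# each node should be independent of its non-descendants given its parents.
import networkx as nx
import numpy as np
from scipy import stats

def partial_corr_pvalue(df, x, y, given):
    """p-value for corr(x, y) after linearly regressing out `given`."""
    def residuals(col):
        if not given:
            return df[col].values - df[col].mean()
        Z = np.column_stack([df[g].values for g in given] + [np.ones(len(df))])
        beta, *_ = np.linalg.lstsq(Z, df[col].values, rcond=None)
        return df[col].values - Z @ beta
    r, p = stats.pearsonr(residuals(x), residuals(y))
    return p

def refute_graph(graph: nx.DiGraph, df, alpha=0.05):
    """Test each implied independence X _||_ Y | parents(X); return violations."""
    violations = []
    for node in graph.nodes:
        parents = list(graph.predecessors(node))
        non_descendants = (
            set(graph.nodes) - nx.descendants(graph, node) - {node} - set(parents)
        )
        for other in non_descendants:
            if partial_corr_pvalue(df, node, other, parents) < alpha:
                violations.append((node, other, tuple(parents)))
    return violations  # empty list => no detected inconsistency
```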
6. Multiple treatments
Extend DoWhy to work with multiple treatments. The deconfounder paper, The Blessings of Multiple Causes, would be a great one to implement.
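A very simplified sketch of the deconfounder idea follows (the paper also requires a predictive check on the factor model, omitted here); the factor model and regression choices are illustrative, not the paper's exact recipe.

```python
# Simplified deconfounder sketch: fit a factor model over the multiple
# treatments and use the inferred factors as a substitute confounder in
# the outcome model.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression

def deconfounder_effects(treatments, outcome, n_factors=2):
    """treatments: (n, k) array of multiple treatments; outcome: (n,) array.
    Returns per-treatment coefficients adjusted for the substitute confounder."""
    # 1. Factor model over the treatments; the latent factors approximate
    #    whatever jointly drives them (the substitute confounder).
    factors = FactorAnalysis(n_components=n_factors).fit_transform(treatments)

    # 2. Outcome regression on the treatments plus the substitute confounder.
    X = np.column_stack([treatments, factors])
    model = LinearRegression().fit(X, outcome)

    # The first k coefficients correspond to the treatments.
    return model.coef_[: treatments.shape[1]]
```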
7. Causal inference on Natural language/Vision datasets
Extend DoWhy to natural language or image data. A common problem is that we are given a text, its label, and some attributes (e.g., the gender of the author). Can we estimate the causal effect of gender on the label, controlling for the text? To get started, you can refer to this paper: Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond
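As a hedged illustration of one simple baseline: treat a TF-IDF representation of the text as the adjustment set and apply regression adjustment for the attribute's effect on the label. The column names, data shapes, and the choice of representation are assumptions, not a proposed design.

```python
# Baseline sketch: effect of a binary author attribute on a label,
# adjusting for the text via a TF-IDF representation (backdoor-style
# regression adjustment).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def attribute_effect_on_label(texts, attribute, labels):
    """texts: list of str; attribute: (n,) binary array (e.g., author gender);
    labels: (n,) binary array. Returns an estimate of
    E[Y | do(A=1)] - E[Y | do(A=0)] under adjustment on the text features."""
    X_text = TfidfVectorizer(max_features=2000).fit_transform(texts).toarray()
    X = np.column_stack([attribute, X_text])

    clf = LogisticRegression(max_iter=2000).fit(X, labels)

    # Regression adjustment: predict the label with the attribute forced to
    # 1 and to 0 for every document, then average the difference.
    X1 = X.copy()
    X1[:, 0] = 1
    X0 = X.copy()
    X0[:, 0] = 0
    return float(np.mean(clf.predict_proba(X1)[:, 1] - clf.predict_proba(X0)[:, 1]))
```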