Inviting contributions to DoWhy #330
amit-sharma started this conversation in Ideas
-
Hey @amit-sharma - this is a really cool list of possible use cases. Do you still have interest in most of these? Both 1 and 6 are pretty relevant to what I'm doing in my current work, and I'd be happy to discuss a bit more about moving one of these forward. WDYT?
-
@emrekiciman and I have been discussing extensions to the DoWhy package that would make it more relevant for different kinds of causal inference problems. We've identified a few directions that we list below.
It's a big, ambitious list! Our team cannot work on all of these, so we'd love contributions from the community to accelerate progress. If any of these directions interest you, let us know here or on Discord.
We'd also like your feedback on these broad directions. What other directions would be useful? Let us know in the comments below.
1. Time-series causal inference
A lot of causal inference questions involve time-series data. The goal is to add support for time-series data and inference in DoWhy. This would involve:
a) a literature review of the different methods that work with time series (e.g., synthetic control, dose-response analysis, using historical variables as confounders in the backdoor criterion, etc.); a minimal synthetic-control sketch follows below;
b) figuring out the API and implementing 2-3 state-of-the-art methods.
This is a good survey to get started: https://arxiv.org/abs/2102.05829
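To make item (a) concrete, here is a minimal, illustrative synthetic-control sketch in plain numpy/scipy. It is not DoWhy code, and the array names are hypothetical: find non-negative weights over the control units that reproduce the treated unit's pre-treatment trajectory, then compare post-treatment outcomes against that weighted combination.

```python
# Minimal synthetic-control sketch (not part of DoWhy's current API).
# Y_pre: (n_periods_pre, n_controls) outcomes of control units before treatment
# y_pre_treated: (n_periods_pre,) outcomes of the treated unit before treatment
import numpy as np
from scipy.optimize import minimize

def synthetic_control_weights(Y_pre, y_pre_treated):
    """Find non-negative weights summing to 1 that best reproduce the
    treated unit's pre-treatment trajectory from the control units."""
    n_controls = Y_pre.shape[1]

    def loss(w):
        return np.sum((Y_pre @ w - y_pre_treated) ** 2)

    constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]
    bounds = [(0.0, 1.0)] * n_controls
    w0 = np.full(n_controls, 1.0 / n_controls)
    res = minimize(loss, w0, bounds=bounds, constraints=constraints)
    return res.x

# Post-treatment effect estimate: observed treated outcome minus the
# weighted combination of control outcomes, e.g.
# effect = y_post_treated - Y_post_controls @ w
```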
2. Interpretation and debugging of causal effects
This is often asked for as a key feature that would enable better decision-making. Each causal effect estimator comes with its own inspection plots/tables that can help a user understand whether the estimate is a good one (for instance, balance plots for propensity-score-based methods). The goal is to propose and add innovative visualizations that help people a) interpret the causal effect, and b) debug/inspect the quality of the different estimators.
To get started, you can check this webpage with visualizations: Causal Inference Animated Plots [not necessarily animations, even static visualizations will be helpful].
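As a starting point for the kind of diagnostic we have in mind, here is a small, hedged sketch of an overlap plot for propensity-score methods. It does not use DoWhy internals; the propensity model, the DataFrame, and the column names are placeholder assumptions.

```python
# Sketch of an overlap/balance diagnostic for a propensity-score estimate.
# Assumes a pandas DataFrame `df` with a binary treatment column "T" and
# covariate columns `covariates`; the propensity model here is illustrative.
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

def plot_propensity_overlap(df, treatment="T", covariates=("x1", "x2")):
    X = df[list(covariates)].values
    t = df[treatment].values
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]

    # Overlapping histograms of propensity scores by treatment group:
    # poor overlap warns that the estimate relies on extrapolation.
    plt.hist(ps[t == 1], bins=30, alpha=0.5, density=True, label="treated")
    plt.hist(ps[t == 0], bins=30, alpha=0.5, density=True, label="control")
    plt.xlabel("estimated propensity score")
    plt.ylabel("density")
    plt.legend()
    plt.show()
    return ps
```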
3. Non-linear Mediation analysis
DoWhy has a very basic linear mediation method. The goal is to implement a non-linear mediation method based on this paper: A General Approach to Causal Mediation Analysis. You can also check out this blog to get started: Nonlinear Mediation Analysis – Paul Hünermund
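To give a flavour of the simulation idea behind that paper, here is a rough plug-in sketch (no uncertainty quantification, and not DoWhy's implementation). The model classes, the Gaussian mediator assumption, and the variable names are illustrative choices.

```python
# Plug-in sketch of the simulation approach to the average causal mediation
# effect (ACME): fit a mediator model and an outcome model, simulate
# counterfactual mediators under treatment and control, and average.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

def acme_plugin(X, t, m, y, treat_value=1, n_sims=200, seed=0):
    rng = np.random.default_rng(seed)

    # Mediator model: M ~ T + X (linear with Gaussian errors, for simplicity).
    med_model = LinearRegression().fit(np.column_stack([t, X]), m)
    sigma = np.std(m - med_model.predict(np.column_stack([t, X])))

    # Outcome model: Y ~ T + M + X (can be non-linear).
    out_model = GradientBoostingRegressor().fit(np.column_stack([t, m, X]), y)

    ones, zeros = np.ones_like(t), np.zeros_like(t)
    effects = []
    for _ in range(n_sims):
        # Simulate counterfactual mediators under treatment and control.
        m1 = med_model.predict(np.column_stack([ones, X])) + rng.normal(0, sigma, len(t))
        m0 = med_model.predict(np.column_stack([zeros, X])) + rng.normal(0, sigma, len(t))
        tt = ones * treat_value
        y_m1 = out_model.predict(np.column_stack([tt, m1, X]))
        y_m0 = out_model.predict(np.column_stack([tt, m0, X]))
        effects.append(np.mean(y_m1 - y_m0))

    # Average causal mediation effect at the chosen treatment level.
    return float(np.mean(effects))
```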
4. Better Identification
a) Bhattacharya, Nabi, and Shpitser have recently proposed advanced algorithms for identification and estimation that are implemented in their package, ananke. The goal is to develop a common API so that the DoWhy and Ananke libraries can be used together, and to investigate the benefits of the new identification methods. To get started, you can check out Ananke: Semiparametric Inference For Causal Effects.
b) DoWhy has an implementation of the ID/IDC algorithm, but it is hopelessly slow for large graphs. The goal is to implement an optimized version of the ID/IDC algorithm from Shpitser and Pearl. This will require a deep dive into data structures and optimization, and perhaps also graph theory. There was an attempt to optimize ID in R: https://arxiv.org/abs/1806.07161
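One concrete building block for such an optimization is the c-component (district) decomposition, which the ID algorithm calls repeatedly. Below is a small illustration, assuming (purely for the example) that the ADMG is represented as a node set plus a set of bidirected edges.

```python
# Computing c-components (districts), a core subroutine of the ID algorithm.
# Here an ADMG is represented as a set of nodes plus a set of bidirected
# edges (pairs of nodes sharing a hidden common cause); this representation
# is just for illustration.
import networkx as nx

def c_components(nodes, bidirected_edges):
    """Return the c-components: connected components of the graph whose
    edges are only the bidirected edges."""
    g = nx.Graph()
    g.add_nodes_from(nodes)
    g.add_edges_from(bidirected_edges)
    return [set(c) for c in nx.connected_components(g)]

# Example: X <-> Z and Z <-> Y share hidden confounders, W does not.
print(c_components(["X", "Y", "Z", "W"], [("X", "Z"), ("Z", "Y")]))
# -> [{'X', 'Y', 'Z'}, {'W'}]
```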
5. More Refutations
a) Given a dataset and a causal graph, can we check whether the graph is consistent with the data? Infer conditional independencies from the graph and check whether they are satisfied in the data (a small sketch follows after this list).
b) Refutations for CATE estimators: can we extend the refuters to support conditional average treatment effects (separate effects for separate subpopulations)?
c) Cinelli and Hazlett have proposed a general way to implement a sensitivity-analysis/`add_unobserved_common_cause` refuter for linear models. It would be great to implement this refuter in DoWhy. While it is restricted to linear models, it can be useful for non-linear models from EconML that have a linear second stage, e.g., double machine learning. To get started, you can browse the paper or look at the R implementation, sensemakr.
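For item (a), here is a rough sketch of what such a consistency check could look like. It tests the local Markov property with a simple partial-correlation test, so it implicitly assumes roughly linear-Gaussian data; the function names and the DataFrame layout are placeholders, not a proposed DoWhy API.

```python
# Sketch of a graph-vs-data consistency check via the local Markov property:
# each node should be independent of its non-descendants given its parents.
import networkx as nx
import numpy as np
from scipy import stats

def partial_corr_pvalue(df, x, y, given):
    """p-value for corr(x, y) after linearly regressing out `given`."""
    def residuals(col):
        if not given:
            return df[col].values - df[col].mean()
        Z = np.column_stack([df[g].values for g in given] + [np.ones(len(df))])
        beta, *_ = np.linalg.lstsq(Z, df[col].values, rcond=None)
        return df[col].values - Z @ beta
    r, p = stats.pearsonr(residuals(x), residuals(y))
    return p

def refute_graph(graph: nx.DiGraph, df, alpha=0.05):
    """Test each implied independence X _||_ Y | parents(X); return violations."""
    violations = []
    for node in graph.nodes:
        parents = list(graph.predecessors(node))
        non_descendants = (
            set(graph.nodes) - nx.descendants(graph, node) - {node} - set(parents)
        )
        for other in non_descendants:
            if partial_corr_pvalue(df, node, other, parents) < alpha:
                violations.append((node, other, tuple(parents)))
    return violations  # empty list => no detected inconsistency
```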
6. Multiple treatments
Extend DoWhy to work with multiple treatments. The deconfounder paper, The Blessings of Multiple Causes, would be a great one to implement.
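A very simplified sketch of the deconfounder idea follows (the paper also requires a predictive check on the factor model, omitted here); the factor model and regression choices are illustrative, not the paper's exact recipe.

```python
# Simplified deconfounder sketch: fit a factor model over the multiple
# treatments and use the inferred factors as a substitute confounder in
# the outcome model.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression

def deconfounder_effects(treatments, outcome, n_factors=2):
    """treatments: (n, k) array of multiple treatments; outcome: (n,) array.
    Returns per-treatment coefficients adjusted for the substitute confounder."""
    # 1. Factor model over the treatments; the latent factors approximate
    #    whatever jointly drives them (the substitute confounder).
    factors = FactorAnalysis(n_components=n_factors).fit_transform(treatments)

    # 2. Outcome regression on the treatments plus the substitute confounder.
    X = np.column_stack([treatments, factors])
    model = LinearRegression().fit(X, outcome)

    # The first k coefficients correspond to the treatments.
    return model.coef_[: treatments.shape[1]]
```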
7. Causal inference on Natural language/Vision datasets
Extend DoWhy to natural language or image data. A common problem is that we are given a text, its label, and some attributes (e.g., the gender of the author). Can we estimate the causal effect of gender on the label, controlling for the text? To get started, you can refer to this paper: Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond
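As a hedged illustration of one simple baseline: treat a TF-IDF representation of the text as the adjustment set and apply regression adjustment for the attribute's effect on the label. The column names, data shapes, and the choice of representation are assumptions, not a proposed design.

```python
# Baseline sketch: effect of a binary author attribute on a label,
# adjusting for the text via a TF-IDF representation (backdoor-style
# regression adjustment).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def attribute_effect_on_label(texts, attribute, labels):
    """texts: list of str; attribute: (n,) binary array (e.g., author gender);
    labels: (n,) binary array. Returns an estimate of
    E[Y | do(A=1)] - E[Y | do(A=0)] under adjustment on the text features."""
    X_text = TfidfVectorizer(max_features=2000).fit_transform(texts).toarray()
    X = np.column_stack([attribute, X_text])

    clf = LogisticRegression(max_iter=2000).fit(X, labels)

    # Regression adjustment: predict the label with the attribute forced to
    # 1 and to 0 for every document, then average the difference.
    X1 = X.copy()
    X1[:, 0] = 1
    X0 = X.copy()
    X0[:, 0] = 0
    return float(np.mean(clf.predict_proba(X1)[:, 1] - clf.predict_proba(X0)[:, 1]))
```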