
Major code refactor to unify quasi experiment classes #381

Merged 72 commits into main on Aug 22, 2024
Conversation

drbenvincent (Collaborator) commented Jul 2, 2024

This is a relatively major code refactor with minor breaking changes to the API. The main purpose is to eliminate the parallel class hierarchy we had: virtually identical experiment classes that worked with either the PyMC or the scikit-learn models. The only real differences dealt with the fact that, for example, PyMC models produce InferenceData objects while scikit-learn models produce numpy arrays.

We don't have an immediate intention of expanding beyond PyMC or scikit-learn models; however, the new code structure would make it much easier to expand the kinds of models used. The main appeal of this is to focus on a high-level description of quasi-experimental methods and to abstract away model-related implementation issues. So you could add non-PyMC Bayesian models (see #116), or use statsmodels (see #8) to run OLS but also get confidence intervals (which you don't get from scikit-learn models).

We should have 100% passing doctests and tests, and I re-ran all the notebooks to check that we have stable performance.

Before

[class diagram]

After (at time of initial PR)

[class diagram]

After (after dealing with review comments)

[class diagram]

So now we just have a single set of quasi-experiment classes, all inheriting from BaseExperiment.

Other changes

  • I renamed ModelBuilder to PyMCModel. This seems to make more sense as it contrasts better with a new ScikitLearnAdaptor class/mixin, which gives some extra functionality to scikit-learn models.
  • I increased test coverage.
  • Plotting is done by the experiment classes, through either the bayes_plot or ols_plot methods, though some experiment classes have custom plot methods (see the sketch below).
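
As a rough sketch of that plotting dispatch (illustrative only, not the actual CausalPy code; the import path for PyMCModel is assumed, and bayes_plot / ols_plot are assumed to be defined on the experiment classes as described above):

from sklearn.base import RegressorMixin
from causalpy.pymc_models import PyMCModel  # assumed import path, for illustration

class BaseExperiment:
    def __init__(self, model):
        self.model = model

    def plot(self, *args, **kwargs):
        # Route to the Bayesian or OLS plotting code depending on the model type
        if isinstance(self.model, PyMCModel):
            return self.bayes_plot(*args, **kwargs)
        elif isinstance(self.model, RegressorMixin):
            return self.ols_plot(*args, **kwargs)
        else:
            raise ValueError("Model type not recognized")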

API changes

The change in API for the user is relatively small. The only change should really be how the experiment classes are imported. For example:

Before

import causalpy as cp
df = cp.load_data("did")
result = cp.pymc_experiments.DifferenceInDifferences(
    df,
    formula="y ~ 1 + group*post_treatment",
    time_variable_name="t",
    group_variable_name="group",
    model=cp.pymc_models.LinearRegression(sample_kwargs={"random_seed": seed}),
)

After

The import changes from cp.pymc_experiments.DifferenceInDifferences to cp.DifferenceInDifferences.

import causalpy as cp
df = cp.load_data("did")
result = cp.DifferenceInDifferences(
    df,
    formula="y ~ 1 + group*post_treatment",
    time_variable_name="t",
    group_variable_name="group",
    model=cp.pymc_models.LinearRegression(sample_kwargs={"random_seed": seed}),
)

The old API will still work, but will emit a deprecation warning. At some point in the future we may remove the old API, so it is best to make this minor update to existing workflows.

TODOs

  • Fix up the not-quite-perfect use of if isinstance in the experiment classes
  • Add missing module level docstrings to improve the auto generated API docs

📚 Documentation preview 📚: https://causalpy--381.org.readthedocs.build/en/381/

drbenvincent (Collaborator, Author) commented Aug 7, 2024

Thanks @wd60622. Still working through the comments, but I didn't follow what you meant about the deprecation warnings not working.

When I run with the old API (e.g. cp.pymc_experiments.DifferenceInDifferences) then we do get a warning, followed by the rest of the output:

[screenshot showing the deprecation warning followed by normal output]

I'm also seeing the deprecation tests pass (at the bottom of test_integration_pymc_examples.py)

  (Quoting @wd60622:) The import of experiments in the warnings does not work. Do you want people to use cp.DifferenceInDifferences or cp.experiments.DifferenceInDifferences, or both?

So the new thing is cp.DifferenceInDifferences, but in order to not break things cp.experiments.DifferenceInDifferences will still work but issue a deprecation warning. I could issue the warning then error out at that point, but right now the old API still works. And it does so by just routing through to cp.DifferenceInDifferences.
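
As an aside, here is a minimal sketch of how such a routing-plus-warning shim can be written in general (illustrative only, not necessarily how CausalPy implements it):

import warnings

def DifferenceInDifferences(*args, **kwargs):
    # Hypothetical shim kept at the old import path; names and structure are illustrative.
    import causalpy as cp

    warnings.warn(
        "This import path is deprecated; use cp.DifferenceInDifferences instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return cp.DifferenceInDifferences(*args, **kwargs)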

drbenvincent (Collaborator, Author):

  1. Would it be convenient to be able to pass scikit-learn models as well and use create_causalpy_compatible_class behind the scenes within the class? It seems like that was the behavior before, and it might break some previous workflows. I see most of the tests just do it once at the top, which could then be avoided.

This is a great suggestion @wd60622, and stopped me from being a bit lazy! I've addressed this in 02dacb2

My first attempt turned out to be a dead end. It basically came down to the fact that we are passing in an instantiated model object, not a model class. This means that if the model was built with non-trivial kwargs, it got highly complex to create another class with the mixin approach.

So the solution I went with was to simply take the user-provided model instance and to use a helper function to attach the methods required by CausalPy to that model instance. I did use GPT to help with this function :)

def add_mixin_methods(model_instance, mixin_class):
    """Attach all public methods of mixin_class to model_instance, bound to that instance."""
    for attr_name in dir(mixin_class):
        attr = getattr(mixin_class, attr_name)
        if callable(attr) and not attr_name.startswith("__"):
            # Bind the method to the instance
            method = attr.__get__(model_instance, model_instance.__class__)
            setattr(model_instance, attr_name, method)
    return model_instance

So now users can just provide unadulterated scikit-learn model instances. The experiment base class then adds the required methods to this behind the scenes. So there is zero API change for the user.
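
To make that concrete, here is a toy usage of the add_mixin_methods helper above. ExampleMixin is made up purely for illustration; CausalPy's actual mixin is ScikitLearnAdaptor, whose methods are not shown here:

from sklearn.linear_model import LinearRegression

class ExampleMixin:
    def get_coeffs(self):
        # Relies on the fitted scikit-learn attribute coef_
        return self.coef_

model = LinearRegression()                       # plain scikit-learn model instance
model = add_mixin_methods(model, ExampleMixin)   # attach the mixin's methods to it

model.fit(X=[[0.0], [1.0], [2.0]], y=[0.0, 1.0, 2.0])
print(model.get_coeffs())  # the bound mixin method now works on this instance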

drbenvincent (Collaborator, Author):

  1. The fit method should ideally have the same signature. This leads to the if/elif blocks that are all over the place. Though this might be another refactor.

So this is an example in DifferenceInDifferences:

# fit model
if isinstance(self.model, PyMCModel):
    COORDS = {"coeffs": self.labels, "obs_indx": np.arange(self.X.shape[0])}
    self.model.fit(X=self.X, y=self.y, coords=COORDS)
elif isinstance(self.model, RegressorMixin):
    self.model.fit(X=self.X, y=self.y)
else:
    raise ValueError("Model type not recognized")

You are right, we only need all this type checking because of the different signatures of the fit methods. I can't avoid providing coords to the PyMC fit method. I think the only other alternative is to create and pass coords to both fit methods - so I'd have to override the default fit method of the scikit-learn objects to make it accept **kwargs, and the coords would basically disappear into a black hole.

It's maybe a bit clunky because you are needlessly defining and passing coords when you have scikit-learn models, but that seems like the easiest way to go.

However, this would only affect internal code and not affect the user experience. So I'm happy if we sit and think about this one and address it at a later date. But open to any other ideas.
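
For concreteness, the alternative discussed above could look roughly like this sketch (class names are made up and this is not how CausalPy is currently written): the scikit-learn side accepts the same fit signature as the PyMC models and simply discards coords.

from sklearn.linear_model import LinearRegression

class FitSignatureAdaptor:
    """Give a scikit-learn estimator a fit() that tolerates PyMC-style kwargs."""

    def fit(self, X, y, coords=None, **kwargs):
        # coords (and any other PyMC-only kwargs) vanish into a black hole here
        return super().fit(X, y)

class AdaptedLinearRegression(FitSignatureAdaptor, LinearRegression):
    pass

model = AdaptedLinearRegression()
model.fit(X=[[0.0], [1.0]], y=[0.0, 1.0], coords={"coeffs": ["x"]})  # coords is ignored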

drbenvincent (Collaborator, Author):

Thanks for the final points @wd60622. Fingers crossed, that should be it now?

drbenvincent added the enhancement (New feature or request) label on Aug 20, 2024
wd60622 (Contributor) left a comment:

Looks good! Just two things I noticed.

Comment on lines 16 to 18
import causalpy.pymc_experiments as pymc_experiments # to be depricated
import causalpy.pymc_models as pymc_models
import causalpy.skl_experiments as skl_experiments # to be depricated
Suggested change:
- import causalpy.pymc_experiments as pymc_experiments # to be depricated
- import causalpy.pymc_models as pymc_models
- import causalpy.skl_experiments as skl_experiments # to be depricated
+ import causalpy.pymc_experiments as pymc_experiments # to be deprecated
+ import causalpy.pymc_models as pymc_models
+ import causalpy.skl_experiments as skl_experiments # to be deprecated

drbenvincent (Collaborator, Author):
Thanks - did a global find/replace

Comment on lines 18 to 23
import causalpy as cp

sample_kwargs = {"tune": 20, "draws": 20, "chains": 2, "cores": 2}


def test_regression_kink_gradient_change():
Suggested change:
- import causalpy as cp
- sample_kwargs = {"tune": 20, "draws": 20, "chains": 2, "cores": 2}
- def test_regression_kink_gradient_change():
+ import causalpy as cp
+ def test_regression_kink_gradient_change():

drbenvincent (Collaborator, Author):
Good catch, have removed sample_kwargs

drbenvincent merged commit e55f23b into main on Aug 22, 2024 (8 checks passed)
drbenvincent deleted the refactor branch on August 22, 2024 at 08:59
twiecki (Contributor) commented Aug 22, 2024

Congrats!

Labels: enhancement (New feature or request), major
Projects: none yet
Development: successfully merging this pull request may close the issue "Have more coherent testing of the summary method"
4 participants