-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Major code refactor to unify quasi experiment classes #381
Conversation
Thanks @wd60622. Still working through the comments, but I didn't follow what you meant about the deprecation warnings not working. When I run with the old API (e.g. I'm also seeing the deprecation tests pass (at the bottom of
So the new thing is |
This is a great suggestion @wd60622, and stopped me from being a bit lazy! I've addressed this in 02dacb2 The first attempt I tried turned out to be a dead end. It basically came down to the fact that we are passing in an instantiated model object, not a model class. This means that if the model was built with non-trivial kwargs, it was getting highly complex to create another class with the mixin approach. So the solution I went with was to simply take the user-provided model instance and to use a helper function to attach the methods required by CausalPy to that model instance. I did use GPT to help with this function :) def add_mixin_methods(model_instance, mixin_class):
for attr_name in dir(mixin_class):
attr = getattr(mixin_class, attr_name)
if callable(attr) and not attr_name.startswith("__"):
# Bind the method to the instance
method = attr.__get__(model_instance, model_instance.__class__)
setattr(model_instance, attr_name, method)
return model_instance So now users can just provide unadulterated scikit-learn model instances. The experiment base class then adds the required methods to this behind the scenes. So there is zero API change for the user. |
So this is an example in # fit model
if isinstance(self.model, PyMCModel):
COORDS = {"coeffs": self.labels, "obs_indx": np.arange(self.X.shape[0])}
self.model.fit(X=self.X, y=self.y, coords=COORDS)
elif isinstance(self.model, RegressorMixin):
self.model.fit(X=self.X, y=self.y)
else:
raise ValueError("Model type not recognized") You are right, we only need all this type checking because of the different signatures of the It's maybe a bit clunky because you are needlessly defining and passing coords when you have scikit-learn models, but that seems like the easiest way to go. However, this would only affect internal code and not affect the user experience. So I'm happy if we sit and think about this one and address it at a later date. But open to any other ideas. |
Co-authored-by: Will Dean <[email protected]>
Thanks for the final points @wd60622. Fingers crossed, that should be it now? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just two things I noticed
causalpy/__init__.py
Outdated
import causalpy.pymc_experiments as pymc_experiments # to be depricated | ||
import causalpy.pymc_models as pymc_models | ||
import causalpy.skl_experiments as skl_experiments # to be depricated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import causalpy.pymc_experiments as pymc_experiments # to be depricated | |
import causalpy.pymc_models as pymc_models | |
import causalpy.skl_experiments as skl_experiments # to be depricated | |
import causalpy.pymc_experiments as pymc_experiments # to be deprecated | |
import causalpy.pymc_models as pymc_models | |
import causalpy.skl_experiments as skl_experiments # to be deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - did a global find/replace
causalpy/tests/test_misc.py
Outdated
import causalpy as cp | ||
|
||
sample_kwargs = {"tune": 20, "draws": 20, "chains": 2, "cores": 2} | ||
|
||
|
||
def test_regression_kink_gradient_change(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import causalpy as cp | |
sample_kwargs = {"tune": 20, "draws": 20, "chains": 2, "cores": 2} | |
def test_regression_kink_gradient_change(): | |
import causalpy as cp | |
sample_kwargs = {"tune": 20, "draws": 20, "chains": 2, "cores": 2} | |
def test_regression_kink_gradient_change(): |
import causalpy as cp | |
sample_kwargs = {"tune": 20, "draws": 20, "chains": 2, "cores": 2} | |
def test_regression_kink_gradient_change(): | |
import causalpy as cp | |
def test_regression_kink_gradient_change(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, have removed sample_kwargs
Congrats! |
This is a relatively major code refactor with minor breaking changes to the API. The main purpose is to eliminate the parallel class hierarchy we had. Namely, we had virtually identical experiment classes which worked with either the PyMC or scikit-learn models. There were only slight differences here to deal with the fact that PyMC models produce
InferenceData
objects and the scikit-learn models would produce numpy arrays, for example.We don't have an immediate intention of expanding beyond PyMC or scikit-learn models, however the new code structure would make it much much easier to expand the kinds of models used. The main appeal of this is to focus on high level description of quasi-experimental methods and to abstract away from model-related implementation issues. So you could add in non-PyMC Bayesian models (see #116), or use
statsmodels
(see #8) to use OLS but also get confidence intervals (which you don't get from scikit-learn models).We should have 100% passing doctests and tests, and I re-ran all the notebooks to check that we have stable performance.
summary
method, so closes Have more coherent testing of thesummary
method #305Before
After (at time of initial PR)
After (after dealing with review comments)
So now we just have a single set of quasi experiment classes, all inheriting from
BaseExperiment
.Other changes
ModelBuilder
toPyMCModel
. This seems to make more sense as it contrasts better with a newScikitLearnAdaptor
class/mixin which gives some extra functionality to scikit-learn models.bayes_plot
orols_plot
methods, though some experiment classes have custom plot methods.API changes
The change in API for the user is relatively small. The only change should really be how the experiment classes are imported. For example:
Before
After
The import changes from
cp.pymc_experiments.DifferenceInDifferences
tocp.DifferenceInDifferences
.The old API will still work, but will emit a deprecation warning. At some point in the future we may remove the old API, so it is best to make this minor update to existing workflows.
TODO's
if isinstance
in the experiment classes📚 Documentation preview 📚: https://causalpy--381.org.readthedocs.build/en/381/