-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add example of multiple geo lift test analysis #338
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #338 +/- ##
==========================================
+ Coverage 83.10% 85.60% +2.49%
==========================================
Files 21 22 +1
Lines 1687 1716 +29
==========================================
+ Hits 1402 1469 +67
+ Misses 285 247 -38 ☔ View full report in Codecov by Sentry. |
View / edit / reply to this conversation on ReviewNB juanitorduz commented on 2024-05-08T12:57:25Z nip: use " or ' for strings (not both) ... btw: do we have ruff for notebooks? :) drbenvincent commented on 2024-05-08T19:57:06Z Fixed. Created #340 to double check ruff for notebooks. |
View / edit / reply to this conversation on ReviewNB juanitorduz commented on 2024-05-08T12:57:26Z shall we use
I see (from the plot) negative sales? Is this expected? drbenvincent commented on 2024-05-08T20:03:07Z Applied the tab20 colormap.
Good spot with the negative sales. This is not expected. I'll add this to the todo list - I'll go back to the synthetic data generation and ponder whether we need to operate on log sales etc. |
View / edit / reply to this conversation on ReviewNB juanitorduz commented on 2024-05-08T12:57:27Z Canw e take the legend outside the plots? they are hard to read drbenvincent commented on 2024-05-08T20:06:20Z Good point. I'll actually create an issue about this because this is an important plot type in CausalPy and we want a decent solution which will be useful in many contexts. |
View / edit / reply to this conversation on ReviewNB juanitorduz commented on 2024-05-08T12:57:28Z i think is worth adding a section where we compare the results of the two methods. drbenvincent commented on 2024-05-08T20:12:51Z Agreed. Will add this to the todo list and update very soon |
@drbenvincent this is very interesting! In practice, I have used a fixed effect model as described in https://matheusfacure.github.io/python-causality-handbook/14-Panel-Data-and-Fixed-Effects.html. In this case, we can use all the info from both groups without aggregating. It would be super interesting to compare the results of these approaches (I think the fixed effects model is very popular) |
Actually! What if you de-mean the data (in time and in unit) as in the book and apply the synthetic control model instead of the classic linear regression💡 This could be the better way 🤔 |
Fixed. Created #340 to double check ruff for notebooks. View entire conversation on ReviewNB |
Applied the tab20 colormap.
Good spot with the negative sales. This is not expected. I'll add this to the todo list - I'll go back to the synthetic data generation and ponder whether we need to operate on log sales etc. View entire conversation on ReviewNB |
Agreed. Will add this to the todo list and update very soon View entire conversation on ReviewNB |
So this is interesting @juanitorduz. But I think it will have implications in terms of interpolation/extrapolation and kind of change the nature of the model we are using - in that now we might need to use a In the situation where we have some large geos (with many sales) and some small geos (few sales), this scaling would also be scaling up/down the observation noise. I can't quite think through the implications of this at the moment, but is there a reason why this isn't done in situations where extrapolation seems to be required? Eg when a target geo is outside the convex hull. |
Maybe we can leave this out from this PR and I can try to test it myself? 😄 |
Sounds good. We can update the example at a later date with more content. Also saw this...
|
@juanitorduz... So I addressed many of the comments. Today I fixed the negative sales (whoops) and added a section comparing the approaches. The comments I've not yet addressed, I felt were appropriate to bundle up into separate issues (see the PR description up top). Let me know what you think. |
View / edit / reply to this conversation on ReviewNB juanitorduz commented on 2024-06-19T16:42:37Z The legend is redundate as all of them are gray ;) So maybe use different colors or remove the legend? drbenvincent commented on 2024-06-19T18:44:55Z Good point. I've changed the colours |
I left a small comment regarding a plot. We can merge this one an iterate. I think the killer feature would be to add a hierarchical model. This will close the gap between the pooled and unpooled models :D |
Good point. I've changed the colours View entire conversation on ReviewNB |
I agree. I'll add an issue for that. |
Replaced manual multiple forest plots with the nice feature of plot_forest that allows you to compare multiple models. Eg. @juanitorduz this should be good for approval + merging now? |
Create new classes?
The PR currently does not add any additional code or classes. All the new functionality is embedded within the example notebook. It is relatively simple, we are just creating an aggregate treated region or iterating through each treated geo. So at the moment this uses an approach of transparency so that users can see what is being done etc.
However, part of the point of CausalPy is to provide a simple to use API, not requiring the user to do that much manual python coding. So the question is whether we should create new classes / an API. It would be something along the lines of:
For the pooled approach:
For the unpooled approach:
So the trade-off would be having a relatively clean API, but at the expense of making the operation a little more opaque. The manual python code in the notebook (as it stands right now) is not that complex. So I think it's not overwhelmingly obvious which we should go with.
TODO's based on feedback so far
Check the pre-commit checks are applying ruff formatting to notebooks.We'll do this in that in Check pre-commit ruff formatting for notebooks is set up correctly #340 and apply before the next release.Fix the legends overlapping with plot content. We'll deal with that in Fix legend overlapping with lines on causal impact plots #341 and run it on this notebook (and others) before the next release.Think about using fixed effects approaches https://matheusfacure.github.io/python-causality-handbook/14-Panel-Data-and-Fixed-Effects.html. And/or thinking about "What if you de-mean the data (in time and in unit) as in the book and apply the synthetic control model instead of the classic linear regression"This is a great suggestion, but I think we are holding off and waiting until @juanitorduz investigates. We can certainly update this notebook in the future.