Issue19: add support for Linearmodels #144
Thanks, this looks fantastic! I'll try to give this a read as soon as possible, but it's the end of semester, so I'm not sure exactly when I'll have time for a deep dive. One thing I'd like us to think about is how we can design a consistent interface for all model-fitting functions that accept matrices rather than data frames. For example, @artiom-matvei and I have discussed adding support for Scikit-Learn, where the typical algorithm accepts There also, we could use the That will be a use case that applies to several packages, not just Scikit-Learn and LinearModels, so we want to think of a consistent approach. Maybe we can learn something from Scikit-Learn pipelines, where one of the possible transformers is a formula: https://scikit-learn.org/1.5/modules/generated/sklearn.pipeline.Pipeline.html |
FYI, I'm doing some experiments here: #145. It's still far from a finished product, but I'll let you know when I think I've found a good high-level interface. |
I merged the sklearn PR. Usage examples are in the first post of #145. We can still think about the interface some, but that gives us something to build on. I also took the liberty of updating your branch, since some of the method names have changed. The key difference is that the list-wise deleted data frame used to fit the model should now be hosted in Currently, the Thanks again for your contribution here. This is very cool! |
I'm working on the This will take a bit more work, as the new approach with |
I've added a few things:
A simple test of the new interface is as follows
I did disable the The test cases for linearmodels also need some updating. |
Thanks for this work, and sorry for the delay! (End of semester was a bit crazy this year.) I pushed a few commits:
TODO:
|
Does the The expected test results are the output of the first version of this PR. I'm not sure what caused the big discrepancy. I'll look into it. I'd like to compare some PanelOLS results and marginal effects estimated with R and the R version of marginaleffects against our Python implementation. Could you point me in the right direction to generate these results? I'm not that familiar with the popular R packages. Could you have a look at the |
I just fixed the Don't worry about type checking. I plan to overhaul that later. You can just remove the decorator. The two main packages for these models in R are fixest and plm. Here is a Python example, followed by the equivalent R code:
import numpy as np
from statsmodels.datasets import grunfeld
data = grunfeld.load_pandas().data
data.year = data.year.astype(np.int64)
from linearmodels import PanelOLS
etdata = data.set_index(['firm','year'])
PanelOLS(etdata.invest,
etdata[['value','capital']],
entity_effects=True,
time_effects=True).fit(debiased=True)
etdata.to_csv("~/Downloads/grunfeld.csv")

library(fixest)
dat = read.csv("~/Downloads/grunfeld.csv")
mod = feols(invest ~ value + capital | firm + year, data = dat)
coef(mod)
library(plm)
mod = plm(invest ~ value + capital, data = dat, index = c("firm", "year"), model = "within", effect="twoways")
coef(mod)
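As a library-independent sanity check on the point estimates, here is a minimal NumPy sketch of the two-way within estimator. It assumes a balanced panel and uses synthetic data (not the Grunfeld data); all variable and helper names are made up for illustration. On a balanced panel, OLS on double-demeaned data should give the same slopes as OLS with explicit firm and year dummies, which is the quantity the PanelOLS, feols, and plm calls above are meant to estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic balanced panel (NOT the Grunfeld data): 5 firms x 8 years.
n_firms, n_years = 5, 8
firm = np.repeat(np.arange(n_firms), n_years)
year = np.tile(np.arange(n_years), n_firms)
X = rng.normal(size=(n_firms * n_years, 2))
alpha = rng.normal(size=n_firms)[firm]  # firm effects
gamma = rng.normal(size=n_years)[year]  # year effects
noise = rng.normal(scale=0.1, size=firm.size)
y = X @ np.array([1.5, -0.5]) + alpha + gamma + noise


def group_means(a, g):
    """Return, for each row, the mean of `a` within that row's group `g`."""
    sums = np.zeros((g.max() + 1,) + a.shape[1:])
    np.add.at(sums, g, a)
    counts = np.bincount(g).reshape(-1, *([1] * (a.ndim - 1)))
    return (sums / counts)[g]


def two_way_demean(a):
    """Two-way within transformation; exact only for balanced panels."""
    return a - group_means(a, firm) - group_means(a, year) + a.mean(axis=0)


# Estimator 1: OLS on double-demeaned data.
beta_within, *_ = np.linalg.lstsq(two_way_demean(X), two_way_demean(y), rcond=None)

# Estimator 2: OLS with explicit firm and year dummies (one year dropped).
D_firm = np.eye(n_firms)[firm]
D_year = np.eye(n_years)[year][:, 1:]
Z = np.column_stack([X, D_firm, D_year])
beta_lsdv, *_ = np.linalg.lstsq(Z, y, rcond=None)

print(beta_within, beta_lsdv[:2])
```

This only checks coefficients. Standard errors depend on degrees-of-freedom conventions (for example the debiased=True option used above), so they may differ across packages even when the slopes agree.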
This PR adds initial support for the linearmodels package, focusing on panel data models.
Implementation Details
The implementation differs slightly from other models due to how linearmodels handles data. Since fitted linearmodels models don't retain the original DataFrame (discussed in #19), users need to manually wrap the fitted model in a ModelLinearmodels class:
This approach preserves access to all newdata options (e.g., mean).
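To illustrate the general idea of the wrapper (not the actual ModelLinearmodels code), here is a minimal sketch using only the standard library. FittedModelWrapper, FakeResults, and column_means are hypothetical names invented for this example. The point is that the wrapper carries the estimation data and formula next to the fitted object, so data-dependent options such as a grid of means stay available.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class FittedModelWrapper:
    """Hypothetical stand-in for a wrapper such as ModelLinearmodels.

    It only illustrates carrying the estimation data next to the fitted
    object; it is not the marginaleffects implementation.
    """

    fitted: Any                    # fitted results object from the library
    data: Sequence[dict]           # rows actually used for estimation
    formula: Optional[str] = None  # formula string, if one was used

    def __getattr__(self, name: str) -> Any:
        # Called only when normal lookup fails: delegate to the fitted object.
        if name.startswith("__") or name == "fitted":
            raise AttributeError(name)
        return getattr(self.fitted, name)

    def column_means(self) -> dict:
        """Per-column means of the stored data, e.g. to build a grid of means."""
        keys = self.data[0].keys()
        return {k: sum(row[k] for row in self.data) / len(self.data) for k in keys}


class FakeResults:
    """Placeholder for a fitted results object (assumption for the demo)."""

    params = {"value": 0.1, "capital": 0.3}


rows = [{"value": 1.0, "capital": 2.0}, {"value": 3.0, "capital": 6.0}]
wrapped = FittedModelWrapper(FakeResults(), rows, "invest ~ value + capital")
print(wrapped.params, wrapped.column_means())
```

Delegating unknown attributes to the fitted object means code that expects the raw results can keep working with the wrapped model.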
Testing
I've added basic consistency tests for the current implementation. Additional R-based comparison tests are still needed, but I don't have the expertise in R panel data to make these myself.
Current limitations
Patsy doesn't support linearmodels' custom EntityEffects and TimeEffects keywords
As a result, marginaleffects can't handle these effects either.
Workaround: Create the y and exog dataframes yourself with Patsy and construct the model as follows:
The formula needs to be manually added here.
linearmodels does not add an intercept by default: its formulas only include one when it is specified explicitly. This conflicts with Patsy's expectations, since Patsy adds an intercept implicitly.
Workaround: Users must explicitly include an intercept (+1) in the formula when desired.
Next steps