PyFixest returns nan in models with fixed effects #75
Edited to tag @s3alfisc.
The problem is that
This is obviously an invalid fixed effect value for
IMHO, the best solution available would be for
Alternatively (and immediately), we can specify the values of
# comparison for a specific unit
comparisons(fit, newdata = datagrid(
    X1 = [2, 4],
    f1 = 1,
    model = fit)
)
# comparisons for all units
comparisons(fit, newdata = datagrid(
    X1 = [2, 4],
    f1 = data["f1"].unique(),
    model = fit)
)
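The second call asks for comparisons at every observed level of `f1`. Assuming `datagrid()` builds the Cartesian product of the values it is given (as it does in R's marginaleffects), the resulting grid can be sketched in plain pandas. The `f1` values below are made up for illustration:

```python
import pandas as pd

# Hypothetical stand-in for data["f1"]: integer-coded fixed-effect levels.
f1_levels = pd.Series([0, 1, 2, 1, 0]).unique()

# Cartesian product of the requested X1 values and every observed f1 level,
# mirroring what datagrid(X1 = [2, 4], f1 = data["f1"].unique()) is assumed to return.
grid = pd.DataFrame({"X1": [2, 4]}).merge(
    pd.DataFrame({"f1": f1_levels}), how="cross"
)
print(grid)
```

Every row of this grid carries a level of `f1` that actually occurs in the data, so predictions at these points cannot hit an unseen fixed-effect value.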
Hi @vincentarelbundock, sorry for not responding right away; it took me some time to check
import polars as pl
import numpy as np
import pandas as pd
from pyfixest.estimation import feols
def create_test_data():
    # simulated data: Y is the triple interaction X1 * X2 * Z1 plus noise e
    np.random.seed(1024)
    data = pl.DataFrame({
        "X1": np.random.normal(size = 1000),
        "X2": np.random.normal(size = 1000),
        "Z1": np.random.normal(size = 1000),
        "e": np.random.normal(size = 1000),
        "f1": np.random.choice([0, 1, 2, 3, 4, 5], size = 1000, replace = True),
        "f2": np.random.choice([0, 1, 2, 3, 4, 5], size = 1000, replace = True)
    }).with_columns((pl.col("X1") * pl.col("X2") * pl.col("Z1") + pl.col("e")).alias("Y"))
    return data
data = create_test_data().to_pandas()
fit = feols("Y ~ X1 * X2 * Z1 | f1", data = data)
fit.predict()[:5] # array([-0.20069114, -0.18479564, 1.30258071, 0.17423046, 0.39628393])
data["f1"] = pd.Categorical(data["f1"])
fit = feols("Y ~ X1 * X2 * Z1 | f1", data = data)
fit.predict()[:5] # array([-0.20069114, -0.18479564, 1.30258071, 0.17423046, 0.39628393])
Nevertheless,
Yeah, I get it. It is in fact a nice convenience feature. I do wonder about transforming the data, though: some users may be surprised to find that the dataset stored in the fit object is not the same as the original. Could you maybe store an attribute in the fit object that says which variables should be treated as categorical?
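Whether a fixed-effect column is treated as categorical matters mainly for how a "typical value" is chosen when a grid is built. A minimal sketch, assuming a grid builder that uses the mean for numeric columns and the mode otherwise; `typical_value()` is a hypothetical helper, not the marginaleffects implementation:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def typical_value(s: pd.Series):
    """Hypothetical helper: mean for numeric columns, mode for everything else."""
    if is_numeric_dtype(s):
        return s.mean()
    return s.mode().iloc[0]

f1_int = pd.Series([0, 1, 1, 2, 5])  # integer-coded fixed effect
f1_cat = f1_int.astype("category")  # same values, stored as a categorical

print(typical_value(f1_int))  # 1.8, which is not a level of f1
print(typical_value(f1_cat))  # 1, an observed level
```

With integer codes the mean lands between levels, which is the kind of invalid fixed-effect value that makes a prediction undefined; flagging the column as categorical steers the helper to an observed level instead.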
I also don't love it, so I was thinking of adding a function argument
Look, I won't argue if you want to take this on, but if I were you I'd be reluctant to pollute the user interface with an additional argument for something that is basically only useful for internal mechanics... Your call, obviously. On my end, the only thing that would need to change would be this existing function:
We could add an additional
But again, your call.
I think you're right. 👍 You can actually already access all fixed effects as a formula string via
fit = feols("Y ~ X1 * X2 * Z1 | f1 + f2", data = data)
fit._fixef # 'f1+f2'
fit._fixef.split("+") #['f1', 'f2']
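Splitting that string on "+" recovers the fixed-effect column names. A small sketch of such a helper; the name `fixef_columns` is hypothetical, and `_fixef` is a private attribute whose format may change between pyfixest versions:

```python
def fixef_columns(fixef: str) -> list[str]:
    """Return the fixed-effect column names in a fixef string such as 'f1+f2'.

    Terms are split on '+' and stripped of surrounding whitespace; an
    interaction term such as 'f1^f2' is kept whole as a single name.
    """
    if not fixef:  # no fixed effects in the model
        return []
    return [term.strip() for term in fixef.split("+") if term.strip()]

print(fixef_columns("f1+f2"))  # ['f1', 'f2']
print(fixef_columns("f1^f2"))  # ['f1^f2']
```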
And there's never a "^" as in fixest for R? "+" is always the right split string? |
There might be, but then
fit = feols("Y ~ X1 * X2 * Z1 | f1^f2", data = data)
fit._fixef.split("+") #['f1^f2']
fit._data.head()
X1 X2 Z1 e f1 f2 Y f1^f2
0 2.124449 0.139458 -1.133521 0.620320 2 2 0.284490 2^2
1 0.252646 0.394952 -0.108843 -0.066855 1 3 -0.077715 1^3
2 1.454179 0.778479 1.350050 2.289958 1 2 3.818280 1^2
3 0.569240 -0.422725 -0.573504 -0.417212 0 2 -0.279209 0^2
4 0.458224 -0.535603 -1.376782 0.725313 3 4 1.063211 3^4
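The `f1^f2` column in this output holds the two levels joined by `^`. Assuming that is how the interaction is built (pyfixest's internal code may differ), it can be reproduced in pandas as follows; the `f1` and `f2` values are copied from the `head()` output above:

```python
import pandas as pd

# f1 and f2 values taken from the first five rows printed by fit._data.head().
data = pd.DataFrame({"f1": [2, 1, 1, 0, 3], "f2": [2, 3, 2, 2, 4]})

# Interact two fixed effects by concatenating their levels as strings,
# so each (f1, f2) combination becomes one categorical level.
data["f1^f2"] = data["f1"].astype(str) + "^" + data["f2"].astype(str)
print(data["f1^f2"].tolist())  # ['2^2', '1^3', '1^2', '0^2', '3^4']
```

Because the interacted column exists under the name `f1^f2`, splitting the fixef string on "+" still yields a column that can be looked up in the data.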
Should be fixed on GitHub. Thanks for following up on this and for the extra info.
import polars as pl
import numpy as np
from pyfixest.estimation import feols
from marginaleffects import *
def create_test_data():
    # simulated data: Y is the triple interaction X1 * X2 * Z1 plus noise e
    np.random.seed(1024)
    data = pl.DataFrame({
        "X1": np.random.normal(size = 1000),
        "X2": np.random.normal(size = 1000),
        "Z1": np.random.normal(size = 1000),
        "e": np.random.normal(size = 1000),
        "f1": np.random.choice([0, 1, 2, 3, 4, 5], size = 1000, replace = True),
        "f2": np.random.choice([0, 1, 2, 3, 4, 5], size = 1000, replace = True)
    }).with_columns((pl.col("X1") * pl.col("X2") * pl.col("Z1") + pl.col("e")).alias("Y"))
    return data
data = create_test_data()
fit = feols("Y ~ X1 * X2 * Z1 | f1", data = data.to_pandas())
comparisons(fit, newdata = datagrid(X1 = [2, 4], model = fit))
shape: (6, 10)
┌─────┬──────┬──────────┬──────────┬───┬──────────┬──────┬─────────┬─────────┐
│ X1 ┆ Term ┆ Contrast ┆ Estimate ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │
╞═════╪══════╪══════════╪══════════╪═══╪══════════╪══════╪═════════╪═════════╡
│ 2 ┆ X1 ┆ +1 ┆ 0.0259 ┆ … ┆ 0.216 ┆ 2.21 ┆ -0.0151 ┆ 0.0669 │
│ 4 ┆ X1 ┆ +1 ┆ 0.0259 ┆ … ┆ 0.216 ┆ 2.21 ┆ -0.0151 ┆ 0.0669 │
│ 2 ┆ X2 ┆ +1 ┆ 0.0712 ┆ … ┆ 0.145 ┆ 2.79 ┆ -0.0246 ┆ 0.167 │
│ 4 ┆ X2 ┆ +1 ┆ 0.161 ┆ … ┆ 0.14 ┆ 2.83 ┆ -0.0532 ┆ 0.376 │
│ 2 ┆ Z1 ┆ +1 ┆ -0.131 ┆ … ┆ 0.000852 ┆ 10.2 ┆ -0.207 ┆ -0.0538 │
│ 4 ┆ Z1 ┆ +1 ┆ -0.29 ┆ … ┆ 0.00137 ┆ 9.51 ┆ -0.468 ┆ -0.113 │
└─────┴──────┴──────────┴──────────┴───┴──────────┴──────┴─────────┴─────────┘
Awesome! I'll add some very basic tests to
Here's a reproducible example for what I think is a bug in comparisons():
In R, I instead get
by running
I'll set up the PR with tests before the new year =)
Originally posted by @s3alfisc in #56 (comment)