# Support pyfixest #56
This appears to work, now. I would be curious to know what @s3alfisc thinks. Below I paste a Python and an R script and their output. Run the Python script first, because it saves the CSV. FWIW, I have not tested it extensively, but the results look similar.

R code:

```r
library(fixest)
library(marginaleffects)
options(width = 10000)
# Prep and fit
data = read.csv("issue56.csv")
fit = feols(Y ~ X1 * X2 * Z1, data = data)
# Prediction: Average over full empirical distribution
p = avg_predictions(fit)
print(p)
# Average marginal effects (slopes)
s = avg_slopes(fit)
print(s)
# Comparisons at two different evaluation points
c = comparisons(fit, newdata = datagrid(X1 = c(2, 4)))
print(c)
```

Log:

```
> library(fixest)
> library(marginaleffects)
> options(width = 10000)
>
> # Prep and fit
> data = read.csv("issue56.csv")
> fit = feols(Y ~ X1 * X2 * Z1, data = data)
NOTE: 2 observations removed because of NA values (LHS: 1, RHS: 1).
>
> # Prediction: Average over full empirical distribution
> p = avg_predictions(fit)
> print(p)
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
2.57 0.0555 46.3 <0.001 Inf 2.46 2.68
Columns: estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
Type: response
>
> # Average marginal effects (slopes)
> s = avg_slopes(fit)
> print(s)
Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
X1 0.42870 0.0865 4.9544 <0.001 20.4 0.2591 0.598
X2 0.07336 0.0183 4.0067 <0.001 14.0 0.0375 0.109
Z1 -0.00406 0.0544 -0.0747 0.94 0.1 -0.1107 0.103
Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
Type: response
>
> # Comparisons at two different evaluation points
> c = comparisons(fit, newdata = datagrid(X1 = c(2, 4)))
> print(c)
Term Contrast X1 Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % X2 Z1
X1 +1 2 0.4287 0.0865 4.953 <0.001 20.4 0.2591 0.598 -0.126 0.995
X1 +1 4 0.4287 0.0865 4.953 <0.001 20.4 0.2591 0.598 -0.126 0.995
X2 +1 2 0.0822 0.0357 2.301 0.0214 5.5 0.0122 0.152 -0.126 0.995
X2 +1 4 0.1002 0.0875 1.146 0.2520 2.0 -0.0712 0.272 -0.126 0.995
Z1 +1 2 0.0688 0.0795 0.866 0.3867 1.4 -0.0870 0.225 -0.126 0.995
Z1 +1 4 0.2169 0.1821 1.191 0.2336 2.1 -0.1400 0.574 -0.126 0.995
Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, X1, predicted_lo, predicted_hi, predicted, Y, X2, Z1
Type: response
```

Python code:

```python
from pyfixest.estimation import feols
from pyfixest.utils import get_data
from marginaleffects import *
# Prep and fit
data = get_data()
data.to_csv("issue56.csv")
fit = feols("Y ~ X1 * X2 * Z1", data = data)
# Prediction: Average over full empirical distribution
p = avg_predictions(fit)
print(p)
# Average marginal effects (slopes)
s = avg_slopes(fit)
print(s)
# Comparisons at two different evaluation points
c = comparisons(fit, newdata = datagrid(X1 = [2, 4], model = fit))
print(c)
```

Log:

```
shape: (1, 7)
┌──────────┬───────────┬──────┬─────────┬─────┬──────┬───────┐
│ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │
╞══════════╪═══════════╪══════╪═════════╪═════╪══════╪═══════╡
│ 2.57 ┆ 0.0555 ┆ 46.3 ┆ 0 ┆ inf ┆ 2.46 ┆ 2.68 │
└──────────┴───────────┴──────┴─────────┴─────┴──────┴───────┘
Columns: estimate, std_error, statistic, p_value, s_value, conf_low, conf_high
shape: (3, 9)
┌──────┬─────────────┬──────────┬───────────┬───┬──────────┬────────┬────────┬───────┐
│ Term ┆ Contrast ┆ Estimate ┆ Std.Error ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │
╞══════╪═════════════╪══════════╪═══════════╪═══╪══════════╪════════╪════════╪═══════╡
│ X1 ┆ mean(dY/dX) ┆ 0.429 ┆ 0.0865 ┆ … ┆ 7.23e-07 ┆ 20.4 ┆ 0.259 ┆ 0.598 │
│ X2 ┆ mean(dY/dX) ┆ 0.0734 ┆ 0.0183 ┆ … ┆ 6.15e-05 ┆ 14 ┆ 0.0375 ┆ 0.109 │
│ Z1 ┆ mean(dY/dX) ┆ -0.00406 ┆ 0.0544 ┆ … ┆ 0.94 ┆ 0.0885 ┆ -0.111 ┆ 0.103 │
└──────┴─────────────┴──────────┴───────────┴───┴──────────┴────────┴────────┴───────┘
Columns: term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high
shape: (6, 10)
┌─────┬──────┬──────────┬──────────┬───┬──────────┬──────┬─────────┬───────┐
│ X1 ┆ Term ┆ Contrast ┆ Estimate ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │
╞═════╪══════╪══════════╪══════════╪═══╪══════════╪══════╪═════════╪═══════╡
│ 2 ┆ X1 ┆ +1 ┆ 0.429 ┆ … ┆ 7.29e-07 ┆ 20.4 ┆ 0.259 ┆ 0.598 │
│ 4 ┆ X1 ┆ +1 ┆ 0.429 ┆ … ┆ 7.29e-07 ┆ 20.4 ┆ 0.259 ┆ 0.598 │
│ 2 ┆ X2 ┆ +1 ┆ 0.0822 ┆ … ┆ 0.0214 ┆ 5.55 ┆ 0.0122 ┆ 0.152 │
│ 4 ┆ X2 ┆ +1 ┆ 0.1 ┆ … ┆ 0.252 ┆ 1.99 ┆ -0.0712 ┆ 0.272 │
│ 2 ┆ Z1 ┆ +1 ┆ 0.0688 ┆ … ┆ 0.387 ┆ 1.37 ┆ -0.087 ┆ 0.225 │
│ 4 ┆ Z1 ┆ +1 ┆ 0.217 ┆ … ┆ 0.234 ┆ 2.1 ┆ -0.14 ┆ 0.574 │
└─────┴──────┴──────────┴──────────┴───┴──────────┴──────┴─────────┴───────┘
Columns: X1, rowid, term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, predicted, predicted_lo, predicted_hi, Y, Y2, X2, f1, f2, f3, group_id, Z1, Z2
```

---
Hi @vincentarelbundock, super cool and very exciting - this feels like an early Christmas gift! I can't promise to be able to take a look today, but I'll definitely find time in the next few days =)

---
Haha, nice! But just to calibrate expectations, the model above is literally the only one I tried. It might not work elsewhere, but at least we have a base to work with. :)

---
Hi @vincentarelbundock - I started taking a look yesterday and have implemented some tests for regressions with fixed effects. All major functions seem to work. If you are interested, I'd be happy to open a PR with more extensive unit tests: I would simply add tests for OLS with fixed effects, IV, and, once I add the required functionality to its predict method, Poisson regression, following your existing template.
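For illustration, here is a minimal sketch of what one such test might look like, assuming a pytest-style suite; the reference-CSV path and the exact assertions are placeholders, not the repository's actual test template.

```python
# Hypothetical pytest sketch; the reference file and its columns are
# assumptions for illustration only.
from pyfixest.estimation import feols
from pyfixest.utils import get_data
from marginaleffects import avg_slopes

def test_avg_slopes_feols_fixed_effects():
    data = get_data().dropna()
    fit = feols("Y ~ X1 * X2 | f1", data=data)
    s = avg_slopes(fit)
    # One average slope per continuous regressor
    assert s.shape[0] == 2
    # A fuller test would compare against estimates exported from R, e.g.:
    # known = pl.read_csv("tests/r/feols_avg_slopes.csv")  # hypothetical file
    # np.testing.assert_allclose(s["estimate"], known["estimate"], rtol=1e-4)
```

---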
Ah thanks, this sounds amazing! Yep, a PR seems like a good way to go. I can hopefully debug the issues from there.

---
Here's a reproducible example for what I think is a bug in `comparisons()` with models that include fixed effects:

```python
import polars as pl
import numpy as np
from pyfixest.estimation import feols
from marginaleffects import *
def create_test_data():
np.random.seed(1024)
data = pl.DataFrame({
"X1": np.random.normal(size = 1000),
"X2": np.random.normal(size = 1000),
"Z1": np.random.normal(size = 1000),
"e": np.random.normal(size = 1000),
"f1": np.random.choice([0, 1, 2, 3, 4, 5], size = 1000, replace = True),
"f2": np.random.choice([0, 1, 2, 3, 4, 5], size = 1000, replace = True)
}).with_columns((pl.col("X1") * pl.col("X2") * pl.col("Z1") + pl.col("e")).alias("Y"))
return data
data = create_test_data()
# test 1: one fixed effect (f1)
fit = feols("Y ~ X1 * X2 * Z1 | f1", data = data.to_pandas())
comparisons(fit, newdata = datagrid(X1 = [2, 4], model = fit))
```

Output (every estimate is `NaN`):

```
X1 rowid term contrast estimate std_error statistic p_value s_value conf_low conf_high predicted predicted_lo predicted_hi X2 Z1 e f1 f2 Y
i64 i32 str str f64 f64 f64 f64 f64 f64 f64 f64 f64 f64 f64 f64 f64 f64 f64 f64
2 0 "X1" "+1" NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN -0.024305 0.023899 0.004883 2.573 2.431 0.010447
4 1 "X1" "+1" NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN -0.024305 0.023899 0.004883 2.573 2.431 0.010447
2 0 "X2" "+1" NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN -0.024305 0.023899 0.004883 2.573 2.431 0.010447
4 1 "X2" "+1" NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN -0.024305 0.023899 0.004883 2.573 2.431 0.010447
2 0 "Z1" "+1" NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN -0.024305 0.023899 0.004883 2.573 2.431 0.010447
4 1 "Z1" "+1" NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN -0.024305 0.023899 0.004883 2.573 2.431 0.010447 In R, I instead get Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % X2 Z1 f1 X1
X1 +1 0.0259 0.0209 1.24 0.21578 2.2 -0.0151 0.0669 -0.0243 0.0239 2 2
X1 +1 0.0259 0.0209 1.24 0.21578 2.2 -0.0151 0.0669 -0.0243 0.0239 2 4
X2 +1 0.0712 0.0489 1.46 0.14524 2.8 -0.0246 0.1670 -0.0243 0.0239 2 2
X2 +1 0.1612 0.1094 1.47 0.14068 2.8 -0.0533 0.3757 -0.0243 0.0239 2 4
Z1 +1 -0.1305 0.0392 -3.33 < 0.001 10.2 -0.2073 -0.0538 -0.0243 0.0239 2 2
Z1 +1 -0.2903 0.0907 -3.20 0.00138 9.5 -0.4681 -0.1125 -0.0243 0.0239 2 4
```

by running:

```r
library(fixest)
library(marginaleffects)
library(reticulate)
data = py$data
fit = feols(Y ~ X1 * X2 * Z1 | f1, data = data)
comparisons(fit, newdata = datagrid(X1 = c(2,4), model = fit))
```

I'll set up the PR with tests before the new year =)

---
Define an abstract class `ModelPyfixest` to support `pyfixest` models. WIP. Does not work yet.

Known problems:

* `get_predict()` expects a data frame, but the `statsmodels`-based infrastructure supplies a `patsy` matrix. We probably want to create `patsy` matrices inside the `get_predict()` method specific to `statsmodels`, and otherwise always pass a data frame to all `get_predict()` calls.
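For concreteness, here is a minimal sketch of the adapter idea; apart from the names taken from the description above (`ModelPyfixest`, `get_predict()`), the class layout and method signatures are assumptions, not the package's actual interface.

```python
# Sketch only: the real marginaleffects base class and method signatures
# may differ from what is assumed here.
import polars as pl

class ModelPyfixest:
    """Hypothetical adapter exposing a fitted pyfixest model to marginaleffects."""

    def __init__(self, model):
        self.model = model

    def get_coef(self):
        # pyfixest exposes coefficients as a pandas Series
        return self.model.coef().to_numpy()

    def get_predict(self, params, newdata: pl.DataFrame):
        # `params` would be used to recompute predictions for delta-method
        # standard errors; this sketch ignores it and uses the fitted model.
        # Unlike the statsmodels path, no patsy design matrix is built here:
        # pyfixest's predict() accepts a data frame directly, which follows
        # the "always pass a data frame to get_predict()" convention above.
        p = self.model.predict(newdata=newdata.to_pandas())
        return pl.DataFrame({"rowid": list(range(len(p))), "estimate": p})
```

With an adapter like this, dispatch stays thin: when `comparisons()` receives a `pyfixest` object it would wrap it in `ModelPyfixest`, and only the `statsmodels`-specific `get_predict()` would ever build `patsy` matrices.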