
Support pyfixest #56

Merged: vincentarelbundock merged 29 commits into main on Dec 27, 2023

Conversation

@vincentarelbundock (Owner) commented Dec 23, 2023

Define an abstract class ModelPyfixest to support pyfixest models.

WIP. Does not work yet.

Known problems:

  • get_predict() expects a data frame, but the statsmodels-based infrastructure supplies a patsy design matrix. We probably want to build patsy matrices inside the statsmodels-specific get_predict() method, and otherwise always pass a plain data frame to every get_predict() implementation (a rough sketch of this follows the snippet below).
import polars as pl
import numpy as np
from pyfixest.estimation import feols
from pyfixest.utils import get_data
from marginaleffects import predictions
data = get_data()
fit = feols("Y ~ X1", data = data)
predictions(fit)
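
To make the data-flow issue concrete, here is a minimal sketch of the direction described in the bullet above. Everything in it (class names, attributes, signatures) is an illustrative assumption, not the package's actual internals:

import numpy as np
import patsy

class ModelAbstract:
    # Hypothetical base class: every adapter receives a plain data frame.
    def __init__(self, model):
        self.model = model

    def get_predict(self, params, newdata):
        raise NotImplementedError

class ModelStatsmodels(ModelAbstract):
    def get_predict(self, params, newdata):
        # Build the patsy design matrix here, inside the statsmodels-specific
        # adapter, instead of in the shared infrastructure.
        design_info = self.model.model.data.design_info  # formula-fitted models
        exog = patsy.build_design_matrices([design_info], newdata)[0]
        return self.model.model.predict(params, np.asarray(exog))

class ModelPyfixest(ModelAbstract):
    def get_predict(self, params, newdata):
        # pyfixest's fitted models predict from a data frame directly, so no
        # patsy matrix is needed; `params` is ignored here because pyfixest
        # predicts from its stored coefficients (argument names may differ
        # across versions).
        return self.model.predict(newdata=newdata)

The point is only that patsy enters the statsmodels path; every other adapter, including the pyfixest one, keeps working with plain data frames.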

@vincentarelbundock marked this pull request as ready for review on December 23, 2023, 22:13
@vincentarelbundock (Owner, Author)

This appears to work now. I would be curious to know what @s3alfisc thinks, especially about the pyfixest class here:

https://github.com/vincentarelbundock/pymarginaleffects/pull/56/files#diff-9b8c333d26e1955c292c955f0da401086afad8a91415c2107187b12a53bb6552

Below I paste a Python script and an R script, along with their output. Run the Python script first, because it saves the CSV that the R script reads. FWIW, I have not tested this extensively, but the results look similar.

R

Code

library(fixest)
library(marginaleffects)
options(width = 10000)

# Prep and fit
data = read.csv("issue56.csv")
fit = feols(Y ~ X1 * X2 * Z1, data = data)

# Prediction: Average over full empirical distribution
p = avg_predictions(fit)
print(p)

# Average marginal effects (slopes)
s = avg_slopes(fit)
print(s)

# Comparisons at two different evaluation points
c = comparisons(fit, newdata = datagrid(X1 = c(2, 4)))
print(c)

Log

> library(fixest)
> library(marginaleffects)
> options(width = 10000)
> 
> # Prep and fit
> data = read.csv("issue56.csv")
> fit = feols(Y ~ X1 * X2 * Z1, data = data)
NOTE: 2 observations removed because of NA values (LHS: 1, RHS: 1).
> 
> # Prediction: Average over full empirical distribution
> p = avg_predictions(fit)
> print(p)

 Estimate Std. Error    z Pr(>|z|)   S 2.5 % 97.5 %
     2.57     0.0555 46.3   <0.001 Inf  2.46   2.68

Columns: estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 
Type:  response 

> 
> # Average marginal effects (slopes)
> s = avg_slopes(fit)
> print(s)

 Term Estimate Std. Error       z Pr(>|z|)    S   2.5 % 97.5 %
   X1  0.42870     0.0865  4.9544   <0.001 20.4  0.2591  0.598
   X2  0.07336     0.0183  4.0067   <0.001 14.0  0.0375  0.109
   Z1 -0.00406     0.0544 -0.0747     0.94  0.1 -0.1107  0.103

Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 
Type:  response 

> 
> # Comparisons at two different evaluation points
> c = comparisons(fit, newdata = datagrid(X1 = c(2, 4)))
> print(c)

 Term Contrast X1 Estimate Std. Error     z Pr(>|z|)    S   2.5 % 97.5 %     X2    Z1
   X1       +1  2   0.4287     0.0865 4.953   <0.001 20.4  0.2591  0.598 -0.126 0.995
   X1       +1  4   0.4287     0.0865 4.953   <0.001 20.4  0.2591  0.598 -0.126 0.995
   X2       +1  2   0.0822     0.0357 2.301   0.0214  5.5  0.0122  0.152 -0.126 0.995
   X2       +1  4   0.1002     0.0875 1.146   0.2520  2.0 -0.0712  0.272 -0.126 0.995
   Z1       +1  2   0.0688     0.0795 0.866   0.3867  1.4 -0.0870  0.225 -0.126 0.995
   Z1       +1  4   0.2169     0.1821 1.191   0.2336  2.1 -0.1400  0.574 -0.126 0.995

Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, X1, predicted_lo, predicted_hi, predicted, Y, X2, Z1 
Type:  response 

Python

Code

from pyfixest.estimation import feols
from pyfixest.utils import get_data
from marginaleffects import *

# Prep and fit
data = get_data()
data.to_csv("issue56.csv")
fit = feols("Y ~ X1 * X2 * Z1", data = data)

# Prediction: Average over full empirical distribution
p = avg_predictions(fit)
print(p)

# Average marginal effects (slopes)
s = avg_slopes(fit)
print(s)

# Comparisons at two different evaluation points
c = comparisons(fit, newdata = datagrid(X1 = [2, 4], model = fit))
print(c)

Log

shape: (1, 7)
┌──────────┬───────────┬──────┬─────────┬─────┬──────┬───────┐
│ Estimate ┆ Std.Error ┆ z    ┆ P(>|z|) ┆ S   ┆ 2.5% ┆ 97.5% │
│ ---      ┆ ---       ┆ ---  ┆ ---     ┆ --- ┆ ---  ┆ ---   │
│ str      ┆ str       ┆ str  ┆ str     ┆ str ┆ str  ┆ str   │
╞══════════╪═══════════╪══════╪═════════╪═════╪══════╪═══════╡
│ 2.57     ┆ 0.0555    ┆ 46.3 ┆ 0       ┆ inf ┆ 2.46 ┆ 2.68  │
└──────────┴───────────┴──────┴─────────┴─────┴──────┴───────┘

Columns: estimate, std_error, statistic, p_value, s_value, conf_low, conf_high

shape: (3, 9)
┌──────┬─────────────┬──────────┬───────────┬───┬──────────┬────────┬────────┬───────┐
│ Term ┆ Contrast    ┆ Estimate ┆ Std.Error ┆ … ┆ P(>|z|)  ┆ S      ┆ 2.5%   ┆ 97.5% │
│ ---  ┆ ---         ┆ ---      ┆ ---       ┆   ┆ ---      ┆ ---    ┆ ---    ┆ ---   │
│ str  ┆ str         ┆ str      ┆ str       ┆   ┆ str      ┆ str    ┆ str    ┆ str   │
╞══════╪═════════════╪══════════╪═══════════╪═══╪══════════╪════════╪════════╪═══════╡
│ X1   ┆ mean(dY/dX) ┆ 0.429    ┆ 0.0865    ┆ … ┆ 7.23e-07 ┆ 20.4   ┆ 0.259  ┆ 0.598 │
│ X2   ┆ mean(dY/dX) ┆ 0.0734   ┆ 0.0183    ┆ … ┆ 6.15e-05 ┆ 14     ┆ 0.0375 ┆ 0.109 │
│ Z1   ┆ mean(dY/dX) ┆ -0.00406 ┆ 0.0544    ┆ … ┆ 0.94     ┆ 0.0885 ┆ -0.111 ┆ 0.103 │
└──────┴─────────────┴──────────┴───────────┴───┴──────────┴────────┴────────┴───────┘

Columns: term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high

shape: (6, 10)
┌─────┬──────┬──────────┬──────────┬───┬──────────┬──────┬─────────┬───────┐
│ X1  ┆ Term ┆ Contrast ┆ Estimate ┆ … ┆ P(>|z|)  ┆ S    ┆ 2.5%    ┆ 97.5% │
│ --- ┆ ---  ┆ ---      ┆ ---      ┆   ┆ ---      ┆ ---  ┆ ---     ┆ ---   │
│ str ┆ str  ┆ str      ┆ str      ┆   ┆ str      ┆ str  ┆ str     ┆ str   │
╞═════╪══════╪══════════╪══════════╪═══╪══════════╪══════╪═════════╪═══════╡
│ 2   ┆ X1   ┆ +1       ┆ 0.429    ┆ … ┆ 7.29e-07 ┆ 20.4 ┆ 0.259   ┆ 0.598 │
│ 4   ┆ X1   ┆ +1       ┆ 0.429    ┆ … ┆ 7.29e-07 ┆ 20.4 ┆ 0.259   ┆ 0.598 │
│ 2   ┆ X2   ┆ +1       ┆ 0.0822   ┆ … ┆ 0.0214   ┆ 5.55 ┆ 0.0122  ┆ 0.152 │
│ 4   ┆ X2   ┆ +1       ┆ 0.1      ┆ … ┆ 0.252    ┆ 1.99 ┆ -0.0712 ┆ 0.272 │
│ 2   ┆ Z1   ┆ +1       ┆ 0.0688   ┆ … ┆ 0.387    ┆ 1.37 ┆ -0.087  ┆ 0.225 │
│ 4   ┆ Z1   ┆ +1       ┆ 0.217    ┆ … ┆ 0.234    ┆ 2.1  ┆ -0.14   ┆ 0.574 │
└─────┴──────┴──────────┴──────────┴───┴──────────┴──────┴─────────┴───────┘

Columns: X1, rowid, term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, predicted, predicted_lo, predicted_hi, Y, Y2, X2, f1, f2, f3, group_id, Z1, Z2

@vincentarelbundock changed the title from "[WIP]: Support pyfixest" to "Support pyfixest" on Dec 23, 2023
@s3alfisc (Contributor)

Hi @vincentarelbundock, super cool and very exciting - this feels like an early Christmas gift! I can't promise to be able to take a look today but I'll definitely find time in the next few days =)

@vincentarelbundock (Owner, Author)

Haha, nice!

But just to calibrate expectations, the model above is literally the only one I tried. It might not work elsewhere, but at least we have a base to work with.

:)

vincentarelbundock and others added 18 commits December 24, 2023 10:42
* plotnine

* trash.py for tests

* plotnine

* p9 plot_predictions works

* p9: plot_comparisons and plot_slopes

* lint

* toml

* workflows

* toml

* toml

* toml

* remove pandas dependency

* remove pandas dependency

* toml

* toml

* lint

* snapshot

* skip test
@vincentarelbundock merged commit 1dc7b70 into main on Dec 27, 2023
@s3alfisc (Contributor) commented Dec 27, 2023

Hi @vincentarelbundock - I started taking a look yesterday and implemented some tests for regressions with fixed effects. All major functions (predictions(), avg_predictions(), and avg_slopes()) work correctly and produce identical results in Python and R. Only comparisons() fails: it returns nothing but NaN values. I have started debugging this, but it might take me a little more time to figure it out.

If you are interested, I'd be happy to open a PR with more extensive unit tests. Following your template in test_pyfixest.py, I would simply add tests for OLS with fixed effects, for IV, and, once I add the required functionality to its predict method, for Poisson regression. A rough sketch of what I have in mind is below.
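
For concreteness, here is a rough sketch of how such parametrized tests could look; the reference file r_reference_feols.csv, the tolerance, and the overall layout are hypothetical placeholders, not the actual contents of test_pyfixest.py:

import polars as pl
import pytest
from pyfixest.estimation import feols
from pyfixest.utils import get_data
from marginaleffects import avg_slopes

data = get_data()

@pytest.mark.parametrize(
    "formula",
    [
        "Y ~ X1 * X2 * Z1",       # plain OLS
        "Y ~ X1 * X2 * Z1 | f1",  # OLS with one fixed effect
    ],
)
def test_avg_slopes_against_r(formula):
    fit = feols(formula, data = data)
    est = avg_slopes(fit).sort("term")
    # Reference estimates exported beforehand from R's {marginaleffects}
    # for the same formulas and the same data set (placeholder file name).
    ref = (
        pl.read_csv("r_reference_feols.csv")
        .filter(pl.col("formula") == formula)
        .sort("term")
    )
    diff = est["estimate"].cast(pl.Float64) - ref["estimate"].cast(pl.Float64)
    assert diff.abs().max() < 1e-4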

@vincentarelbundock (Owner, Author) commented Dec 27, 2023

Ah, thanks, this sounds amazing! Yep, a PR seems like a good way to go. I can hopefully debug the comparisons() issue pretty easily once I see a reproducible example.

@s3alfisc (Contributor) commented Dec 27, 2023

Here's a reproducible example for what I think is a bug in comparisons():

import polars as pl
import numpy as np
from pyfixest.estimation import feols
from marginaleffects import *

def create_test_data():

    np.random.seed(1024)
    data = pl.DataFrame({
        "X1": np.random.normal(size = 1000),
        "X2": np.random.normal(size = 1000),
        "Z1": np.random.normal(size = 1000),
        "e": np.random.normal(size = 1000),
        "f1": np.random.choice([0, 1, 2, 3, 4, 5], size = 1000, replace = True),
        "f2": np.random.choice([0, 1, 2, 3, 4, 5], size = 1000, replace = True)
    }).with_columns((pl.col("X1") * pl.col("X2") * pl.col("Z1") + pl.col("e")).alias("Y"))

    return data

data = create_test_data()

# test: OLS with one fixed effect (f1)
fit = feols("Y ~ X1 * X2 * Z1 | f1", data = data.to_pandas())

comparisons(fit, newdata = datagrid(X1 = [2, 4], model = fit))
X1   rowid  term  contrast  estimate  std_error  statistic  p_value  s_value  conf_low  conf_high  predicted  predicted_lo  predicted_hi  X2         Z1        e         f1     f2     Y
i64  i32    str   str       f64       f64        f64        f64      f64      f64       f64        f64        f64           f64           f64        f64       f64       f64    f64    f64
2    0      "X1"  "+1"      NaN       NaN        NaN        NaN      NaN      NaN       NaN        NaN        NaN           NaN           -0.024305  0.023899  0.004883  2.573  2.431  0.010447
4    1      "X1"  "+1"      NaN       NaN        NaN        NaN      NaN      NaN       NaN        NaN        NaN           NaN           -0.024305  0.023899  0.004883  2.573  2.431  0.010447
2    0      "X2"  "+1"      NaN       NaN        NaN        NaN      NaN      NaN       NaN        NaN        NaN           NaN           -0.024305  0.023899  0.004883  2.573  2.431  0.010447
4    1      "X2"  "+1"      NaN       NaN        NaN        NaN      NaN      NaN       NaN        NaN        NaN           NaN           -0.024305  0.023899  0.004883  2.573  2.431  0.010447
2    0      "Z1"  "+1"      NaN       NaN        NaN        NaN      NaN      NaN       NaN        NaN        NaN           NaN           -0.024305  0.023899  0.004883  2.573  2.431  0.010447
4    1      "Z1"  "+1"      NaN       NaN        NaN        NaN      NaN      NaN       NaN        NaN        NaN           NaN           -0.024305  0.023899  0.004883  2.573  2.431  0.010447

In R, I instead get

 Term Contrast Estimate Std. Error     z Pr(>|z|)    S   2.5 %  97.5 %      X2     Z1 f1 X1
   X1       +1   0.0259     0.0209  1.24  0.21578  2.2 -0.0151  0.0669 -0.0243 0.0239  2  2
   X1       +1   0.0259     0.0209  1.24  0.21578  2.2 -0.0151  0.0669 -0.0243 0.0239  2  4
   X2       +1   0.0712     0.0489  1.46  0.14524  2.8 -0.0246  0.1670 -0.0243 0.0239  2  2
   X2       +1   0.1612     0.1094  1.47  0.14068  2.8 -0.0533  0.3757 -0.0243 0.0239  2  4
   Z1       +1  -0.1305     0.0392 -3.33  < 0.001 10.2 -0.2073 -0.0538 -0.0243 0.0239  2  2
   Z1       +1  -0.2903     0.0907 -3.20  0.00138  9.5 -0.4681 -0.1125 -0.0243 0.0239  2  4

by running

library(fixest)
library(marginaleffects)
library(reticulate)

data = py$data
fit = feols(Y ~ X1 * X2 * Z1 | f1, data = data)
comparisons(fit, newdata = datagrid(X1 = c(2,4), model = fit))

I'll set up the PR with tests before the new year =)
