Add Support for (in-memory) Regression compression a la duckreg #619

Merged (45 commits) on Sep 21, 2024

s3alfisc commented Sep 11, 2024

Supersedes #574.

  • Adds a class for compressed regression, FeolsCompressed.
  • Adds a use_compression argument to pf.feols(). Compressed estimation supports up to two-way fixed effects via the Mundlak transform, which is applied automatically when the formula uses fixed-effects syntax.
  • Supported inference: iid and hetero. When the Mundlak transform is used, only wild cluster bootstrap inference is supported. Cluster variables need to be part of the model.
  • No prediction method is supported yet.

import pyfixest as pf

data = pf.get_data()

# Reference fit: f1 is absorbed as a fixed effect, no small-sample correction.
fit = pf.feols(
    "Y ~ X1 + X2 | f1",
    data=data,
    vcov="hetero",
    ssc=pf.ssc(adj=False, cluster_adj=False),
)

# Compressed fit: f1 enters as dummies via C(f1), estimated on compressed data.
fit_c = pf.feols(
    "Y ~ X1 + X2 + C(f1)",
    data=data,
    use_compression=True,
    vcov="hetero",
    ssc=pf.ssc(adj=False, cluster_adj=False),
)

# Compare the X1 and X2 rows of both fits side by side.
pf.etable([fit, fit_c], keep=["X1", "X2"], digits=6)

[Image: screenshot of the pf.etable() output for the two fits]
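
For readers new to the idea, here is a minimal sketch of regression compression in the duckreg style. It is not the FeolsCompressed code from this PR, and all variable names are illustrative. It assumes the regressors take few distinct values, so the data can be grouped by unique regressor rows while keeping only the count, the sum of y, and the sum of y squared per group. Those statistics are enough to reproduce the OLS coefficients and the heteroskedasticity-robust (HC0, no small-sample correction) covariance matrix.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 5_000
df = pd.DataFrame({
    "x1": rng.integers(0, 4, size=n),
    "x2": rng.integers(0, 3, size=n),
})
df["y"] = 1.0 + 0.5 * df["x1"] - 0.25 * df["x2"] + rng.normal(size=n)

# Uncompressed OLS with HC0 sandwich covariance, for reference.
X = np.column_stack([np.ones(n), df["x1"].to_numpy(), df["x2"].to_numpy()])
y = df["y"].to_numpy()
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_full
bread = np.linalg.inv(X.T @ X)
vcov_full = bread @ (X.T @ (X * resid[:, None] ** 2)) @ bread

# Compression: one row per unique (x1, x2) combination.
comp = (
    df.assign(y_sq=df["y"] ** 2)
    .groupby(["x1", "x2"], as_index=False)
    .agg(count=("y", "count"), sum_y=("y", "sum"), sum_y_sq=("y_sq", "sum"))
)
w = comp["count"].to_numpy(dtype=float)
sum_y = comp["sum_y"].to_numpy()
sum_y_sq = comp["sum_y_sq"].to_numpy()
Xc = np.column_stack(
    [np.ones(len(comp)), comp["x1"].to_numpy(), comp["x2"].to_numpy()]
)

# X'X and X'y of the full data, rebuilt from the compressed rows.
XtX_c = Xc.T @ (Xc * w[:, None])
beta_comp = np.linalg.solve(XtX_c, Xc.T @ sum_y)

# Per-group residual sum of squares from the sufficient statistics:
# sum_i (y_i - yhat_g)^2 = sum_y_sq - 2 * yhat_g * sum_y + n_g * yhat_g^2
fitted = Xc @ beta_comp
rss_g = sum_y_sq - 2.0 * fitted * sum_y + w * fitted ** 2
bread_c = np.linalg.inv(XtX_c)
vcov_comp = bread_c @ (Xc.T @ (Xc * rss_g[:, None])) @ bread_c

print(len(df), "rows compressed to", len(comp))
print(np.allclose(beta_full, beta_comp), np.allclose(vcov_full, vcov_comp))
```

The compressed frame has at most 4 × 3 rows here, however large n is, which is what makes the approach attractive for big data with discrete regressors.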

FYI @apoorvalal

s3alfisc commented Sep 14, 2024

Implementation / Status:

For compressed estimation, fixed effects are always handled via the Mundlak transform. Up to two-way fixed effects are supported. With two fixed effects, there is currently no check that the input data is a panel. Maybe we should add one?
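
As a rough illustration of why the panel question matters, the sketch below (plain NumPy, not the code in this PR) compares the two-way within estimator with a two-way Mundlak regression, which adds the unit means and time means of x as ordinary regressors. On a balanced panel the coefficient on x is identical in both. The helper names group_mean and is_balanced_panel are hypothetical, and the check shown is only one possible way to test for a balanced panel.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods = 30, 8
unit = np.repeat(np.arange(n_units), n_periods)
time = np.tile(np.arange(n_periods), n_units)

# Simulated balanced panel; x is correlated with both sets of effects.
alpha = rng.normal(size=n_units)
gamma = rng.normal(size=n_periods)
x = rng.normal(size=unit.size) + 0.5 * alpha[unit] + 0.3 * gamma[time]
y = 2.0 * x + alpha[unit] + gamma[time] + rng.normal(size=unit.size)


def group_mean(values, groups):
    """Return the group mean of `values`, broadcast back to each row."""
    sums = np.bincount(groups, weights=values)
    counts = np.bincount(groups)
    return (sums / counts)[groups]


def is_balanced_panel(unit, time):
    """Hypothetical check: every unit is observed exactly once per period."""
    pairs = np.unique(np.column_stack([unit, time]), axis=0)
    n_cells = np.unique(unit).size * np.unique(time).size
    return pairs.shape[0] == unit.size == n_cells


# Two-way within (TWFE) estimator via double demeaning.
x_dd = x - group_mean(x, unit) - group_mean(x, time) + x.mean()
y_dd = y - group_mean(y, unit) - group_mean(y, time) + y.mean()
beta_twfe = (x_dd @ y_dd) / (x_dd @ x_dd)

# Two-way Mundlak regression: the means enter as plain columns.
Z = np.column_stack(
    [np.ones(unit.size), x, group_mean(x, unit), group_mean(x, time)]
)
beta_mundlak = np.linalg.lstsq(Z, y, rcond=None)[0][1]

print(is_balanced_panel(unit, time), np.isclose(beta_twfe, beta_mundlak))
```

Because the Mundlak regressors are ordinary columns rather than absorbed effects, the transformed data can be compressed like any other design matrix. The equivalence with the within estimator shown here relies on the panel being balanced.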

Unit tests are implemented and pass.

Currently not supported: cluster variables that are not part of the model formula.

s3alfisc commented

@pre-commit.ci autofix

s3alfisc commented Sep 14, 2024

Something appears off with the iid and hetero standard errors under the Mundlak transform. The coefficients match, though.

Base automatically changed from refactor-model-matrix to master September 15, 2024 13:58

s3alfisc commented

pre-commit.ci autofix

codecov bot commented Sep 18, 2024

Codecov Report

Attention: Patch coverage is 86.13861% with 28 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pyfixest/estimation/feols_compressed_.py | 83.33% | 26 Missing ⚠️ |
| pyfixest/estimation/estimation.py | 91.66% | 1 Missing ⚠️ |
| pyfixest/report/summarize.py | 75.00% | 1 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| pyfixest/estimation/FixestMulti_.py | 81.72% <100.00%> (+1.14%) ⬆️ |
| pyfixest/estimation/feols_.py | 82.12% <100.00%> (-1.83%) ⬇️ |
| pyfixest/estimation/model_matrix_fixest_.py | 85.08% <100.00%> (+0.16%) ⬆️ |
| pyfixest/estimation/estimation.py | 95.12% <91.66%> (+19.40%) ⬆️ |
| pyfixest/report/summarize.py | 90.09% <75.00%> (-0.25%) ⬇️ |
| pyfixest/estimation/feols_compressed_.py | 83.33% <83.33%> (ø) |

... and 1 file with indirect coverage changes

s3alfisc merged commit 44bed05 into master on Sep 21, 2024
10 checks passed
s3alfisc deleted the regression-compression branch on September 21, 2024 15:56