Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Incorrect prediction with WLS anf fixed effects and newdata not None #678

Closed
s3alfisc opened this issue Oct 27, 2024 · 4 comments · Fixed by #682
Closed

Bug: Incorrect prediction with WLS anf fixed effects and newdata not None #678

s3alfisc opened this issue Oct 27, 2024 · 4 comments · Fixed by #682
Labels
bug Something isn't working

Comments

@s3alfisc
Copy link
Member

For WLS estimation with newdata argument provided, the output of predict is incorrect.

For all other cases, it matched fixest.

See examples below:

No newdata, no weights

import pyfixest as pf
import pandas as pd
pd.options.display.precision = 7

df_castle = pd.read_stata("https://github.com/scunning1975/mixtape/raw/master/castle.dta")
df_castle = df_castle.astype({"post": bool})

data = df_castle.loc[df_castle["post"] == 0, :]
vcov = "hetero"

fit = pf.feols(
  fml = "l_homicide ~ 1 | state + year",
  data = data, 
)

fit.predict()[0:5]
# array([1.96511559, 1.98852372, 1.96735659, 2.01274515, 2.00770564])
library(fixest)
library(reticulate)

fit = feols(
  fml = l_homicide ~ 1 | state + year,
  data = py$data, 
)

predict(fit)[0:5]
# [1] 1.965116 1.988524 1.967357 2.012745 2.007706

No newdata, weights

fit = pf.feols(
  fml = "l_homicide ~ 1 | state + year",
  data = data, 
  weights = "popwt"
)

fit.predict()[0:5]
#array([1.98343013, 2.0046152 , 1.99374426, 2.01362497, 1.98860927])
fit = feols(
  fml = l_homicide ~ 1 | state + year,
  data = py$data, 
  weights = ~popwt
)

predict(fit)[0:5]
# [1] 1.983430 2.004615 1.993744 2.013625 1.988609

newdata, no weights

fit = pf.feols(
  fml = "l_homicide ~ 1 | state + year",
  data = data, 
)

fit.predict(data.iloc[0:100])[0:5]
# array([1.98873732, 2.01639826, 1.94689312, 1.99400977, 2.0091239 ])
fit = feols(
  fml = l_homicide ~ 1 | state + year,
  data = py$data, 
)

predict(fit, newdata = py$data[1:100,])[0:5]
# [1] 1.988737 2.016398 1.946893 1.994010 2.009124

newdata, weights

fit = pf.feols(
  fml = "l_homicide ~ 1 | state + year",
  data = data, 
  weights = "popwt"
)

fit.predict(newdata = data.iloc[0:100])[0:5]
# array([1.98028402, 1.99191327, 1.9958799 , 2.02384829, 1.99859166])

fit = feols(
  fml = l_homicide ~ 1 | state + year,
  data = py$data, 
  weights = ~popwt
)

predict(fit, newdata = py$data[1:100,])[0:5]
# [1] 1.983430 2.004615 1.993744 2.013625 1.988609

@all-contributors please add @marcandre259 for bug

@s3alfisc
Copy link
Member Author

s3alfisc commented Oct 27, 2024

@all-contributors please add @marcandre259 for bug

Copy link
Contributor

@s3alfisc

I've put up a pull request to add @marcandre259! 🎉

@s3alfisc
Copy link
Member Author

fixest produces different values for sumFE than PyFixest:

fit = pf.feols(
  fml = "l_homicide ~ cdl | state + year",
  data = data, 
  weights = "popwt"
)

fit.fixef()
fit._sumFE[0:5]
# array([1.96134687, 1.984755  , 1.96358787, 2.00897643, 2.00393692])


fit = feols(
  fml = l_homicide ~ cdl | state + year,
  data = py$data, 
  weights = ~popwt
)

fit$sumFE[1:5]
# [1] 1.979555 2.000740 1.989869 2.009749 1.984734

In all likelihood, the bug hence stems from the fixef() method, which does not seem to take weights into account at all.

In particular, the least squares problem solved via lsqr does not deal with weights:

alpha = lsqr(D2, uhat, atol=atol, btol=btol)[0]

@s3alfisc s3alfisc changed the title Bug: Incorrect prediction with WLS and newdata not None Bug: Incorrect prediction with WLS anf fixed effects and newdata not None Oct 27, 2024
@s3alfisc
Copy link
Member Author

Hi @marcandre259 & also @leostimpfle , I found the bug:

For a regression without weights, we are fitting a regression model of the following form:

$$ [ Y = \begin{bmatrix} X & D \end{bmatrix} \begin{bmatrix} \beta \ \alpha \end{bmatrix}^{T} + \epsilon ] $$

After fitting $\hat{\beta}$ via FWL, we have $\hat{\beta}$.

We can then solve for $\hat{\alpha}$ via the a sparse solver in lsqr.

$$ [ D'D\hat{\alpha} = D'(Y - X\hat{\beta}). ] $$

If we have weights, we need to multiply D, X and Y with the standard $\sqrt(weights)$, which we simply did not do!

All of this (minus the part about weights) is nicely explained in the lfe vignette: link.

I'll prepare the PR for this now + add some tests.

@leostimpfle, this is also what we have to use for retrieving the alphas in the Poisson class (with the correct weights from the Poisson fit), as the strategy to fit GLMs in PyFixest is to implement an iterated weighted least squares algo (and therefore, the link function does not matter). For the same reason, Fepois can inherit the vcov class from Feols.

@s3alfisc s3alfisc linked a pull request Oct 28, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant