Bug: Incorrect prediction with WLS anf fixed effects and newdata not None #678

s3alfisc · 2024-10-27T16:16:38Z

For WLS estimation with newdata argument provided, the output of predict is incorrect.

For all other cases, it matched fixest.

See examples below:

No newdata, no weights

import pyfixest as pf
import pandas as pd
pd.options.display.precision = 7

df_castle = pd.read_stata("https://github.com/scunning1975/mixtape/raw/master/castle.dta")
df_castle = df_castle.astype({"post": bool})

data = df_castle.loc[df_castle["post"] == 0, :]
vcov = "hetero"

fit = pf.feols(
  fml = "l_homicide ~ 1 | state + year",
  data = data, 
)

fit.predict()[0:5]
# array([1.96511559, 1.98852372, 1.96735659, 2.01274515, 2.00770564])

library(fixest)
library(reticulate)

fit = feols(
  fml = l_homicide ~ 1 | state + year,
  data = py$data, 
)

predict(fit)[0:5]
# [1] 1.965116 1.988524 1.967357 2.012745 2.007706

No newdata, weights

fit = pf.feols(
  fml = "l_homicide ~ 1 | state + year",
  data = data, 
  weights = "popwt"
)

fit.predict()[0:5]
#array([1.98343013, 2.0046152 , 1.99374426, 2.01362497, 1.98860927])

fit = feols(
  fml = l_homicide ~ 1 | state + year,
  data = py$data, 
  weights = ~popwt
)

predict(fit)[0:5]
# [1] 1.983430 2.004615 1.993744 2.013625 1.988609

newdata, no weights

fit = pf.feols(
  fml = "l_homicide ~ 1 | state + year",
  data = data, 
)

fit.predict(data.iloc[0:100])[0:5]
# array([1.98873732, 2.01639826, 1.94689312, 1.99400977, 2.0091239 ])

fit = feols(
  fml = l_homicide ~ 1 | state + year,
  data = py$data, 
)

predict(fit, newdata = py$data[1:100,])[0:5]
# [1] 1.988737 2.016398 1.946893 1.994010 2.009124

newdata, weights

fit = pf.feols(
  fml = "l_homicide ~ 1 | state + year",
  data = data, 
  weights = "popwt"
)

fit.predict(newdata = data.iloc[0:100])[0:5]
# array([1.98028402, 1.99191327, 1.9958799 , 2.02384829, 1.99859166])

fit = feols(
  fml = l_homicide ~ 1 | state + year,
  data = py$data, 
  weights = ~popwt
)

predict(fit, newdata = py$data[1:100,])[0:5]
# [1] 1.983430 2.004615 1.993744 2.013625 1.988609

@all-contributors please add @marcandre259 for bug

The text was updated successfully, but these errors were encountered:

s3alfisc · 2024-10-27T16:20:09Z

@all-contributors please add @marcandre259 for bug

allcontributors · 2024-10-27T16:20:18Z

@s3alfisc

I've put up a pull request to add @marcandre259! 🎉

s3alfisc · 2024-10-27T16:56:20Z

fixest produces different values for sumFE than PyFixest:

fit = pf.feols(
  fml = "l_homicide ~ cdl | state + year",
  data = data, 
  weights = "popwt"
)

fit.fixef()
fit._sumFE[0:5]
# array([1.96134687, 1.984755  , 1.96358787, 2.00897643, 2.00393692])

fit = feols(
  fml = l_homicide ~ cdl | state + year,
  data = py$data, 
  weights = ~popwt
)

fit$sumFE[1:5]
# [1] 1.979555 2.000740 1.989869 2.009749 1.984734

In all likelihood, the bug hence stems from the fixef() method, which does not seem to take weights into account at all.

In particular, the least squares problem solved via lsqr does not deal with weights:

pyfixest/pyfixest/estimation/feols_.py

Line 1535 in 5250fc3

alpha = lsqr(D2, uhat, atol=atol, btol=btol)[0]

s3alfisc · 2024-10-28T19:47:52Z

Hi @marcandre259 & also @leostimpfle , I found the bug:

For a regression without weights, we are fitting a regression model of the following form:

$$ [ Y = \begin{bmatrix} X & D \end{bmatrix} \begin{bmatrix} \beta \ \alpha \end{bmatrix}^{T} + \epsilon ] $$

After fitting $\hat{\beta}$ via FWL, we have $\hat{\beta}$.

We can then solve for $\hat{\alpha}$ via the a sparse solver in lsqr.

$$ [ D'D\hat{\alpha} = D'(Y - X\hat{\beta}). ] $$

If we have weights, we need to multiply D, X and Y with the standard $\sqrt(weights)$, which we simply did not do!

All of this (minus the part about weights) is nicely explained in the lfe vignette: link.

I'll prepare the PR for this now + add some tests.

@leostimpfle, this is also what we have to use for retrieving the alphas in the Poisson class (with the correct weights from the Poisson fit), as the strategy to fit GLMs in PyFixest is to implement an iterated weighted least squares algo (and therefore, the link function does not matter). For the same reason, Fepois can inherit the vcov class from Feols.

s3alfisc added the bug Something isn't working label Oct 27, 2024

s3alfisc mentioned this issue Oct 27, 2024

Support for weights as an optional parameter for did2s? #557

Closed

allcontributors bot mentioned this issue Oct 27, 2024

docs: add marcandre259 as a contributor for bug #679

Merged

s3alfisc changed the title ~~Bug: Incorrect prediction with WLS and newdata not None~~ Bug: Incorrect prediction with WLS anf fixed effects and newdata not None Oct 27, 2024

s3alfisc mentioned this issue Oct 28, 2024

Bugfix in fixef() method for WLS #682

Merged

s3alfisc linked a pull request Oct 28, 2024 that will close this issue

Bugfix in fixef() method for WLS #682

Merged

s3alfisc closed this as completed in #682 Oct 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug: Incorrect prediction with WLS anf fixed effects and newdata not None #678

Bug: Incorrect prediction with WLS anf fixed effects and newdata not None #678

s3alfisc commented Oct 27, 2024

s3alfisc commented Oct 27, 2024 •

edited

Loading

allcontributors bot commented Oct 27, 2024

s3alfisc commented Oct 27, 2024

s3alfisc commented Oct 28, 2024

Bug: Incorrect prediction with WLS anf fixed effects and newdata not None #678

Bug: Incorrect prediction with WLS anf fixed effects and newdata not None #678

Comments

s3alfisc commented Oct 27, 2024

No newdata, no weights

No newdata, weights

newdata, no weights

newdata, weights

s3alfisc commented Oct 27, 2024 • edited Loading

allcontributors bot commented Oct 27, 2024

s3alfisc commented Oct 27, 2024

s3alfisc commented Oct 28, 2024

s3alfisc commented Oct 27, 2024 •

edited

Loading