Adding Standard Error of Prediction #566

greenguy33 · 2024-07-25T04:06:54Z

Pyfixest currently has predict() function but there is not currently functionality to provide estimates of the variance of the predictions. (Some) other tools do include this, such as:

Statsmodels. The get_prediction() function returns a PredictionResultsobject that includes an estimate of the variance of the prediction as the attribute var_pred_mean.
Stata. Includes stdp option for generating standard error of the prediction (see bottom of Pg. 1899 of this link).

This functionality would be useful for users who want to compute prediction intervals to estimate uncertainty in the prediction. If folks are interested in this feature, I am happy to work on adding this to Pyfixest. It would require calculating the full covariance matrix including all fixed effects. Then the variance of each prediction is simply X * M * X' where X is a row of covariate data to predict on and M is the full covariance matrix.

The text was updated successfully, but these errors were encountered:

s3alfisc · 2024-07-25T22:14:01Z

Hy Hayden, I think this would be an awesome addition to pyfixest!

I think the best place to add it would be via the 'predict' method? We could add an argument, eg named "ci", that is by default False but if set to True, returns the prediction interval with the predicted values. What do you think?

I agree that the key step will be to compute the full vcov matrix, including all estimated fixed effects. Do you already have an idea how to best built this based on the existing code base? I've thought about it for the last few minutes and am not sure if there is a "quick win". But I'll have to think about this more in the next days. Maybe one option would be to build a first MVP implementation that only works without fixed effects?

Anyways, I'd be super happy to have this feature! Cc'ing @b-knight as this might be of interest for him as well =)

greenguy33 · 2024-07-25T22:33:19Z

Hey Alex,

Glad you think this would be a worthwhile addition!

I agree with you that the calculation of the full covariance matrix is the challenging aspect of this task. I don't have an idea how to do it right now - but I view it as an opportunity to learn more about how covariance matrices work. The nice thing is that we can compare results to the existing packages that already offer this functionality to validate/double check.

I guess there are two ways to approach the task:

start with the "easy win" of computing the prediction error without fixed effects, and expand later to include the full covariance matrix
tackle the difficult part first and roll out the full covariance matrix before adding the standard error option to the predict function

In general, my software engineer instincts tell me to get the hard part done first, but I am ok with either approach here and will cede to your opinion as to which order you prefer to do things.

Also, as a quick aside, the functionality I am proposing would not actually calculate CI's on the prediction, it would calculate the standard error of the prediction from which intervals could be easily computed. If you like, we can also add functionality to add CI's (once we have the prediction error it is a simple arithmetic) but I haven't seen that other packages actually offer this last step built in. I usually just do it manually once I have the prediction error.

greenguy33 · 2024-07-25T23:30:01Z

There is also a related issue in the Fixest github which may have some pertinent discussion.
lrberge/fixest#22

s3alfisc · 2024-07-29T20:34:00Z

Also, as a quick aside, the functionality I am proposing would not actually calculate CI's on the prediction, it would calculate the standard error of the prediction from which intervals could be easily computed.

Sounds good to start with SEs, but I think support for CIs would be cool as well! In general, I'd be happy with either approach 1. and 2. I always try to go for the quick wins first, as I am motivated by getting stuff done / building new features😄

Have you thought about the design? How do you think about the strategy of adding a inference boolean to predict?

I think the method could look smth along these lines:

def predict(..., add_inference):
   ...
   # after computing y_hat, at the very end 
   if add_inference: 

    # store model objects that will be overwritten
    old_X = self._X.copy()
    old_Y = self._Y.copy()
    # etc

    self._u_hat = y_hat - self._Y 
    # update all other methods required for updating via the vcov method
    # note: need un-demeaned Y and X, but stored self._Y and self._X are demeaned
    # maybe add option to feols "add_undemeaned_Y", "add_undemeaned_X" that 
    # if True, store self._X_undemeaned and self._Y_undemeaned?
            
    self._X = [X, D] # make this sparse
    self._Y = Y
    self._tXX = self._X.T @ self._X
    #self._scores = ...

    vcov = self.vcov()

    # compute se, ci of prediction interval, return pd.DataFrame with columns: y_hat, se, ci_lower, ci_upper

    # restore old model objects
    self._X = old_X
    self._Y = old_Y
    # etc

What do you think @greenguy33 ?

s3alfisc · 2024-07-29T20:47:13Z

Oh and I just went through the very detailed fixest thread :D Are you using fixest & prediction intervals / errors in a similar way for your own research?

Fwiw, reading the article, I learned that fixest seems to implement prediction standard errors and intervals only when no fixed effects are included in the model:

library("fixest")
data(mtcars)
head(mtcars)

fit = feols(mpg ~ wt   + as.factor(disp), data = mtcars)
predict(fit, se.fit = TRUE) |> head()
# fit   se.fit
# Mazda RX4         21.36122 1.397038
# Mazda RX4 Wag     20.63878 1.397038
# Datsun 710        22.80000 1.732790
# Hornet 4 Drive    21.40000 1.732790
# Hornet Sportabout 16.68415 1.272143
# Valiant           18.10000 1.732790

fit = feols(mpg ~ wt   | disp, data = mtcars)
predict(fit, se.fit = TRUE) |> head()
# Error in predict.fixest(fit, se.fit = TRUE) : 
#   The standard-errors (SEs) of the prediction cannot be 
# computed in the presence of fixed-effects. To obtain the
# SEs, you would need to include the FEs as standard factors
# in the model.

greenguy33 · 2024-07-30T16:18:10Z

Hey Alex,
I'm actually using Statsmodels get_prediction() function to compute the prediction error in my current research. I'm not much of an R wonk and usually gravitate towards Python solutions.
I'm fine with following your instinct to implement this feature without fixed effects to get a quick win to start, and work on the harder part later. Maybe we will learn something in the process that will be informative for the subsequent task. I'll try to get started this week when I have some free time!
-Hayden

kylebutts · 2024-11-16T14:30:07Z

Just for some context, standard errors on fitted values requires you to know the entire covariance matrix of coefficients (including fixed-effects). You don't need them for inference on individual coefficients (from FWL theorem), but you do for the fitted values since they are function of the full vector of coefficients.

Derivation:

Fitted value at $x$:

$$ \begin{aligned} \hat{y} - x \beta_0 &= x \hat{\beta} - x \beta_0 \\ &= x (X'X)^{-1} X' y - x \beta_0 \\ &= x (X'X)^{-1} X' u \end{aligned} $$

So that the VCOV of the fitted value is

$$ \begin{aligned} x (X'X)^{-1} X' u u' X (X'X)^{-1} x', \end{aligned} $$

where the inner part is the full VCOV of the regression coefficients

s3alfisc · 2024-11-17T11:20:19Z

Thanks @kylebutts! We discussed this a bit and though it would likely be a good strategy to implement prediction intervals for models without fixed effects first (which we try to do in this PR), and then to tackle a more fundamental refactor of pyfixest's internals to support fixed effects as well. I think to get the full vcov matrix, we might have to update all vcov methods for Feols, which might turn out quite involved 😅

s3alfisc assigned greenguy33 Jul 30, 2024

s3alfisc added the enhancement New feature or request label Jul 30, 2024

s3alfisc added this to Feols Enhancements Jul 30, 2024

greenguy33 mentioned this issue Aug 1, 2024

Prediction error for non-fe model #573

Open

s3alfisc mentioned this issue Oct 26, 2024

Pyfixest testing gabors-data-analysis/da_case_studies#115

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding Standard Error of Prediction #566

Adding Standard Error of Prediction #566

greenguy33 commented Jul 25, 2024

s3alfisc commented Jul 25, 2024 •

edited

Loading

greenguy33 commented Jul 25, 2024

greenguy33 commented Jul 25, 2024

s3alfisc commented Jul 29, 2024 •

edited

Loading

s3alfisc commented Jul 29, 2024

greenguy33 commented Jul 30, 2024 •

edited

Loading

kylebutts commented Nov 16, 2024

s3alfisc commented Nov 17, 2024

Adding Standard Error of Prediction #566

Adding Standard Error of Prediction #566

Comments

greenguy33 commented Jul 25, 2024

s3alfisc commented Jul 25, 2024 • edited Loading

greenguy33 commented Jul 25, 2024

greenguy33 commented Jul 25, 2024

s3alfisc commented Jul 29, 2024 • edited Loading

s3alfisc commented Jul 29, 2024

greenguy33 commented Jul 30, 2024 • edited Loading

kylebutts commented Nov 16, 2024

s3alfisc commented Nov 17, 2024

s3alfisc commented Jul 25, 2024 •

edited

Loading

s3alfisc commented Jul 29, 2024 •

edited

Loading

greenguy33 commented Jul 30, 2024 •

edited

Loading