Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Standard Error of Prediction #566

Open
greenguy33 opened this issue Jul 25, 2024 · 8 comments
Open

Adding Standard Error of Prediction #566

greenguy33 opened this issue Jul 25, 2024 · 8 comments
Assignees
Labels
enhancement New feature or request

Comments

@greenguy33
Copy link
Contributor

Pyfixest currently has predict() function but there is not currently functionality to provide estimates of the variance of the predictions. (Some) other tools do include this, such as:

This functionality would be useful for users who want to compute prediction intervals to estimate uncertainty in the prediction. If folks are interested in this feature, I am happy to work on adding this to Pyfixest. It would require calculating the full covariance matrix including all fixed effects. Then the variance of each prediction is simply X * M * X' where X is a row of covariate data to predict on and M is the full covariance matrix.

@s3alfisc
Copy link
Member

s3alfisc commented Jul 25, 2024

Hy Hayden, I think this would be an awesome addition to pyfixest!

I think the best place to add it would be via the 'predict' method? We could add an argument, eg named "ci", that is by default False but if set to True, returns the prediction interval with the predicted values. What do you think?

I agree that the key step will be to compute the full vcov matrix, including all estimated fixed effects. Do you already have an idea how to best built this based on the existing code base? I've thought about it for the last few minutes and am not sure if there is a "quick win". But I'll have to think about this more in the next days. Maybe one option would be to build a first MVP implementation that only works without fixed effects?

Anyways, I'd be super happy to have this feature! Cc'ing @b-knight as this might be of interest for him as well =)

@greenguy33
Copy link
Contributor Author

Hey Alex,

Glad you think this would be a worthwhile addition!

I agree with you that the calculation of the full covariance matrix is the challenging aspect of this task. I don't have an idea how to do it right now - but I view it as an opportunity to learn more about how covariance matrices work. The nice thing is that we can compare results to the existing packages that already offer this functionality to validate/double check.

I guess there are two ways to approach the task:

  1. start with the "easy win" of computing the prediction error without fixed effects, and expand later to include the full covariance matrix
  2. tackle the difficult part first and roll out the full covariance matrix before adding the standard error option to the predict function

In general, my software engineer instincts tell me to get the hard part done first, but I am ok with either approach here and will cede to your opinion as to which order you prefer to do things.

Also, as a quick aside, the functionality I am proposing would not actually calculate CI's on the prediction, it would calculate the standard error of the prediction from which intervals could be easily computed. If you like, we can also add functionality to add CI's (once we have the prediction error it is a simple arithmetic) but I haven't seen that other packages actually offer this last step built in. I usually just do it manually once I have the prediction error.

@greenguy33
Copy link
Contributor Author

There is also a related issue in the Fixest github which may have some pertinent discussion.
lrberge/fixest#22

@s3alfisc
Copy link
Member

s3alfisc commented Jul 29, 2024

Also, as a quick aside, the functionality I am proposing would not actually calculate CI's on the prediction, it would calculate the standard error of the prediction from which intervals could be easily computed.

Sounds good to start with SEs, but I think support for CIs would be cool as well! In general, I'd be happy with either approach 1. and 2. I always try to go for the quick wins first, as I am motivated by getting stuff done / building new features😄

Have you thought about the design? How do you think about the strategy of adding a inference boolean to predict?

I think the method could look smth along these lines:

def predict(..., add_inference):
   ...
   # after computing y_hat, at the very end 
   if add_inference: 

    # store model objects that will be overwritten
    old_X = self._X.copy()
    old_Y = self._Y.copy()
    # etc

    self._u_hat = y_hat - self._Y 
    # update all other methods required for updating via the vcov method
    # note: need un-demeaned Y and X, but stored self._Y and self._X are demeaned
    # maybe add option to feols "add_undemeaned_Y", "add_undemeaned_X" that 
    # if True, store self._X_undemeaned and self._Y_undemeaned?
            
    self._X = [X, D] # make this sparse
    self._Y = Y
    self._tXX = self._X.T @ self._X
    #self._scores = ...

    vcov = self.vcov()

    # compute se, ci of prediction interval, return pd.DataFrame with columns: y_hat, se, ci_lower, ci_upper

    # restore old model objects
    self._X = old_X
    self._Y = old_Y
    # etc

What do you think @greenguy33 ?

@s3alfisc
Copy link
Member

Oh and I just went through the very detailed fixest thread :D Are you using fixest & prediction intervals / errors in a similar way for your own research?

Fwiw, reading the article, I learned that fixest seems to implement prediction standard errors and intervals only when no fixed effects are included in the model:

library("fixest")
data(mtcars)
head(mtcars)

fit = feols(mpg ~ wt   + as.factor(disp), data = mtcars)
predict(fit, se.fit = TRUE) |> head()
# fit   se.fit
# Mazda RX4         21.36122 1.397038
# Mazda RX4 Wag     20.63878 1.397038
# Datsun 710        22.80000 1.732790
# Hornet 4 Drive    21.40000 1.732790
# Hornet Sportabout 16.68415 1.272143
# Valiant           18.10000 1.732790

fit = feols(mpg ~ wt   | disp, data = mtcars)
predict(fit, se.fit = TRUE) |> head()
# Error in predict.fixest(fit, se.fit = TRUE) : 
#   The standard-errors (SEs) of the prediction cannot be 
# computed in the presence of fixed-effects. To obtain the
# SEs, you would need to include the FEs as standard factors
# in the model.

@greenguy33
Copy link
Contributor Author

greenguy33 commented Jul 30, 2024

Hey Alex,
I'm actually using Statsmodels get_prediction() function to compute the prediction error in my current research. I'm not much of an R wonk and usually gravitate towards Python solutions.
I'm fine with following your instinct to implement this feature without fixed effects to get a quick win to start, and work on the harder part later. Maybe we will learn something in the process that will be informative for the subsequent task. I'll try to get started this week when I have some free time!
-Hayden

@kylebutts
Copy link

Just for some context, standard errors on fitted values requires you to know the entire covariance matrix of coefficients (including fixed-effects). You don't need them for inference on individual coefficients (from FWL theorem), but you do for the fitted values since they are function of the full vector of coefficients.

Derivation:

Fitted value at $x$:

$$ \begin{aligned} \hat{y} - x \beta_0 &= x \hat{\beta} - x \beta_0 \\ &= x (X'X)^{-1} X' y - x \beta_0 \\ &= x (X'X)^{-1} X' u \end{aligned} $$

So that the VCOV of the fitted value is

$$ \begin{aligned} x (X'X)^{-1} X' u u' X (X'X)^{-1} x', \end{aligned} $$

where the inner part is the full VCOV of the regression coefficients

@s3alfisc
Copy link
Member

Thanks @kylebutts! We discussed this a bit and though it would likely be a good strategy to implement prediction intervals for models without fixed effects first (which we try to do in this PR), and then to tackle a more fundamental refactor of pyfixest's internals to support fixed effects as well. I think to get the full vcov matrix, we might have to update all vcov methods for Feols, which might turn out quite involved 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: No status
Development

No branches or pull requests

3 participants