Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bootstrap Regression for Linear Regression #100

Closed
sourish-cmi opened this issue Feb 12, 2023 · 3 comments · Fixed by #106
Closed

Bootstrap Regression for Linear Regression #100

sourish-cmi opened this issue Feb 12, 2023 · 3 comments · Fixed by #106
Assignees
Labels
good first issue Good for newcomers

Comments

@sourish-cmi
Copy link
Collaborator

sourish-cmi commented Feb 12, 2023

Bootstrap Regression
Most of the time Normality assumption of residual needs to be corrected. As a result, though the model can have good predictive accuracy - we cannot use the t-value & p-value based analysis for statistical inference. Now under such circumstances, one can resort to a nonparametric bootstrap regression method for further statistical inference

Residual Bootstrap Regression
If the issue is only with distribution, then one can resort to residual bootstrap regression.

Paired Bootstrap Regression
However, if homoscedasticity is violated, one can resort to paired Bootstrap regression.

Possible code would look like

container=fit(@formula(y~x1+x2+x3),DataFrame,LinearRegression(),Bootstrap_Residual(),1000)
container=fit(@formula(y~x1+x2+x3),DataFrame,LinearRegression(),Bootstrap_Paired(),1000)

The 1000 is the Bootstrap simulation size

@sourish-cmi sourish-cmi added the good first issue Good for newcomers label Feb 12, 2023
@sourish-cmi sourish-cmi self-assigned this Feb 12, 2023
@sourish-cmi
Copy link
Collaborator Author

sourish-cmi commented Feb 23, 2023

@ShouvikGhosh2048, can you please add the following?

R code for the Residual Bootstrap

using CRRao, RDatasets, StatsModels
using Statistics, DataFrames, Random
struct Boot_Residual end
export Boot_Residual

function fit(
    formula::FormulaTerm,
    data::DataFrame,
    modelClass::LinearRegression,
    bootstrap::Boot_Residual,
    sim_size::Int64 = 1000)

    formula = apply_schema(formula, schema(formula, data),RegressionModel)
    y, X = modelcols(formula, data);

    model = lm(formula, data);
    res = coeftable(model);
    res = DataFrame(res);
    e = residuals(model);


    β_hat = coef(model);
    p = length(β_hat);
    n = nrow(X);

    ## This line is inefficient
    A = vcov(model)/dispersion(model,true);

    ## Once the Mousum's code is merged we will revert to following line
    # A = GLM.inv(model)

    # set the seed for reproducibility
    Random.seed!(123)

    β_star_matrix = zeros(sim_size,p);

    for b in 1:sim_size
        e_resample = rand(e, n);
        β_star = β_hat+A*X'e_resample;
        β_star_matrix[b,:] = β_star;
    end

    bootstrap_coef_table=zeros(p,4);
    bootstrap_coef_table[:,1]= mean(β_star_matrix, dims=1)
    bootstrap_coef_table[:,2]= std(β_star_matrix, dims=1)
    for j in 1:p
        bootstrap_coef_table[j,3]=quantile(β_star_matrix[:,j], 0.05 )
        bootstrap_coef_table[j,4]=quantile(β_star_matrix[:,j], 0.95 )
    end
    col_names = ["Coef", "Std Error", "Lower 5%","Upper 95%"]
    bootstrap_coeftable = DataFrame(bootstrap_coef_table,col_names);
    row_names = res[!,1];
    bootstrap_coeftable = insertcols!(bootstrap_coeftable, 1, :Predictor =>row_names);
    
    return bootstrap_coeftable

end

Test

using CRRao, RDatasets, StatsPlots, StatsModels
df = dataset("datasets", "mtcars");

container = fit(@formula(MPG ~ HP + WT + Gear), df, LinearRegression(),Boot_Residual())

## output
4×5 DataFrame
 Row │ Predictor    Coef        Std Error  Lower 5%    Upper 95%  
     │ String       Float64     Float64    Float64     Float64    
─────┼────────────────────────────────────────────────────────────
   1 │ (Intercept)  32.0831      4.57446   25.0411     39.8355
   2 │ HP           -0.0362751   0.009656  -0.0518685  -0.0207119
   3 │ WT           -3.29078     0.840215  -4.51627    -1.94081
   4 │ Gear          1.07868     0.911066  -0.391544    2.62009

@ShouvikGhosh2048
Copy link
Collaborator

What's e in the fit function?

@sourish-cmi
Copy link
Collaborator Author

What's e in the fit function?

Sorry correct the code above - pls check. I added the following line.

e = residuals(model);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants