-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bootstrap Regression for Linear Regression #100
Labels
good first issue
Good for newcomers
Comments
@ShouvikGhosh2048, can you please add the following? R code for the Residual Bootstrap using CRRao, RDatasets, StatsModels
using Statistics, DataFrames, Random
struct Boot_Residual end
export Boot_Residual
function fit(
formula::FormulaTerm,
data::DataFrame,
modelClass::LinearRegression,
bootstrap::Boot_Residual,
sim_size::Int64 = 1000)
formula = apply_schema(formula, schema(formula, data),RegressionModel)
y, X = modelcols(formula, data);
model = lm(formula, data);
res = coeftable(model);
res = DataFrame(res);
e = residuals(model);
β_hat = coef(model);
p = length(β_hat);
n = nrow(X);
## This line is inefficient
A = vcov(model)/dispersion(model,true);
## Once the Mousum's code is merged we will revert to following line
# A = GLM.inv(model)
# set the seed for reproducibility
Random.seed!(123)
β_star_matrix = zeros(sim_size,p);
for b in 1:sim_size
e_resample = rand(e, n);
β_star = β_hat+A*X'e_resample;
β_star_matrix[b,:] = β_star;
end
bootstrap_coef_table=zeros(p,4);
bootstrap_coef_table[:,1]= mean(β_star_matrix, dims=1)
bootstrap_coef_table[:,2]= std(β_star_matrix, dims=1)
for j in 1:p
bootstrap_coef_table[j,3]=quantile(β_star_matrix[:,j], 0.05 )
bootstrap_coef_table[j,4]=quantile(β_star_matrix[:,j], 0.95 )
end
col_names = ["Coef", "Std Error", "Lower 5%","Upper 95%"]
bootstrap_coeftable = DataFrame(bootstrap_coef_table,col_names);
row_names = res[!,1];
bootstrap_coeftable = insertcols!(bootstrap_coeftable, 1, :Predictor =>row_names);
return bootstrap_coeftable
end Test using CRRao, RDatasets, StatsPlots, StatsModels
df = dataset("datasets", "mtcars");
container = fit(@formula(MPG ~ HP + WT + Gear), df, LinearRegression(),Boot_Residual())
## output
4×5 DataFrame
Row │ Predictor Coef Std Error Lower 5% Upper 95%
│ String Float64 Float64 Float64 Float64
─────┼────────────────────────────────────────────────────────────
1 │ (Intercept) 32.0831 4.57446 25.0411 39.8355
2 │ HP -0.0362751 0.009656 -0.0518685 -0.0207119
3 │ WT -3.29078 0.840215 -4.51627 -1.94081
4 │ Gear 1.07868 0.911066 -0.391544 2.62009
|
What's |
Sorry correct the code above - pls check. I added the following line. e = residuals(model); |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Bootstrap Regression
Most of the time Normality assumption of residual needs to be corrected. As a result, though the model can have good predictive accuracy - we cannot use the
t-value
&p-value
based analysis for statistical inference. Now under such circumstances, one can resort to a nonparametric bootstrap regression method for further statistical inferenceResidual Bootstrap Regression
If the issue is only with distribution, then one can resort to residual bootstrap regression.
Paired Bootstrap Regression
However, if homoscedasticity is violated, one can resort to paired Bootstrap regression.
Possible code would look like
The
1000
is the Bootstrap simulation sizeThe text was updated successfully, but these errors were encountered: