MLLinearModels is a small library that provides functionality to train and use linear regression models such as "Ordinary Least Squares", "Ridge", "Lasso" and "Elastic Net". The library consists of 4 main classes:
RidgeModel - represents "Ridge" and "Ordinary Least Squares" models.
RidgeCVModel - represents a cross-validation procedure for "Ridge"/"OLS" models.
ElasticNetModel - represents "Lasso" and "Elastic Net" models.
ElasticNetCVModel - represents a cross-validation procedure for "Lasso" and "Elastic Net" models.
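For reference, the textbook Ridge objective is sketched below. Setting alpha to 0 removes the penalty and recovers Ordinary Least Squares, which is why a single RidgeModel class covers both. This is the standard formulation, not necessarily the library's exact scaling: whether the squared error is divided by the number of samples, and how centering handles the intercept, are assumptions here.

```latex
\hat{w}_{\text{ridge}} = \arg\min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2
  = (X^\top X + \alpha I)^{-1} X^\top y,
\qquad
\hat{w}_{\text{OLS}} = \hat{w}_{\text{ridge}} \big|_{\alpha = 0} = (X^\top X)^{-1} X^\top y
```

The OLS closed form only exists when X^T X is invertible. A positive alpha always makes X^T X + alpha I invertible, which is one practical reason to prefer a small Ridge penalty over plain OLS.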
Both RidgeModel and ElasticNetModel provide:
fit: X to: y checkInput: check - trains the model, where X is a PMMatrix, y is a PMVector, and checkInput specifies whether the given data should be preprocessed;
predict: X - returns a vector of predictions, one for each row of X;
score: X output: y - evaluates the R2 coefficient (coefficient of determination) of the predictions, where y is the vector of true values.
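The section only says that score: X output: y returns the R2 coefficient. Assuming it uses the standard coefficient of determination, let ŷ be the vector returned by predict: X and ȳ the mean of the true values y:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i
```

A perfect fit gives R2 = 1. A model that always predicts ȳ gives R2 = 0, and a model that does worse than that gives a negative value.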
To use this library, the PolyMath project must be installed: https://github.com/PolyMathOrg/PolyMath.
In addition, the DataFrame library is highly recommended (though not required) for manipulating data: https://github.com/PolyMathOrg/DataFrame.
Afterwards, the library can be loaded from its git repository using Iceberg.
We will load the housing data and randomly split it into train and test sets (roughly 15% of the rows go to the test set):
df := DataFrame loadHousing.
"mark roughly 15% of the rows as test rows"
df addColumn: ((1 to: df size) collect: [ :i | 100 atRandom > 85 ]) named: #isTest.
trainDf := df selectAllWhere: [ :isTest | isTest not ].
testDf := df selectAllWhere: [ :isTest | isTest ].
"columns 1 to 3 are the features, column 4 is the target"
trainX := trainDf columnsFrom: 1 to: 3.
trainY := trainDf columnAt: 4.
testX := testDf columnsFrom: 1 to: 3.
testY := testDf columnAt: 4.
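The split above marks a row as a test row when a random draw exceeds 85. Assuming the draw is uniform over the integers 1 to 100 (as with Pharo's Integer>>atRandom), the expected share of test rows works out as follows:

```latex
P(\text{isTest}) = P(d > 85) = \frac{\lvert \{86, \dots, 100\} \rvert}{100} = \frac{15}{100} = 0.15,
\qquad
\mathbb{E}[\,n_{\text{test}}\,] = 0.15 \, n
```

Each row is drawn independently, so the actual test-set size varies from run to run around this expectation.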
To interact with the library, though, we need to convert the DataFrame data into PolyMath's PMMatrix and PMVector classes:
trainXMatrix := PMMatrix rows: trainX asArrayOfRows.
trainYVec := trainY asPMVector.
testXMatrix := PMMatrix rows: testX asArrayOfRows.
testYVec := testY asPMVector.
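An Ordinary Least Squares model is a RidgeModel with alpha set to 0:
olsModel := RidgeModel new
	alpha: 0;
	shouldCenter: true;
	shouldNormalize: true;
	yourself.
olsModel fit: trainXMatrix to: trainYVec checkInput: true.
r2coefficient := olsModel score: testXMatrix output: testYVec.
"mean squared error of the predictions on the test set"
mseError := (((olsModel predict: testXMatrix) - testYVec)
	inject: 0 into: [ :sum :residual | sum + residual squared ]) / testYVec size.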
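The mseError expression above is the mean squared error on the test set: it sums the squared residuals with inject:into: and divides by the number of test samples. Written out, with ŷ the vector returned by olsModel predict: testXMatrix and y the true test values, and (assuming score:output: uses the standard R2 definition) its relation to the R2 coefficient:

```latex
\text{MSE} = \frac{1}{n_{\text{test}}} \sum_{i=1}^{n_{\text{test}}} \left( \hat{y}_i - y_i \right)^2,
\qquad
R^2 = 1 - \frac{\text{MSE}}{\frac{1}{n_{\text{test}}} \sum_{i=1}^{n_{\text{test}}} \left( y_i - \bar{y} \right)^2}
```

MSE is in squared units of the target, while R2 is unitless, which makes R2 easier to compare across datasets.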
For ElasticNetModel, tol is the parameter that specifies the accuracy of the solution. Setting l1Ratio to 1 gives a Lasso model:
lasso := ElasticNetModel new
	shouldCenter: true;
	shouldNormalize: true;
	l1Ratio: 1;
	alpha: 6.36;
	tol: 1e-3;
	yourself.
lasso fit: trainXMatrix to: trainYVec checkInput: true.
lasso score: testXMatrix output: testYVec.
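The l1Ratio parameter mixes an L1 and an L2 penalty. The sketch below assumes ElasticNetModel follows the common scikit-learn-style parameterization. That is an assumption: the source only shows that l1Ratio: 1 is used for the Lasso model above. Writing ρ for l1Ratio:

```latex
\hat{w} = \arg\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2
  + \alpha \rho \lVert w \rVert_1
  + \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2
\\[4pt]
\rho = 1 \;\Rightarrow\;
\hat{w}_{\text{lasso}} = \arg\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
```

The L1 term can drive coefficients exactly to zero, so with a large alpha such as 6.36 some features may be dropped from the model entirely.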
RidgeCVModel requires an array of alpha values to choose from.
nFolds - the number of folds (k) used for k-fold cross-validation.
If nFolds is nil or 1, an efficient leave-one-out cross-validation is performed instead.
After training, the model contains:
- model property - the best estimated Ridge model;
- mses property - the evaluated MSE for each alpha;
- minAlpha - the best alpha;
- minMse - the smallest error, which corresponds to minAlpha.
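Choosing minAlpha amounts to minimizing the cross-validated error over the supplied alphas. The first line below sketches one common way to aggregate it, with K = nFolds folds F_1, ..., F_K and the model fitted on every fold except F_k. For leave-one-out, Ridge regression has a well-known closed-form shortcut through the hat matrix, shown on the second line (using the unscaled Ridge objective and ignoring the effect of centering). That shortcut is presumably what makes the leave-one-out mode "efficient", but whether the library uses exactly this formula is an assumption.

```latex
\text{mse}(\alpha) = \frac{1}{n} \sum_{k=1}^{K} \sum_{i \in F_k} \left( y_i - x_i^\top \hat{w}_\alpha^{(-k)} \right)^2,
\qquad
\text{minAlpha} = \arg\min_{\alpha \in \text{alphas}} \text{mse}(\alpha),
\qquad
\text{minMse} = \text{mse}(\text{minAlpha})
\\[4pt]
H_\alpha = X (X^\top X + \alpha I)^{-1} X^\top,
\qquad
\text{mse}_{\text{LOO}}(\alpha) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{y_i - \hat{y}_i}{1 - (H_\alpha)_{ii}} \right)^2
```

The leave-one-out shortcut needs only one fit per alpha instead of n, which is why it is cheap even though it uses every sample as a validation point.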
ridgeCV := RidgeCVModel new
	shouldCenter: true;
	shouldNormalize: true;
	alphas: {1e-3 . 5e-3 . 1e-2 . 3e-2 . 5e-2 . 7e-2 . 1e-1 . 3e-1 . 5e-1 . 1 . 5 . 10 . 20};
	yourself.
ridgeCV fit: trainXMatrix to: trainYVec checkInput: true.
ridgeCV model score: testXMatrix output: testYVec.
ElasticNetCVModel requires an array of l1Ratio values to choose from.
If an array of alphas is not passed, the alphas are generated automatically (though the generated grid does not work well when l1Ratio is small).
In that case, epsilon specifies the spread between the largest and the smallest alpha generated for each l1Ratio.
nAlphas - the number of alphas generated between the smallest and the largest alpha.
nFolds - the number of folds (k) used for k-fold cross-validation.
After training, the model contains:
- model property - the best estimated Elastic Net model;
- mses property - the evaluated MSE for each point of the l1Ratio/alpha grid;
- minAlpha - the best alpha;
- minL1Ratio - the best l1Ratio;
- minMse - the smallest error, which corresponds to minAlpha and minL1Ratio.
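Over the l1Ratio/alpha grid, the selection extends the one-dimensional case. The sketch below writes mse(ρ, α) for the cross-validated MSE stored in the mses property, and A(ρ) for either the given alphas or the grid generated for that l1Ratio:

```latex
(\text{minL1Ratio}, \text{minAlpha}) = \arg\min_{\rho \in \text{l1Ratios},\; \alpha \in A(\rho)} \text{mse}(\rho, \alpha),
\qquad
\text{minMse} = \text{mse}(\text{minL1Ratio}, \text{minAlpha})
```

For the elasticNetCV example, 11 l1Ratios and 13 alphas give 11 x 13 = 143 candidate pairs. Assuming each pair is refit once per fold with nFolds: 10, that is 1430 fits, so the grid size directly drives training time.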
elasticNetCV := ElasticNetCVModel new
	shouldCenter: true;
	shouldNormalize: true;
	l1Ratios: {0.1 . 0.2 . 0.3 . 0.4 . 0.5 . 0.6 . 0.7 . 0.8 . 0.9 . 0.99 . 1};
	alphas: {1e-3 . 5e-3 . 1e-2 . 3e-2 . 5e-2 . 7e-2 . 1e-1 . 3e-1 . 5e-1 . 1 . 5 . 10 . 20};
	nFolds: 10;
	yourself.
elasticNetCVAutoAlpha := ElasticNetCVModel new
	shouldCenter: true;
	shouldNormalize: true;
	l1Ratios: {0.1 . 0.2 . 0.3 . 0.4 . 0.5 . 0.6 . 0.7 . 0.8 . 0.9 . 0.99 . 1};
	"no alphas given: 100 alphas are generated for each l1Ratio"
	nAlphas: 100;
	epsilon: 1e-3;
	nFolds: 10;
	yourself.
elasticNetCV fit: trainXMatrix to: trainYVec checkInput: true.
elasticNetCV model score: testXMatrix output: testYVec.
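The elasticNetCVAutoAlpha example leaves the alphas out and asks for nAlphas: 100 and epsilon: 1e-3. scikit-learn builds such a grid as follows; that this library does the same is an assumption, since the source describes epsilon only loosely as the spread between the largest and smallest alpha. The grid is log-spaced. It starts at the smallest alpha that zeroes every coefficient under the Elastic Net objective with the 1/(2n) scaling and ends at epsilon times that value. Here x_j is the j-th (centered, normalized) column of X, ρ is the l1Ratio, ε is epsilon and m is nAlphas:

```latex
\alpha_{\max}(\rho) = \frac{\max_j \lvert x_j^\top y \rvert}{n \rho},
\qquad
\alpha_{\min}(\rho) = \varepsilon \, \alpha_{\max}(\rho),
\qquad
\alpha_k = \alpha_{\max}(\rho) \, \varepsilon^{k/(m-1)}, \quad k = 0, \dots, m-1
\\[4pt]
\varepsilon = 10^{-3},\; m = 100 \;\Rightarrow\;
\frac{\alpha_{k+1}}{\alpha_k} = 10^{-3/99} \approx 0.933
```

Because alpha_max grows as 1/ρ, the generated grid shifts toward very large alphas when l1Ratio is small. This may explain the note that the automatic grid does not work well in that regime.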