eXtreme Gradient Boosting Package in Julia
This package is a Julia interface to XGBoost, which is short for eXtreme Gradient Boosting. It is an efficient and scalable implementation of the gradient boosting framework. The package includes an efficient linear model solver and tree learning algorithms. The library is parallelized using OpenMP, and it can be more than 10 times faster than some existing gradient boosting packages. It supports various objective functions, including regression, classification, and ranking. The package is also designed to be extensible, so that users can easily define their own objectives.
- Sparse feature format: allows easy handling of missing values and improves computational efficiency (see the sketch after this list).
- Advanced features, such as customized loss functions and cross validation; see the demo folder for walkthrough examples.
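As an illustration of the sparse input path, a `SparseMatrixCSC` can be passed as training data in the same way as a dense `Array`. The following is a minimal sketch with made-up toy values, not an example from the package itself:

```julia
# Minimal sketch of the sparse input path (toy data, for illustration only).
using SparseArrays, XGBoost

# 4x3 design matrix with only 4 stored entries; unstored entries are treated
# as absent/missing by the tree learner, which is what makes sparse inputs
# memory- and compute-efficient.
X = sparse([1, 2, 3, 4], [1, 3, 2, 1], [1.0, 0.5, 2.0, 1.5], 4, 3)
y = [1.0, 0.0, 1.0, 0.0]

bst = xgboost(X, 2, label = y, eta = 1, max_depth = 2)  # same call as for a dense Array
```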
To install the stable version from the Julia Pkg REPL:

```julia
] add XGBoost
```

or, to develop against the repository directly:

```julia
] develop "https://github.com/dmlc/XGBoost.jl.git"
] build XGBoost
```
By default, the package builds the latest stable version of the XGBoost library. To build the latest master, set the environment variable `XGBOOST_BUILD_VERSION` to `"master"` prior to installing or building the package (e.g. `ENV["XGBOOST_BUILD_VERSION"] = "master"`).
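For example, a complete build-from-master session from the Julia REPL might look like the following sketch, which uses only standard `Pkg` calls together with the environment variable named above:

```julia
# Build XGBoost.jl against the library's master branch (sketch using standard Pkg calls).
ENV["XGBOOST_BUILD_VERSION"] = "master"  # must be set before the build runs

using Pkg
Pkg.add("XGBoost")      # a fresh install triggers a build
# Pkg.build("XGBoost")  # or rebuild an existing installation
```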
To show how XGBoost works, here is an example using the Mushroom dataset.
- Prepare Data
XGBoost supports Julia `Array`, `SparseMatrixCSC`, libSVM format text, and XGBoost binary files as input. Here is an example of Mushroom classification. This example uses the function `readlibsvm` in basic_walkthrough.jl, which loads libSVM-format text into a Julia dense matrix.
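For reference, a reader of this kind can be sketched in a few lines. This is an illustrative version only; the actual `readlibsvm` in basic_walkthrough.jl may differ in details such as index base and element type:

```julia
# Illustrative libSVM-format reader (the demo's readlibsvm may differ).
# Each line looks like "<label> <idx>:<value> <idx>:<value> ...";
# 0-based feature indices are assumed here for the agaricus demo files.
function readlibsvm_sketch(fname::AbstractString, shape::Tuple{Int,Int})
    X = zeros(Float32, shape)
    y = Float32[]
    for (i, line) in enumerate(eachline(fname))
        toks = split(line)
        isempty(toks) && continue
        push!(y, parse(Float32, toks[1]))       # first token is the label
        for tok in toks[2:end]
            idx, val = split(tok, ":")
            X[i, parse(Int, idx) + 1] = parse(Float32, val)
        end
    end
    return X, y
end
```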
```julia
using XGBoost

# Load the training and test sets as dense matrices (6513 x 126 and 1611 x 126).
train_X, train_Y = readlibsvm("data/agaricus.txt.train", (6513, 126))
test_X, test_Y = readlibsvm("data/agaricus.txt.test", (1611, 126))
```
- Fit Model
```julia
# Train a small model: 2 boosting rounds, depth-2 trees, learning rate 1.
num_round = 2
bst = xgboost(train_X, num_round, label = train_Y, eta = 1, max_depth = 2)

# Predict on the test set and report the misclassification rate at threshold 0.5.
pred = predict(bst, test_X)
println("test-error=", sum((pred .> 0.5) .!= test_Y) / length(pred))
```
- Cross Validation

```julia
# 5-fold cross validation with the same tree settings, reporting AUC.
nfold = 5
param = Dict("max_depth" => 2,
             "eta" => 1,
             "objective" => "binary:logistic")
metrics = ["auc"]
nfold_cv(train_X, num_round, nfold, label = train_Y, param = param, metrics = metrics)
```
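As a side note on the metric, AUC can be sanity-checked in pure Julia with the rank-based (Mann-Whitney) formula. This is a self-contained sketch, not part of the package:

```julia
# Rank-based AUC (Mann-Whitney formula); assumes binary 0/1 labels and no tied scores.
function auc_sketch(scores::AbstractVector, labels::AbstractVector)
    n = length(scores)
    ranks = zeros(Float64, n)
    ranks[sortperm(scores)] = 1:n          # rank 1 = lowest score
    npos = count(==(1), labels)
    nneg = n - npos
    sumpos = sum(ranks[labels .== 1])      # rank sum of the positive class
    return (sumpos - npos * (npos + 1) / 2) / (npos * nneg)
end
```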
Check the demo folder for more examples:
- Basic walkthrough of features
- Customized loss function and evaluation metric (see the sketch after this list)
- Boosting from existing prediction
- Predicting using the first n trees
- Generalized Linear Model
- Cross validation
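To give a flavor of the customized loss function demo, the usual XGBoost custom-objective convention is to return the per-example gradient and hessian of the loss with respect to the raw score. Below is a minimal sketch for the logistic loss; the exact signature expected by the package (e.g. whether labels are read from a DMatrix) is shown in the demo folder, so treat the argument types here as illustrative:

```julia
# Sketch of a custom logistic-loss objective in the standard XGBoost convention:
# return (gradient, hessian) of the loss w.r.t. the raw prediction score.
# Argument types are illustrative; see the demo folder for the package's exact hook.
function logregobj_sketch(preds::Vector{Float32}, labels::Vector{Float32})
    p = 1f0 ./ (1f0 .+ exp.(-preds))   # sigmoid of the raw scores
    grad = p .- labels                 # first derivative of the log loss
    hess = p .* (1f0 .- p)             # second derivative of the log loss
    return grad, hess
end
```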
Check the XGBoost Wiki for more details on the underlying library.