Welcome to the world of LightGBM, a highly efficient gradient boosting implementation (Ke et al. 2017).
This vignette will guide you through its basic usage. It will show
how to build a simple binary classification model based on a subset of
the bank dataset (Moro, Cortez, and Rita 2014). You will
use the two input features “age” and “balance” to predict whether a
client has subscribed to a term deposit.
The dataset looks as follows.
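A minimal sketch for inspecting the data yourself, assuming the dataset ships with the package under the name bank and contains the columns y, age, and balance used later in this vignette:

```r
library(lightgbm)

# Load the example data (assumed to be bundled with the lightgbm package)
data(bank, package = "lightgbm")

# First rows of the response and the two features used in this vignette
bank[1L:5L, c("y", "age", "balance")]

# Distribution of the response
table(bank$y)
```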
The R package of LightGBM offers two functions to train a model:
lgb.train(): This is the main training logic. It offers full flexibility but requires a
Dataset object created by the lgb.Dataset() function.
lightgbm(): Simpler, but less flexible. Data can be passed without having to bother with
lgb.Dataset().
In a first step, you need to convert the data to numeric. Afterwards, you
are ready to fit the model with the lightgbm() function.
    # Numeric response and feature matrix
    y <- as.numeric(bank$y == "yes")
    X <- data.matrix(bank[, c("age", "balance")])

    # Train
    fit <- lightgbm(
      data = X
      , label = y
      , params = list(
        num_leaves = 4L
        , learning_rate = 1.0
        , objective = "binary"
      )
      , nrounds = 10L
      , verbose = -1L
    )

    # Result
    summary(predict(fit, X))
    #>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    #> 0.01192 0.07370 0.09871 0.11593 0.14135 0.65796
It seems to have worked! And the predictions are indeed probabilities between 0 and 1.
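Since the predictions are probabilities, turning them into class labels requires a cutoff. The following sketch repeats the fit from above and then applies a cutoff of 0.5. The cutoff value and the names cutoff and pred_class are illustrative assumptions, not part of the original walkthrough.

```r
library(lightgbm)

# Example data (assumed to be bundled with the lightgbm package)
data(bank, package = "lightgbm")

# Numeric response and feature matrix
y <- as.numeric(bank$y == "yes")
X <- data.matrix(bank[, c("age", "balance")])

# Same model as in the walkthrough
fit <- lightgbm(
  data = X
  , label = y
  , params = list(
    num_leaves = 4L
    , learning_rate = 1.0
    , objective = "binary"
  )
  , nrounds = 10L
  , verbose = -1L
)

# Predicted probabilities of a "yes"
p <- predict(fit, X)

# Assumption: a cutoff of 0.5 is used purely for illustration
cutoff <- 0.5
pred_class <- as.integer(p > cutoff)

# Cross-tabulate observed against predicted classes
table(observed = y, predicted = pred_class)
```

With a mean predicted probability of roughly 0.12, most clients fall below this cutoff, so a lower value may be more useful in practice.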
Alternatively, you can go for the more flexible interface
lgb.train(). Here, as an additional step, you need to prepare
y and X with the data API
lgb.Dataset() of LightGBM. Parameters are passed to
lgb.train() as a named list.
    # Data interface
    dtrain <- lgb.Dataset(X, label = y)

    # Parameters
    params <- list(
      objective = "binary"
      , num_leaves = 4L
      , learning_rate = 1.0
    )

    # Train
    fit <- lgb.train(
      params
      , data = dtrain
      , nrounds = 10L
      , verbose = -1L
    )
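A model trained with lgb.train() can be used for prediction in the same way as one trained with lightgbm(). The self-contained sketch below repeats the steps above and then predicts on the feature matrix X. It assumes that predict() expects the raw matrix rather than the Dataset object; the variable name p is illustrative.

```r
library(lightgbm)

# Example data (assumed to be bundled with the lightgbm package)
data(bank, package = "lightgbm")

# Numeric response and feature matrix
y <- as.numeric(bank$y == "yes")
X <- data.matrix(bank[, c("age", "balance")])

# Data interface
dtrain <- lgb.Dataset(X, label = y)

# Parameters
params <- list(
  objective = "binary"
  , num_leaves = 4L
  , learning_rate = 1.0
)

# Train
fit <- lgb.train(
  params
  , data = dtrain
  , nrounds = 10L
  , verbose = -1L
)

# Predict on the feature matrix (not on dtrain)
p <- predict(fit, X)
summary(p)
```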
Try it out! If stuck, visit LightGBM’s documentation for more details.
Ke, Guolin, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. “LightGBM: A Highly Efficient Gradient Boosting Decision Tree.” In Advances in Neural Information Processing Systems 30 (NIPS 2017).
Moro, Sérgio, Paulo Cortez, and Paulo Rita. 2014. “A Data-Driven Approach to Predict the Success of Bank Telemarketing.” Decision Support Systems 62: 22–31.