Predict method for LightGBM model — predict.lgb.Booster • lightgbm

Predicted values based on class lgb.Booster

New in version 4.0.0

Usage

# S3 method for class 'lgb.Booster'
predict(
  object,
  newdata,
  type = "response",
  start_iteration = NULL,
  num_iteration = NULL,
  header = FALSE,
  params = list(),
  ...
)

Arguments

object

Object of class lgb.Booster

newdata

a matrix object, a dgCMatrix, a dgRMatrix object, a dsparseVector object, or a character representing a path to a text file (CSV, TSV, or LibSVM).

For sparse inputs, if predictions are only going to be made for a single row, it will be faster to use CSR format, in which case the data may be passed as either a single-row CSR matrix (class dgRMatrix from package Matrix) or as a sparse numeric vector (class dsparseVector from package Matrix).

If single-row predictions are going to be performed frequently, it is recommended to pre-configure the model object for fast single-row sparse predictions through function lgb.configure_fast_predict.

Changed from 'data', in version 4.0.0

type

Type of prediction to output. Allowed types are:

"response": will output the predicted score according to the objective function being optimized (depending on the link function that the objective uses), after applying any necessary transformations - for example, for objective="binary", it will output class probabilities.
"class": for classification objectives, will output the class with the highest predicted probability. For other objectives, will output the same as "response". Note that "class" is not a supported type for lgb.configure_fast_predict (see the documentation of that function for more details).
"raw": will output the non-transformed numbers (sum of predictions from boosting iterations' results) from which the "response" number is produced for a given objective function - for example, for objective="binary", this corresponds to log-odds. For many objectives such as "regression", since no transformation is applied, the output will be the same as for "response".
"leaf": will output the index of the terminal node / leaf at which each observations falls in each tree in the model, outputted as integers, with one column per tree.
"contrib": will return the per-feature contributions for each prediction, including an intercept (each feature will produce one column).

Note that, if using custom objectives, types "class" and "response" will not be available and will default towards using "raw" instead.

If the model was fit through function lightgbm and it was passed a factor as labels, passing the prediction type through params instead of through this argument might result in factor levels for classification objectives not being applied correctly to the resulting output.

New in version 4.0.0

start_iteration

int or None, optional (default=None) Start index of the iteration to predict. If None or <= 0, starts from the first iteration.

num_iteration

int or None, optional (default=None) Limit number of iterations in the prediction. If None, if the best iteration exists and start_iteration is None or <= 0, the best iteration is used; otherwise, all iterations from start_iteration are used. If <= 0, all iterations from start_iteration are used (no limits).

header

only used for prediction for text file. True if text file has header

params

a list of additional named parameters. See the "Predict Parameters" section of the documentation for a list of parameters and valid values. Where these conflict with the values of keyword arguments to this function, the values in params take precedence.

...

ignored

Value

For prediction types that are meant to always return one output per observation (e.g. when predicting type="response" or type="raw" on a binary classification or regression objective), will return a vector with one element per row in newdata.

For prediction types that are meant to return more than one output per observation (e.g. when predicting type="response" or type="raw" on a multi-class objective, or when predicting type="leaf", regardless of objective), will return a matrix with one row per observation in newdata and one column per output.

For type="leaf" predictions, will return a matrix with one row per observation in newdata and one column per tree. Note that for multiclass objectives, LightGBM trains one tree per class at each boosting iteration. That means that, for example, for a multiclass model with 3 classes, the leaf predictions for the first class can be found in columns 1, 4, 7, 10, etc.

For type="contrib", will return a matrix of SHAP values with one row per observation in newdata and columns corresponding to features. For regression, ranking, cross-entropy, and binary classification objectives, this matrix contains one column per feature plus a final column containing the Shapley base value. For multiclass objectives, this matrix will represent num_classes such matrices, in the order "feature contributions for first class, feature contributions for second class, feature contributions for third class, etc.".

If the model was fit through function lightgbm and it was passed a factor as labels, predictions returned from this function will retain the factor levels (either as values for type="class", or as column names for type="response" and type="raw" for multi-class objectives). Note that passing the requested prediction type under params instead of through type might result in the factor levels not being present in the output.

Details

If the model object has been configured for fast single-row predictions through lgb.configure_fast_predict, this function will use the prediction parameters that were configured for it - as such, extra prediction parameters should not be passed here, otherwise the configuration will be ignored and the slow route will be taken.

Examples

# \donttest{
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
data(agaricus.test, package = "lightgbm")
test <- agaricus.test
dtest <- lgb.Dataset.create.valid(dtrain, test$data, label = test$label)
params <- list(
  objective = "regression"
  , metric = "l2"
  , min_data = 1L
  , learning_rate = 1.0
  , num_threads = 2L
)
valids <- list(test = dtest)
model <- lgb.train(
  params = params
  , data = dtrain
  , nrounds = 5L
  , valids = valids
)
#> [LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000441 seconds.
#> You can set `force_row_wise=true` to remove the overhead.
#> And if memory is not enough, you can set `force_col_wise=true`.
#> [LightGBM] [Info] Total Bins 232
#> [LightGBM] [Info] Number of data points in the train set: 6513, number of used features: 116
#> [LightGBM] [Info] Start training from score 0.482113
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [1]:  test's l2:6.44165e-17 
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [2]:  test's l2:1.97215e-31 
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [3]:  test's l2:0 
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements
#> [4]:  test's l2:0 
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements
#> [5]:  test's l2:0 
preds <- predict(model, test$data)

# pass other prediction parameters
preds <- predict(
    model,
    test$data,
    params = list(
        predict_disable_shape_check = TRUE
   )
)
# }