Predict method for LightGBM model — predict.lgb.Booster • lightgbm

Predicted values based on class lgb.Booster

# S3 method for lgb.Booster
predict(
  object,
  data,
  start_iteration = NULL,
  num_iteration = NULL,
  rawscore = FALSE,
  predleaf = FALSE,
  predcontrib = FALSE,
  header = FALSE,
  reshape = FALSE,
  params = list(),
  ...
)

Arguments

object	Object of class `lgb.Booster`
data	a `matrix` object, a `dgCMatrix` object or a character representing a path to a text file (CSV, TSV, or LibSVM)
start_iteration	int or None, optional (default=None) Start index of the iteration to predict. If None or <= 0, starts from the first iteration.
num_iteration	int or None, optional (default=None) Limit number of iterations in the prediction. If None, if the best iteration exists and start_iteration is None or <= 0, the best iteration is used; otherwise, all iterations from start_iteration are used. If <= 0, all iterations from start_iteration are used (no limits).
rawscore	whether the prediction should be returned in the for of original untransformed sum of predictions from boosting iterations' results. E.g., setting `rawscore=TRUE` for logistic regression would result in predictions for log-odds instead of probabilities.
predleaf	whether predict leaf index instead.
predcontrib	return per-feature contributions for each record.
header	only used for prediction for text file. True if text file has header
reshape	whether to reshape the vector of predictions to a matrix form when there are several prediction outputs per case.
params	a list of additional named parameters. See the "Predict Parameters" section of the documentation for a list of parameters and valid values.
...	Additional prediction parameters. NOTE: deprecated as of v3.3.0. Use `params` instead.

Value

For regression or binary classification, it returns a vector of length nrows(data). For multiclass classification, either a num_class * nrows(data) vector or a (nrows(data), num_class) dimension matrix is returned, depending on the reshape value.

When predleaf = TRUE, the output is a matrix object with the number of columns corresponding to the number of trees.

Examples

# \donttest{
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
data(agaricus.test, package = "lightgbm")
test <- agaricus.test
dtest <- lgb.Dataset.create.valid(dtrain, test$data, label = test$label)
params <- list(
  objective = "regression"
  , metric = "l2"
  , min_data = 1L
  , learning_rate = 1.0
)
valids <- list(test = dtest)
model <- lgb.train(
  params = params
  , data = dtrain
  , nrounds = 5L
  , valids = valids
)
#> [LightGBM] [Warning] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000943 seconds.
#> You can set `force_row_wise=true` to remove the overhead.
#> And if memory is not enough, you can set `force_col_wise=true`.
#> [LightGBM] [Info] Total Bins 232
#> [LightGBM] [Info] Number of data points in the train set: 6513, number of used features: 116
#> [LightGBM] [Info] Start training from score 0.482113
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [1] "[1]:  test's l2:6.44165e-17"
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [1] "[2]:  test's l2:1.97215e-31"
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [1] "[3]:  test's l2:0"
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements
#> [1] "[4]:  test's l2:0"
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements
#> [1] "[5]:  test's l2:0"
preds <- predict(model, test$data)

# pass other prediction parameters
preds <- predict(
    model,
    test$data,
    params = list(
        predict_disable_shape_check = TRUE
   )
)
# }