Logic to train with LightGBM

lgb.train(
  params = list(),
  data,
  nrounds = 100L,
  valids = list(),
  obj = NULL,
  eval = NULL,
  verbose = 1L,
  record = TRUE,
  eval_freq = 1L,
  init_model = NULL,
  colnames = NULL,
  categorical_feature = NULL,
  early_stopping_rounds = NULL,
  callbacks = list(),
  reset_data = FALSE,
  ...
)

Arguments

params

a list of parameters. See the "Parameters" section of the documentation for a list of parameters and valid values.

data

a lgb.Dataset object, used for training. Some functions, such as lgb.cv, may allow you to pass other types of data like matrix and then separately supply label as a keyword argument.

nrounds

number of training rounds

valids

a list of lgb.Dataset objects, used for validation

obj

objective function, can be character or custom objective function. Examples include regression, regression_l1, huber, binary, lambdarank, multiclass, multiclass

eval

evaluation function(s). This can be a character vector, function, or list with a mixture of strings and functions.

  • a. character vector: If you provide a character vector to this argument, it should contain strings with valid evaluation metrics. See The "metric" section of the documentation for a list of valid metrics.

  • b. function: You can provide a custom evaluation function. This should accept the keyword arguments preds and dtrain and should return a named list with three elements:

    • name: A string with the name of the metric, used for printing and storing results.

    • value: A single number indicating the value of the metric for the given predictions and true values

    • higher_better: A boolean indicating whether higher values indicate a better fit. For example, this would be FALSE for metrics like MAE or RMSE.

  • c. list: If a list is given, it should only contain character vectors and functions. These should follow the requirements from the descriptions above.

verbose

verbosity for output, if <= 0, also will disable the print of evaluation during training

record

Boolean, TRUE will record iteration message to booster$record_evals

eval_freq

evaluation output frequency, only effect when verbose > 0

init_model

path of model file of lgb.Booster object, will continue training from this model

colnames

feature names, if not null, will use this to overwrite the names in dataset

categorical_feature

categorical features. This can either be a character vector of feature names or an integer vector with the indices of the features (e.g. c(1L, 10L) to say "the first and tenth columns").

early_stopping_rounds

int. Activates early stopping. When this parameter is non-null, training will stop if the evaluation of any metric on any validation set fails to improve for early_stopping_rounds consecutive boosting rounds. If training stops early, the returned model will have attribute best_iter set to the iteration number of the best iteration.

callbacks

List of callback functions that are applied at each iteration.

reset_data

Boolean, setting it to TRUE (not the default value) will transform the booster model into a predictor model which frees up memory and the original datasets

...

other parameters, see the "Parameters" section of the documentation for more information. A few key parameters:

  • boosting: Boosting type. "gbdt", "rf", "dart" or "goss".

  • num_leaves: Maximum number of leaves in one tree.

  • max_depth: Limit the max depth for tree model. This is used to deal with overfitting. Tree still grow by leaf-wise.

  • num_threads: Number of threads for LightGBM. For the best speed, set this to the number of real CPU cores(parallel::detectCores(logical = FALSE)), not the number of threads (most CPU using hyper-threading to generate 2 threads per CPU core).

NOTE: As of v3.3.0, use of ... is deprecated. Add parameters to params directly.

Value

a trained booster model lgb.Booster.

Early Stopping

"early stopping" refers to stopping the training process if the model's performance on a given validation set does not improve for several consecutive iterations.

If multiple arguments are given to eval, their order will be preserved. If you enable early stopping by setting early_stopping_rounds in params, by default all metrics will be considered for early stopping.

If you want to only consider the first metric for early stopping, pass first_metric_only = TRUE in params. Note that if you also specify metric in params, that metric will be considered the "first" one. If you omit metric, a default metric will be used based on your choice for the parameter obj (keyword argument) or objective (passed into params).

Examples

# \donttest{ data(agaricus.train, package = "lightgbm") train <- agaricus.train dtrain <- lgb.Dataset(train$data, label = train$label) data(agaricus.test, package = "lightgbm") test <- agaricus.test dtest <- lgb.Dataset.create.valid(dtrain, test$data, label = test$label) params <- list( objective = "regression" , metric = "l2" , min_data = 1L , learning_rate = 1.0 ) valids <- list(test = dtest) model <- lgb.train( params = params , data = dtrain , nrounds = 5L , valids = valids , early_stopping_rounds = 3L )
#> [LightGBM] [Warning] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000980 seconds. #> You can set `force_row_wise=true` to remove the overhead. #> And if memory is not enough, you can set `force_col_wise=true`. #> [LightGBM] [Info] Total Bins 232 #> [LightGBM] [Info] Number of data points in the train set: 6513, number of used features: 116 #> [LightGBM] [Info] Start training from score 0.482113 #> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf #> [1] "[1]: test's l2:6.44165e-17" #> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf #> [1] "[2]: test's l2:1.97215e-31" #> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf #> [1] "[3]: test's l2:0" #> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf #> [LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements #> [1] "[4]: test's l2:0" #> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf #> [LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements #> [1] "[5]: test's l2:0"
# }