High-level R interface to train a LightGBM model. Unlike lgb.train, this function
is focused on compatibility with other statistics and machine learning interfaces in R.
This focus on compatibility means that this interface may experience more frequent breaking API changes
than lgb.train.
For efficiency-sensitive applications, or for applications where breaking API changes across releases
are very expensive, use lgb.train.
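The following is a minimal sketch that fits the same kind of model through both interfaces. It assumes lightgbm >= 4.0.0 is installed and uses R's built-in `iris` data. The small `nrounds` and single thread only keep the sketch quick and are not tuning advice.

```r
library(lightgbm)

# Built-in iris data: numeric feature matrix plus a factor label with 3 levels
X <- as.matrix(iris[, 1:4])
y <- iris$Species

# High-level interface: matrix and factor go in directly; objective = "auto"
# resolves to "multiclass" and num_class is inferred from the factor levels
fit_hl <- lightgbm(
  data = X,
  label = y,
  nrounds = 5L,
  verbose = -1L,
  num_threads = 1L
)

# Lower-level interface: build the lgb.Dataset yourself and spell out the
# objective; "multiclass" expects integer class labels 0, 1, ..., num_class - 1
dtrain <- lgb.Dataset(X, label = as.integer(y) - 1L)
fit_ll <- lgb.train(
  params = list(objective = "multiclass", num_class = 3L, num_threads = 1L),
  data = dtrain,
  nrounds = 5L,
  verbose = -1L
)

# Both calls return an lgb.Booster; multiclass predictions come back as a
# matrix with one column of class probabilities per class
p_hl <- predict(fit_hl, X)
p_ll <- predict(fit_ll, X)
```

The high-level call handles label encoding and objective selection itself. `lgb.train` leaves both of these to the caller.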
Usage
lightgbm(
data,
label = NULL,
weights = NULL,
params = list(),
nrounds = 100L,
verbose = 1L,
eval_freq = 1L,
early_stopping_rounds = NULL,
init_model = NULL,
callbacks = list(),
serializable = TRUE,
objective = "auto",
init_score = NULL,
num_threads = NULL,
colnames = NULL,
categorical_feature = NULL,
...
)
Arguments
- data
an `lgb.Dataset` object, used for training. Some functions, such as `lgb.cv`, may allow you to pass other types of data like `matrix` and then separately supply `label` as a keyword argument.
- label
Vector of labels, used if `data` is not an `lgb.Dataset`.
- weights
Sample / observation weights for rows in the input data. If `NULL`, will assume that all observations / rows have the same importance / weight.
Changed from 'weight', in version 4.0.0
- params
a list of parameters. See the "Parameters" section of the documentation for a list of parameters and valid values.
- nrounds
number of training rounds
- verbose
Verbosity for output. If <= 0 and `valids` has been provided, this also disables printing of evaluation results during training.
- eval_freq
Evaluation output frequency; only effective when `verbose` > 0 and `valids` has been provided.
- early_stopping_rounds
int. Activates early stopping. When this parameter is non-null, training will stop if the evaluation of any metric on any validation set fails to improve for `early_stopping_rounds` consecutive boosting rounds. If training stops early, the returned model will have attribute `best_iter` set to the iteration number of the best iteration.
- init_model
Path of a model file or an `lgb.Booster` object; training will continue from this model.
- callbacks
List of callback functions that are applied at each iteration.
- serializable
Whether to make the resulting objects serializable through functions such as `save` or `saveRDS` (see section "Model serialization").
- objective
Optimization objective (e.g. `"regression"`, `"binary"`, etc.). For a list of accepted objectives, see the "objective" item of the "Parameters" section of the documentation.
If passing `"auto"` and `data` is not of type `lgb.Dataset`, the objective will be determined according to what is passed for `label`:
  - If passing a factor with two levels, will use objective `"binary"`.
  - If passing a factor with more than two levels, will use objective `"multiclass"` (note that parameter `num_class` in this case will also be determined automatically from `label`).
  - Otherwise (or if passing an `lgb.Dataset` as input), will use objective `"regression"`.
New in version 4.0.0
- init_score
Initial score, i.e. the base prediction that LightGBM will boost from.
New in version 4.0.0
- num_threads
Number of parallel threads to use. For best speed, this should be set to the number of physical cores in the CPU; on a typical x86-64 machine, this corresponds to half the maximum number of threads.
Be aware that using too many threads can result in speed degradation on smaller datasets (see the parameters documentation for more details).
If passing zero, will use the default number of threads configured for OpenMP (typically controlled through the environment variable `OMP_NUM_THREADS`).
If passing `NULL` (the default), will try to use the number of physical cores in the system, but be aware that detecting the number of cores correctly requires package `RhpcBLASctl` to be installed.
This parameter gets overridden by `num_threads` and its aliases under `params` if passed there.
New in version 4.0.0
- colnames
Character vector of feature names. Only used if `data` is not an `lgb.Dataset`.
- categorical_feature
Categorical features. This can either be a character vector of feature names or an integer vector with the indices of the features (e.g. `c(1L, 10L)` to say "the first and tenth columns"). Only used if `data` is not an `lgb.Dataset`.
- ...
Additional arguments passed to `lgb.train`. For example:
  - `valids`: a list of `lgb.Dataset` objects, used for validation
  - `obj`: objective function, can be character or custom objective function. Examples include `regression`, `regression_l1`, `huber`, `binary`, `lambdarank`, `multiclass`
  - `eval`: evaluation function, can be (a list of) character or custom eval function
  - `record`: Boolean, `TRUE` will record iteration messages to `booster$record_evals`
  - `reset_data`: Boolean, setting it to `TRUE` (not the default value) will transform the booster model into a predictor model, which frees up memory and releases the original datasets
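The following sketch shows the matrix-input path through several of the arguments above. It assumes lightgbm >= 4.0.0 and uses R's built-in `mtcars` data. The feature choice and the equal weights are illustrative only. The small-data settings `min_data_in_leaf` and `min_data_per_group` are also assumptions; they are needed only because `mtcars` has just 32 rows.

```r
library(lightgbm)

# Unnamed numeric matrix: feature names are supplied via `colnames` below
X <- unname(as.matrix(mtcars[, c("mpg", "cyl", "hp", "wt")]))
# Two-level factor label, so objective = "auto" resolves to "binary"
y <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))

fit <- lightgbm(
  data = X,
  label = y,
  weights = rep(1.0, nrow(X)),        # equal weights; same effect as weights = NULL
  params = list(min_data_in_leaf = 2L, min_data_per_group = 2L),  # toy-data settings
  nrounds = 10L,
  verbose = -1L,
  objective = "auto",
  num_threads = 1L,                   # overridden if params also sets num_threads
  colnames = c("mpg", "cyl", "hp", "wt"),
  categorical_feature = "cyl"         # equivalently 2L, the second column
)

# Binary model: the default prediction type gives one probability per row
p <- predict(fit, X)
```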
Early Stopping
"early stopping" refers to stopping the training process if the model's performance on a given validation set does not improve for several consecutive iterations.
If multiple arguments are given to eval, their order will be preserved. If you enable
early stopping by setting early_stopping_rounds in params, by default all
metrics will be considered for early stopping.
If you want to only consider the first metric for early stopping, pass
first_metric_only = TRUE in params. Note that if you also specify metric
in params, that metric will be considered the "first" one. If you omit metric,
a default metric will be used based on your choice for the parameter obj (keyword argument)
or objective (passed into params).
NOTE: if using boosting_type="dart", any early stopping configuration will be ignored
and early stopping will not be performed.
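The following is a minimal sketch of early stopping through `lightgbm()`. It assumes lightgbm >= 4.0.0 and the `agaricus.train` / `agaricus.test` data sets bundled with the package. `valids` reaches `lgb.train` through `...`. Because `first_metric_only = TRUE`, only `binary_logloss` (listed first in `metric`) decides when training stops, while `auc` is still recorded.

```r
library(lightgbm)

# Example data bundled with the package: sparse feature matrices, 0/1 labels
data(agaricus.train, package = "lightgbm")
data(agaricus.test, package = "lightgbm")

dtrain <- lgb.Dataset(agaricus.train$data, label = agaricus.train$label)
# Validation set built with dtrain as reference, so both share the same binning
dvalid <- lgb.Dataset.create.valid(dtrain, agaricus.test$data, label = agaricus.test$label)

es_model <- lightgbm(
  data = dtrain,
  objective = "binary",          # explicit: "auto" gives "regression" for lgb.Dataset input
  params = list(
    metric = c("binary_logloss", "auc"),
    first_metric_only = TRUE     # only binary_logloss is checked for improvement
  ),
  nrounds = 100L,
  early_stopping_rounds = 5L,
  valids = list(valid = dvalid), # forwarded to lgb.train() through `...`
  verbose = -1L,
  num_threads = 1L
)

# Set if training stopped early (see the early_stopping_rounds argument)
es_model$best_iter
# AUC on the validation set, one value per recorded iteration
auc_history <- lgb.get.eval.result(es_model, "valid", "auc")
```

Whether training stops before `nrounds` depends on the data, so no specific `best_iter` value is claimed here.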