lightgbm.cv
- lightgbm.cv(params, train_set, num_boost_round=100, folds=None, nfold=5, stratified=True, shuffle=True, metrics=None, feval=None, init_model=None, fpreproc=None, seed=0, callbacks=None, eval_train_metric=False, return_cvbooster=False)
Perform cross-validation with the given parameters.
- Parameters:
params (dict) – Parameters for training. Values passed through params take precedence over those supplied via arguments.
train_set (Dataset) – Data to be trained on.
num_boost_round (int, optional (default=100)) – Number of boosting iterations.
folds (generator or iterator of (train_idx, test_idx) tuples, scikit-learn splitter object or None, optional (default=None)) – If generator or iterator, it should yield the train and test indices for each fold. If object, it should be one of the scikit-learn splitter classes (https://scikit-learn.org/stable/modules/classes.html#splitter-classes) and have a split method. This argument takes priority over the other data split arguments.
nfold (int, optional (default=5)) – Number of folds in CV.
stratified (bool, optional (default=True)) – Whether to perform stratified sampling.
shuffle (bool, optional (default=True)) – Whether to shuffle before splitting data.
metrics (str, list of str, or None, optional (default=None)) – Evaluation metrics to be monitored during CV. If not None, the metric in params will be overridden.
feval (callable, list of callable, or None, optional (default=None)) –
Customized evaluation function. Each evaluation function should accept two parameters, preds and eval_data, and return (eval_name, eval_result, is_higher_better) or a list of such tuples.
- preds : numpy 1-D array or numpy 2-D array (for multi-class task)
The predicted values. For multi-class task, preds are numpy 2-D array of shape = [n_samples, n_classes]. If custom objective function is used, predicted values are returned before any transformation, e.g. they are raw margin instead of probability of positive class for binary task in this case.
- eval_data : Dataset
A Dataset to evaluate.
- eval_name : str
The name of evaluation function (without whitespace).
- eval_result : float
The eval result.
- is_higher_better : bool
Whether a higher eval result is better, e.g. AUC is is_higher_better.
To ignore the default metric corresponding to the used objective, set metrics to the string "None".
init_model (str, pathlib.Path, Booster or None, optional (default=None)) – Filename of LightGBM model or Booster instance used to continue training.
fpreproc (callable or None, optional (default=None)) – Preprocessing function that takes (dtrain, dtest, params) and returns transformed versions of those.
seed (int, optional (default=0)) – Seed used to generate the folds (passed to numpy.random.seed).
callbacks (list of callable, or None, optional (default=None)) – List of callback functions that are applied at each iteration. See Callbacks in Python API for more information.
eval_train_metric (bool, optional (default=False)) – Whether to display the train metric in progress. The score of the metric is calculated again after each training step, so there is some impact on performance.
return_cvbooster (bool, optional (default=False)) – Whether to return Booster models trained on each fold through CVBooster.
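The folds and feval contracts described above can be sketched as follows. This is a minimal illustration, not part of the library: the names expanding_window_folds, mean_abs_error, and run_cv are hypothetical, and the expanding-window split is only one possible way to produce (train_idx, test_idx) tuples. The regression objective and the small num_boost_round are likewise assumptions chosen for brevity.

```python
import numpy as np


def expanding_window_folds(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs in which training rows precede test rows.

    Hypothetical helper: any generator of index pairs can be passed as `folds`.
    """
    chunks = np.array_split(np.arange(n_samples), n_folds + 1)
    for k in range(1, n_folds + 1):
        train_idx = np.concatenate(chunks[:k])  # all chunks before chunk k
        test_idx = chunks[k]                    # chunk k is held out
        yield train_idx, test_idx


def mean_abs_error(preds, eval_data):
    """Custom metric: returns (eval_name, eval_result, is_higher_better)."""
    labels = eval_data.get_label()
    value = float(np.mean(np.abs(labels - preds)))
    return "custom_mae", value, False  # lower is better


def run_cv(X, y, n_folds=4):
    """Call lightgbm.cv with the custom folds and metric defined above."""
    import lightgbm as lgb  # imported here so the helpers above need only numpy

    train_set = lgb.Dataset(X, label=y)
    params = {"objective": "regression", "verbosity": -1}
    return lgb.cv(
        params,
        train_set,
        num_boost_round=20,
        folds=expanding_window_folds(len(y), n_folds),
        metrics="None",        # ignore the objective's default metric
        feval=mean_abs_error,  # report only the custom metric
    )
```

Because folds is supplied, nfold, stratified, and shuffle play no role in this call. The metric name contains no whitespace, as the eval_name description requires.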
Note
A custom objective function can be provided for the objective parameter. It should accept two parameters: preds, train_data and return (grad, hess).
- preds : numpy 1-D array or numpy 2-D array (for multi-class task)
The predicted values. Predicted values are returned before any transformation, e.g. they are raw margin instead of probability of positive class for binary task.
- train_data : Dataset
The training dataset.
- grad : numpy 1-D array or numpy 2-D array (for multi-class task)
The value of the first order derivative (gradient) of the loss with respect to the elements of preds for each sample point.
- hess : numpy 1-D array or numpy 2-D array (for multi-class task)
The value of the second order derivative (Hessian) of the loss with respect to the elements of preds for each sample point.
For multi-class task, preds are numpy 2-D array of shape = [n_samples, n_classes], and grad and hess should be returned in the same format.
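As an illustration of the (grad, hess) contract, the sketch below implements the binary logistic loss as a custom objective. The choice of loss and the names logistic_objective, raw_error_rate, and run_cv are assumptions made for this example, not something the note prescribes. Since predictions are raw margins when a custom objective is used, the accompanying metric thresholds them at 0 rather than at 0.5.

```python
import numpy as np


def logistic_objective(preds, train_data):
    """Binary logistic loss on raw margins: returns (grad, hess) per sample."""
    labels = train_data.get_label()
    prob = 1.0 / (1.0 + np.exp(-preds))  # sigmoid of the raw margin
    grad = prob - labels                 # first derivative w.r.t. preds
    hess = prob * (1.0 - prob)           # second derivative w.r.t. preds
    return grad, hess


def raw_error_rate(preds, eval_data):
    """Error rate computed directly on raw margins (margin > 0 means class 1)."""
    labels = eval_data.get_label()
    predicted = (preds > 0).astype(float)
    return "raw_error", float(np.mean(predicted != labels)), False


def run_cv(X, y):
    """Call lightgbm.cv with the custom objective passed through params."""
    import lightgbm as lgb  # imported here so the functions above need only numpy

    params = {"objective": logistic_objective, "verbosity": -1}
    return lgb.cv(
        params,
        lgb.Dataset(X, label=y),
        num_boost_round=20,
        nfold=3,
        feval=raw_error_rate,
    )
```

This sketch covers only the 1-D (binary) case. A multi-class objective would return 2-D grad and hess arrays of shape [n_samples, n_classes], as stated above.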
- Returns:
eval_results – History of evaluation results of each metric. The dictionary has the following format: {'valid metric1-mean': [values], 'valid metric1-stdv': [values], 'valid metric2-mean': [values], 'valid metric2-stdv': [values], …}. If return_cvbooster=True, also returns trained boosters wrapped in a CVBooster object via cvbooster key. If eval_train_metric=True, also returns the train metric history. In this case, the dictionary has the following format: {'train metric1-mean': [values], 'valid metric1-mean': [values], 'train metric2-mean': [values], 'valid metric2-mean': [values], …}.
- Return type:
dict
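One common use of the returned dictionary is to pick the boosting round with the best mean validation score. The helper below is a hypothetical sketch that relies only on the key format described above ('valid <metric>-mean'); best_iteration is not a LightGBM function, and the sample dictionary is hand-written for illustration rather than real output.

```python
def best_iteration(eval_results, metric, higher_is_better=False):
    """Return (1-based round, mean score) of the best validation round.

    Reads only the 'valid <metric>-mean' history, so other keys such as
    'cvbooster' or the '-stdv' histories are ignored.
    """
    means = eval_results[f"valid {metric}-mean"]
    pick = max if higher_is_better else min
    best_idx = pick(range(len(means)), key=means.__getitem__)
    return best_idx + 1, means[best_idx]


# Hand-written dictionary in the documented format, for illustration only.
example_results = {
    "valid l2-mean": [0.9, 0.5, 0.6],
    "valid l2-stdv": [0.05, 0.04, 0.06],
}
round_number, score = best_iteration(example_results, "l2")
```

For a metric where larger values are better, such as AUC, pass higher_is_better=True.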