lightgbm.cv

lightgbm.cv(params, train_set, num_boost_round=100, folds=None, nfold=5, stratified=True, shuffle=True, metrics=None, feval=None, init_model=None, fpreproc=None, seed=0, callbacks=None, eval_train_metric=False, return_cvbooster=False)

Perform the cross-validation with given parameters.

Parameters:
  • params (dict) – Parameters for training. Values passed through params take precedence over those supplied via arguments.

  • train_set (Dataset) – Data to be trained on.

  • num_boost_round (int, optional (default=100)) – Number of boosting iterations.

  • folds (generator or iterator of (train_idx, test_idx) tuples, scikit-learn splitter object or None, optional (default=None)) – If a generator or iterator, it should yield the train and test indices for each fold. If an object, it should be one of the scikit-learn splitter classes (https://scikit-learn.org/stable/modules/classes.html#splitter-classes) and have a split method. This argument takes priority over the other data split arguments.

  • nfold (int, optional (default=5)) – Number of folds in CV.

  • stratified (bool, optional (default=True)) – Whether to perform stratified sampling.

  • shuffle (bool, optional (default=True)) – Whether to shuffle before splitting data.

  • metrics (str, list of str, or None, optional (default=None)) – Evaluation metrics to be monitored during CV. If not None, the metric in params will be overridden.

  • feval (callable, list of callable, or None, optional (default=None)) –

    Customized evaluation function. Each evaluation function should accept two parameters, preds and eval_data, and return (eval_name, eval_result, is_higher_better) or a list of such tuples.

    preds : numpy 1-D array or numpy 2-D array (for multi-class task)

    The predicted values. For a multi-class task, preds is a numpy 2-D array of shape = [n_samples, n_classes]. If a custom objective function is used, predicted values are returned before any transformation; for a binary task, for example, they are raw margins rather than probabilities of the positive class.

    eval_data : Dataset

    A Dataset to evaluate.

    eval_name : str

    The name of evaluation function (without whitespace).

    eval_result : float

    The eval result.

    is_higher_better : bool

    Whether a higher eval result is better, e.g. is_higher_better is True for AUC.

    To ignore the default metric corresponding to the used objective, set metrics to the string "None".

  • init_model (str, pathlib.Path, Booster or None, optional (default=None)) – Filename of a LightGBM model or a Booster instance used to continue training.

  • fpreproc (callable or None, optional (default=None)) – Preprocessing function that takes (dtrain, dtest, params) and returns transformed versions of those.

  • seed (int, optional (default=0)) – Seed used to generate the folds (passed to numpy.random.seed).

  • callbacks (list of callable, or None, optional (default=None)) – List of callback functions that are applied at each iteration. See Callbacks in Python API for more information.

  • eval_train_metric (bool, optional (default=False)) – Whether to display the train metric in progress. The score of the metric is calculated again after each training step, so there is some impact on performance.

  • return_cvbooster (bool, optional (default=False)) – Whether to return Booster models trained on each fold through CVBooster.
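The feval contract described above can be illustrated with a small custom metric. This is a minimal sketch rather than anything shipped with LightGBM: the function name mean_abs_error and the metric name custom_mae are hypothetical, and the code assumes a regression-style task in which preds is a 1-D array and the true targets are read with Dataset.get_label().

```python
import numpy as np


def mean_abs_error(preds, eval_data):
    """Hypothetical custom metric following the feval signature."""
    # eval_data is expected to be a lightgbm.Dataset; get_label() returns
    # the true targets for the rows being evaluated.
    labels = np.asarray(eval_data.get_label(), dtype=float)
    error = float(np.mean(np.abs(labels - np.asarray(preds, dtype=float))))
    # Return (eval_name, eval_result, is_higher_better).
    # A lower mean absolute error is better, so the flag is False.
    return "custom_mae", error, False
```

Such a function would be passed as feval=mean_abs_error. Combining it with metrics="None" should report only the custom metric, since that setting ignores the default metric of the objective.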

Note

A custom objective function can be provided for the objective parameter. It should accept two parameters: preds, train_data and return (grad, hess).

preds : numpy 1-D array or numpy 2-D array (for multi-class task)

The predicted values. Predicted values are returned before any transformation, e.g. they are raw margin instead of probability of positive class for binary task.

train_data : Dataset

The training dataset.

grad : numpy 1-D array or numpy 2-D array (for multi-class task)

The value of the first order derivative (gradient) of the loss with respect to the elements of preds for each sample point.

hess : numpy 1-D array or numpy 2-D array (for multi-class task)

The value of the second order derivative (Hessian) of the loss with respect to the elements of preds for each sample point.

For a multi-class task, preds is a numpy 2-D array of shape = [n_samples, n_classes], and grad and hess should be returned in the same format.
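The (grad, hess) contract in this note can be sketched for a binary task. The example below is illustrative only: the function name binary_logloss_objective is hypothetical, labels are assumed to be 0 or 1, and the loss is taken to be the standard log loss applied to the raw margin.

```python
import numpy as np


def binary_logloss_objective(preds, train_data):
    """Hypothetical custom objective: log loss on raw margins."""
    # Assumes train_data is a lightgbm.Dataset with 0/1 labels.
    labels = np.asarray(train_data.get_label(), dtype=float)
    # preds are raw margins, so apply the sigmoid to obtain probabilities.
    prob = 1.0 / (1.0 + np.exp(-np.asarray(preds, dtype=float)))
    grad = prob - labels        # first derivative of log loss w.r.t. the margin
    hess = prob * (1.0 - prob)  # second derivative of log loss w.r.t. the margin
    return grad, hess
```

Because predictions stay untransformed when a custom objective is used, any accompanying custom metric would also need to apply the sigmoid itself before comparing against the labels.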

Returns:

eval_results – History of evaluation results of each metric. The dictionary has the following format: {'valid metric1-mean': [values], 'valid metric1-stdv': [values], 'valid metric2-mean': [values], 'valid metric2-stdv': [values], …}. If return_cvbooster=True, the trained boosters are also returned, wrapped in a CVBooster object under the cvbooster key. If eval_train_metric=True, the train metric history is also returned. In this case, the dictionary has the following format: {'train metric1-mean': [values], 'valid metric1-mean': [values], 'train metric2-mean': [values], 'valid metric2-mean': [values], …}.

Return type:

dict
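One common use of the returned dictionary is picking the boosting round with the best mean validation score. The helper below is a sketch that relies only on the key format described above. The name best_iteration is hypothetical, and example_results is a hand-written stand-in, not real output from lightgbm.cv.

```python
def best_iteration(eval_results, metric, higher_is_better=False):
    """Return (1-based round, mean, stdv) of the best validation score."""
    means = eval_results[f"valid {metric}-mean"]
    stdvs = eval_results[f"valid {metric}-stdv"]
    # Choose the index of the largest or smallest mean, depending on the metric.
    pick = max if higher_is_better else min
    best_idx = pick(range(len(means)), key=means.__getitem__)
    return best_idx + 1, means[best_idx], stdvs[best_idx]


# Hand-written stand-in for the dictionary returned by lightgbm.cv.
example_results = {
    "valid auc-mean": [0.71, 0.78, 0.80, 0.79],
    "valid auc-stdv": [0.03, 0.02, 0.02, 0.03],
}
print(best_iteration(example_results, "auc", higher_is_better=True))
# prints (3, 0.8, 0.02)
```

The selected round could then be reused as num_boost_round when training a final model on the full data.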