lightgbm.Booster

class lightgbm.Booster(params=None, train_set=None, model_file=None, model_str=None, silent=False)[source]

Bases: object

Booster in LightGBM.

__init__(params=None, train_set=None, model_file=None, model_str=None, silent=False)[source]

Initialize the Booster.

Parameters:
  • params (dict or None, optional (default=None)) – Parameters for Booster.
  • train_set (Dataset or None, optional (default=None)) – Training dataset.
  • model_file (string or None, optional (default=None)) – Path to the model file.
  • model_str (string or None, optional (default=None)) – Model will be loaded from this string.
  • silent (bool, optional (default=False)) – Whether to suppress messages during construction.

Methods

__init__([params, train_set, model_file, …]) Initialize the Booster.
add_valid(data, name) Add validation data.
attr(key) Get attribute string from the Booster.
current_iteration() Get the index of the current iteration.
dump_model([num_iteration, start_iteration]) Dump Booster to JSON format.
eval(data, name[, feval]) Evaluate for data.
eval_train([feval]) Evaluate for training data.
eval_valid([feval]) Evaluate for validation data.
feature_importance([importance_type, iteration]) Get feature importances.
feature_name() Get names of features.
free_dataset() Free Booster’s Datasets.
free_network() Free Booster’s network.
get_leaf_output(tree_id, leaf_id) Get the output of a leaf.
get_split_value_histogram(feature[, bins, …]) Get split value histogram for the specified feature.
model_from_string(model_str[, verbose]) Load Booster from a string.
model_to_string([num_iteration, start_iteration]) Save Booster to string.
num_feature() Get number of features.
num_model_per_iteration() Get number of models per iteration.
num_trees() Get number of weak sub-models.
predict(data[, num_iteration, raw_score, …]) Make a prediction.
refit(data, label[, decay_rate]) Refit the existing Booster by new data.
reset_parameter(params) Reset parameters of Booster.
rollback_one_iter() Rollback one iteration.
save_model(filename[, num_iteration, …]) Save Booster to file.
set_attr(**kwargs) Set attributes to the Booster.
set_network(machines[, local_listen_port, …]) Set the network configuration.
set_train_data_name(name) Set the name to the training Dataset.
shuffle_models([start_iteration, end_iteration]) Shuffle models.
update([train_set, fobj]) Update Booster for one iteration.

add_valid(data, name)[source]

Add validation data.

Parameters:
  • data (Dataset) – Validation data.
  • name (string) – Name of validation data.
Returns:

self – Booster with set validation data.

Return type:

Booster

attr(key)[source]

Get attribute string from the Booster.

Parameters:
  • key (string) – The name of the attribute.
Returns:

value – The attribute value. Returns None if attribute does not exist.

Return type:

string or None

current_iteration()[source]

Get the index of the current iteration.

Returns:

cur_iter – The index of the current iteration.

Return type:

int

dump_model(num_iteration=None, start_iteration=0)[source]

Dump Booster to JSON format.

Parameters:
  • num_iteration (int or None, optional (default=None)) – Index of the iteration that should be dumped. If None, if the best iteration exists, it is dumped; otherwise, all iterations are dumped. If <= 0, all iterations are dumped.
  • start_iteration (int, optional (default=0)) – Start index of the iteration that should be dumped.
Returns:

json_repr – JSON format of Booster.

Return type:

dict
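The num_iteration rule above (None falls back to the best iteration if one exists, and values <= 0 select everything) can be sketched in plain Python. The helper name resolve_num_iteration and its arguments are hypothetical and not part of LightGBM. Treating a missing best iteration as None or 0, and capping the request at the number of trained iterations, are assumptions made for illustration.

```python
def resolve_num_iteration(num_iteration, best_iteration, total_iterations):
    """Return how many iterations the documented rule would select.

    Hypothetical helper, not a LightGBM function.
    """
    if num_iteration is None:
        # Fall back to the best iteration when one was recorded
        # (assumption: None or 0 means "no best iteration").
        num_iteration = best_iteration if best_iteration else 0
    if num_iteration <= 0:
        # Non-positive values mean "use all iterations".
        return total_iterations
    # Assumption: a request beyond the trained range is capped.
    return min(num_iteration, total_iterations)


print(resolve_num_iteration(None, 37, 100))  # best iteration is used
print(resolve_num_iteration(None, 0, 100))   # no best iteration: all are used
print(resolve_num_iteration(-1, 37, 100))    # <= 0: all are used
```

The same rule is stated for model_to_string, save_model and predict.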

eval(data, name, feval=None)[source]

Evaluate for data.

Parameters:
  • data (Dataset) – Data to evaluate.
  • name (string) – Name of the data.
  • feval (callable or None, optional (default=None)) –

    Customized evaluation function. Should accept two parameters: preds, eval_data, and return (eval_name, eval_result, is_higher_better) or list of such tuples.

    preds : list or numpy 1-D array
    The predicted values.
    eval_data : Dataset
    The evaluation dataset.
    eval_name : string
    The name of evaluation function (without whitespaces).
    eval_result : float
    The eval result.
    is_higher_better : bool
    Whether a higher eval result is better, e.g. for AUC, is_higher_better is True.

    For a multi-class task, preds is grouped by class_id first, then by row_id. To get the prediction for the i-th row in the j-th class, use preds[j * num_data + i].

Returns:

result – List with evaluation results.

Return type:

list
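As a minimal sketch of the feval contract described above, the function below takes (preds, eval_data) and returns an (eval_name, eval_result, is_higher_better) tuple. StubDataset is a hypothetical stand-in that mimics only Dataset.get_label(), so the example runs without LightGBM; the metric (RMSE) and the name rmse_feval are chosen purely for illustration.

```python
import math


class StubDataset:
    """Hypothetical stand-in for lightgbm.Dataset; only get_label() is mimicked."""

    def __init__(self, label):
        self._label = list(label)

    def get_label(self):
        return self._label


def rmse_feval(preds, eval_data):
    """Custom metric: root mean squared error (lower is better)."""
    labels = eval_data.get_label()
    mse = sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)
    # eval_name must not contain whitespace.
    return "rmse", math.sqrt(mse), False


name, value, is_higher_better = rmse_feval([1.0, 2.0, 4.0], StubDataset([1.0, 2.0, 2.0]))
print(name, value, is_higher_better)
```

With a real Booster, a function of this shape would be passed as the feval argument.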

eval_train(feval=None)[source]

Evaluate for training data.

Parameters:
  • feval (callable or None, optional (default=None)) –

    Customized evaluation function. Should accept two parameters: preds, train_data, and return (eval_name, eval_result, is_higher_better) or list of such tuples.

    preds : list or numpy 1-D array
    The predicted values.
    train_data : Dataset
    The training dataset.
    eval_name : string
    The name of evaluation function (without whitespaces).
    eval_result : float
    The eval result.
    is_higher_better : bool
    Whether a higher eval result is better, e.g. for AUC, is_higher_better is True.

    For a multi-class task, preds is grouped by class_id first, then by row_id. To get the prediction for the i-th row in the j-th class, use preds[j * num_data + i].

Returns:

result – List with evaluation results.

Return type:

list

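The multi-class layout of preds can be illustrated with a short pure-Python sketch. The helper names class_major_index and rows_from_flat are hypothetical; only the index formula preds[j * num_data + i] comes from the text above.

```python
def class_major_index(row_id, class_id, num_data):
    """Position of (row_id, class_id) in the flat, class-major preds array."""
    return class_id * num_data + row_id


def rows_from_flat(preds, num_data, num_class):
    """Regroup flat class-major preds into one list of class scores per row."""
    return [
        [preds[class_major_index(i, j, num_data)] for j in range(num_class)]
        for i in range(num_data)
    ]


# Two rows, three classes: all class-0 scores come first, then class 1, then class 2.
flat = [0.7, 0.1, 0.2, 0.3, 0.1, 0.6]
rows = rows_from_flat(flat, num_data=2, num_class=3)
print(rows)  # [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
```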
eval_valid(feval=None)[source]

Evaluate for validation data.

Parameters:
  • feval (callable or None, optional (default=None)) –

    Customized evaluation function. Should accept two parameters: preds, valid_data, and return (eval_name, eval_result, is_higher_better) or list of such tuples.

    preds : list or numpy 1-D array
    The predicted values.
    valid_data : Dataset
    The validation dataset.
    eval_name : string
    The name of evaluation function (without whitespaces).
    eval_result : float
    The eval result.
    is_higher_better : bool
    Whether a higher eval result is better, e.g. for AUC, is_higher_better is True.

    For a multi-class task, preds is grouped by class_id first, then by row_id. To get the prediction for the i-th row in the j-th class, use preds[j * num_data + i].

Returns:

result – List with evaluation results.

Return type:

list

feature_importance(importance_type='split', iteration=None)[source]

Get feature importances.

Parameters:
  • importance_type (string, optional (default="split")) – How the importance is calculated. If “split”, result contains numbers of times the feature is used in a model. If “gain”, result contains total gains of splits which use the feature.
  • iteration (int or None, optional (default=None)) – Limit number of iterations in the feature importance calculation. If None, if the best iteration exists, it is used; otherwise, all trees are used. If <= 0, all trees are used (no limits).
Returns:

result – Array with feature importances.

Return type:

numpy array
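The difference between the two importance types can be sketched without LightGBM. The input here is a hypothetical list of (feature_index, gain) pairs, one per split in the model; importance_from_splits is an illustrative helper, not a library function.

```python
def importance_from_splits(splits, num_feature, importance_type="split"):
    """Aggregate per-split records into a per-feature importance list."""
    result = [0.0] * num_feature
    for feature_index, gain in splits:
        if importance_type == "split":
            # Count how many times the feature is used to split.
            result[feature_index] += 1
        elif importance_type == "gain":
            # Sum the gains of all splits that use the feature.
            result[feature_index] += gain
        else:
            raise ValueError("importance_type must be 'split' or 'gain'")
    return result


splits = [(0, 4.0), (2, 1.5), (0, 0.5)]  # hypothetical splits of a small model
print(importance_from_splits(splits, num_feature=3, importance_type="split"))  # [2.0, 0.0, 1.0]
print(importance_from_splits(splits, num_feature=3, importance_type="gain"))   # [4.5, 0.0, 1.5]
```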

feature_name()[source]

Get names of features.

Returns:

result – List with names of features.

Return type:

list

free_dataset()[source]

Free Booster’s Datasets.

Returns:

self – Booster without Datasets.

Return type:

Booster

free_network()[source]

Free Booster’s network.

Returns:

self – Booster with freed network.

Return type:

Booster

get_leaf_output(tree_id, leaf_id)[source]

Get the output of a leaf.

Parameters:
  • tree_id (int) – The index of the tree.
  • leaf_id (int) – The index of the leaf in the tree.
Returns:

result – The output of the leaf.

Return type:

float

get_split_value_histogram(feature, bins=None, xgboost_style=False)[source]

Get split value histogram for the specified feature.

Parameters:
  • feature (int or string) –

    The feature name or index the histogram is calculated for. If int, interpreted as index. If string, interpreted as name.

    Warning

    Categorical features are not supported.

  • bins (int, string or None, optional (default=None)) – The maximum number of bins. If None, or if an int greater than the number of unique split values while xgboost_style=True, the number of bins equals the number of unique split values. If string, it should be one of the values supported by the numpy.histogram() function.
  • xgboost_style (bool, optional (default=False)) – Whether the returned result should be in the same form as it is in XGBoost. If False, the returned value is a tuple of 2 numpy arrays, as in the numpy.histogram() function. If True, the returned value is a matrix in which the first column holds the right edges of non-empty bins and the second column holds the histogram values.
Returns:

  • result_tuple (tuple of 2 numpy arrays) – If xgboost_style=False, the values of the histogram of used splitting values for the specified feature and the bin edges.
  • result_array_like (numpy array or pandas DataFrame (if pandas is installed)) – If xgboost_style=True, the histogram of used splitting values for the specified feature.

model_from_string(model_str, verbose=True)[source]

Load Booster from a string.

Parameters:
  • model_str (string) – Model will be loaded from this string.
  • verbose (bool, optional (default=True)) – Whether to print messages while loading model.
Returns:

self – Loaded Booster object.

Return type:

Booster

model_to_string(num_iteration=None, start_iteration=0)[source]

Save Booster to string.

Parameters:
  • num_iteration (int or None, optional (default=None)) – Index of the iteration that should be saved. If None, if the best iteration exists, it is saved; otherwise, all iterations are saved. If <= 0, all iterations are saved.
  • start_iteration (int, optional (default=0)) – Start index of the iteration that should be saved.
Returns:

str_repr – String representation of Booster.

Return type:

string

num_feature()[source]

Get number of features.

Returns:

num_feature – The number of features.

Return type:

int

num_model_per_iteration()[source]

Get number of models per iteration.

Returns:

model_per_iter – The number of models per iteration.

Return type:

int

num_trees()[source]

Get number of weak sub-models.

Returns:

num_trees – The number of weak sub-models.

Return type:

int

predict(data, num_iteration=None, raw_score=False, pred_leaf=False, pred_contrib=False, data_has_header=False, is_reshape=True, **kwargs)[source]

Make a prediction.

Parameters:
  • data (string, numpy array, pandas DataFrame, H2O DataTable's Frame or scipy.sparse) – Data source for prediction. If string, it represents the path to txt file.
  • num_iteration (int or None, optional (default=None)) – Limit number of iterations in the prediction. If None, if the best iteration exists, it is used; otherwise, all iterations are used. If <= 0, all iterations are used (no limits).
  • raw_score (bool, optional (default=False)) – Whether to predict raw scores.
  • pred_leaf (bool, optional (default=False)) – Whether to predict leaf index.
  • pred_contrib (bool, optional (default=False)) –

    Whether to predict feature contributions.

    Note

    If you want to get more explanations for your model’s predictions using SHAP values, like SHAP interaction values, you can install the shap package (https://github.com/slundberg/shap). Note that unlike the shap package, with pred_contrib we return a matrix with an extra column, where the last column is the expected value.

  • data_has_header (bool, optional (default=False)) – Whether the data has header. Used only if data is string.
  • is_reshape (bool, optional (default=True)) – If True, result is reshaped to [nrow, ncol].
  • **kwargs – Other parameters for the prediction.
Returns:

result – Prediction result.

Return type:

numpy array
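The pred_contrib layout noted above (one column per feature plus a final expected-value column) can be sketched as follows. The row values are made up, split_contrib_row is a hypothetical helper, and the example assumes a single-output model and the usual SHAP additivity property, under which the feature contributions plus the expected value add up to the raw score.

```python
def split_contrib_row(contrib_row):
    """Split one pred_contrib row into feature contributions and the expected value."""
    *feature_contribs, expected_value = contrib_row
    return feature_contribs, expected_value


# Hypothetical row for a model with 3 features: 3 contributions + expected value.
row = [0.25, -0.5, 1.0, 2.0]
feature_contribs, expected_value = split_contrib_row(row)

# Under SHAP additivity, this sum reproduces the raw score for the row.
raw_score = sum(feature_contribs) + expected_value
print(feature_contribs, expected_value, raw_score)
```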

refit(data, label, decay_rate=0.9, **kwargs)[source]

Refit the existing Booster by new data.

Parameters:
  • data (string, numpy array, pandas DataFrame, H2O DataTable's Frame or scipy.sparse) – Data source for refit. If string, it represents the path to txt file.
  • label (list, numpy 1-D array or pandas Series / one-column DataFrame) – Label for refit.
  • decay_rate (float, optional (default=0.9)) – Decay rate of refit, will use leaf_output = decay_rate * old_leaf_output + (1.0 - decay_rate) * new_leaf_output to refit trees.
  • **kwargs – Other parameters for refit. These parameters will be passed to predict method.
Returns:

result – Refitted Booster.

Return type:

Booster
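The decay_rate formula can be checked with a small worked example. blended_leaf_output is a hypothetical helper that applies the documented expression to a single leaf; the numbers are illustrative only.

```python
def blended_leaf_output(old_leaf_output, new_leaf_output, decay_rate=0.9):
    """Apply the documented refit rule to one leaf value."""
    return decay_rate * old_leaf_output + (1.0 - decay_rate) * new_leaf_output


# With the default decay_rate=0.9, the refitted leaf moves 10% of the way
# from the old output towards the output fitted on the new data.
print(blended_leaf_output(1.0, 2.0))                  # approximately 1.1
print(blended_leaf_output(1.0, 2.0, decay_rate=1.0))  # keeps the old output
print(blended_leaf_output(1.0, 2.0, decay_rate=0.0))  # uses only the new output
```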

reset_parameter(params)[source]

Reset parameters of Booster.

Parameters:
  • params (dict) – New parameters for Booster.
Returns:

self – Booster with new parameters.

Return type:

Booster

rollback_one_iter()[source]

Rollback one iteration.

Returns:

self – Booster with one iteration rolled back.

Return type:

Booster

save_model(filename, num_iteration=None, start_iteration=0)[source]

Save Booster to file.

Parameters:
  • filename (string) – Filename to save Booster.
  • num_iteration (int or None, optional (default=None)) – Index of the iteration that should be saved. If None, if the best iteration exists, it is saved; otherwise, all iterations are saved. If <= 0, all iterations are saved.
  • start_iteration (int, optional (default=0)) – Start index of the iteration that should be saved.
Returns:

self – Returns self.

Return type:

Booster

set_attr(**kwargs)[source]

Set attributes to the Booster.

Parameters:
  • **kwargs – The attributes to set. Setting a value to None deletes an attribute.
Returns:

self – Booster with set attributes.

Return type:

Booster

set_network(machines, local_listen_port=12400, listen_time_out=120, num_machines=1)[source]

Set the network configuration.

Parameters:
  • machines (list, set or string) – Names of machines.
  • local_listen_port (int, optional (default=12400)) – TCP listen port for local machines.
  • listen_time_out (int, optional (default=120)) – Socket time-out in minutes.
  • num_machines (int, optional (default=1)) – The number of machines for parallel learning application.
Returns:

self – Booster with set network.

Return type:

Booster

set_train_data_name(name)[source]

Set the name to the training Dataset.

Parameters:
  • name (string) – Name for the training Dataset.
Returns:

self – Booster with set training Dataset name.

Return type:

Booster

shuffle_models(start_iteration=0, end_iteration=-1)[source]

Shuffle models.

Parameters:
  • start_iteration (int, optional (default=0)) – The first iteration that will be shuffled.
  • end_iteration (int, optional (default=-1)) – The last iteration that will be shuffled. If <= 0, the last available iteration is used.
Returns:

self – Booster with shuffled models.

Return type:

Booster

update(train_set=None, fobj=None)[source]

Update Booster for one iteration.

Parameters:
  • train_set (Dataset or None, optional (default=None)) – Training data. If None, last training data is used.
  • fobj (callable or None, optional (default=None)) –

    Customized objective function. Should accept two parameters: preds, train_data, and return (grad, hess).

    preds : list or numpy 1-D array
    The predicted values.
    train_data : Dataset
    The training dataset.
    grad : list or numpy 1-D array
    The value of the first order derivative (gradient) for each sample point.
    hess : list or numpy 1-D array
    The value of the second order derivative (Hessian) for each sample point.

    For a multi-class task, preds is grouped by class_id first, then by row_id. To get the prediction for the i-th row in the j-th class, use preds[j * num_data + i]; grad and hess should be grouped in the same way.

Returns:

is_finished – Whether the update was successfully finished.

Return type:

bool
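A minimal sketch of the fobj contract follows: a function taking (preds, train_data) and returning (grad, hess), one value per sample. The squared-error loss 0.5 * (pred - label)^2 is chosen only for illustration. StubDataset is a hypothetical stand-in that mimics only Dataset.get_label(), so the example runs without LightGBM.

```python
class StubDataset:
    """Hypothetical stand-in for lightgbm.Dataset; only get_label() is mimicked."""

    def __init__(self, label):
        self._label = list(label)

    def get_label(self):
        return self._label


def squared_error_fobj(preds, train_data):
    """Gradient and Hessian of 0.5 * (pred - label)**2 for each sample."""
    labels = train_data.get_label()
    grad = [p - y for p, y in zip(preds, labels)]  # first-order derivative
    hess = [1.0] * len(labels)                     # second-order derivative
    return grad, hess


grad, hess = squared_error_fobj([1.5, 2.0], StubDataset([1.0, 3.0]))
print(grad, hess)  # [0.5, -1.0] [1.0, 1.0]
```

With a real Booster, a function of this shape would be passed as the fobj argument of update.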