Experiments

Comparison Experiment

For the detailed experiment scripts and output logs, please refer to this repo.

History

08 Mar, 2020: updated according to the latest master branch (1b97eaf for XGBoost, bcad692 for LightGBM). xgboost_exact was not updated because it is too slow.

27 Feb, 2017: first version.

Data

We used 5 datasets to conduct our comparison experiments. Details of the data are listed in the following table:

Data	Task	Link	#Train_Set	#Feature	Comments
Higgs	Binary classification	link	10,500,000	28	last 500,000 samples were used as test set
Yahoo LTR	Learning to rank	link	473,134	700	set1.train as train, set1.test as test
MS LTR	Learning to rank	link	2,270,296	137	{S1,S2,S3} as train set, {S5} as test set
Expo	Binary classification	link	11,000,000	700	last 1,000,000 samples were used as test set
Allstate	Binary classification	link	13,184,290	4228	last 1,000,000 samples were used as test set

Environment

We ran all experiments on a single Linux server (Azure ND24s) with the following specifications:

OS	CPU	Memory
Ubuntu 16.04 LTS	2 * E5-2690 v4	448GB

Baseline

We used xgboost as a baseline.

Both xgboost and LightGBM were built with OpenMP support.

Settings

We used 3 settings for the experiments. The parameters of these settings are:

xgboost:

eta = 0.1
max_depth = 8
num_round = 500
nthread = 16
tree_method = exact
min_child_weight = 100

xgboost_hist (using histogram based algorithm):

eta = 0.1
num_round = 500
nthread = 16
min_child_weight = 100
tree_method = hist
grow_policy = lossguide
max_depth = 0
max_leaves = 255

LightGBM:

learning_rate = 0.1
num_leaves = 255
num_trees = 500
num_threads = 16
min_data_in_leaf = 0
min_sum_hessian_in_leaf = 100

xgboost grows trees depth-wise and controls model complexity with max_depth. LightGBM grows trees leaf-wise instead and controls model complexity with num_leaves. The two therefore cannot be compared under exactly the same model setting. As a compromise, we use xgboost with max_depth=8, which allows at most 2^8 = 256 leaves per tree, to compare with LightGBM with num_leaves=255.

Other parameters are default values.
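As a minimal sketch, the three settings above can be written as plain parameter dictionaries. The dictionary names and the helper max_leaves_for_depth are hypothetical and introduced only for illustration; the keys and values are copied from the lists above. How the dictionaries are passed to each library depends on the interface used (CLI or Python) and is not shown here.

```python
# Parameter sets from the Settings section, written as plain dictionaries.
# The variable names are illustrative, not taken from the benchmark scripts.
XGBOOST_EXACT = {
    "eta": 0.1,
    "max_depth": 8,
    "num_round": 500,
    "nthread": 16,
    "tree_method": "exact",
    "min_child_weight": 100,
}

XGBOOST_HIST = {
    "eta": 0.1,
    "num_round": 500,
    "nthread": 16,
    "min_child_weight": 100,
    "tree_method": "hist",
    "grow_policy": "lossguide",
    "max_depth": 0,
    "max_leaves": 255,
}

LIGHTGBM = {
    "learning_rate": 0.1,
    "num_leaves": 255,
    "num_trees": 500,
    "num_threads": 16,
    "min_data_in_leaf": 0,
    "min_sum_hessian_in_leaf": 100,
}


def max_leaves_for_depth(max_depth: int) -> int:
    """Upper bound on the leaf count of a binary tree with the given depth."""
    return 2 ** max_depth


# A depth-8 tree can hold up to 2**8 leaves, which is close to the
# 255-leaf budget used by the two leaf-wise settings.
print("depth-wise leaf bound:", max_leaves_for_depth(XGBOOST_EXACT["max_depth"]))
print("leaf-wise leaf budget:", LIGHTGBM["num_leaves"])
```

The depth-wise bound and the leaf-wise budget differ by one leaf, which is why the settings are treated as comparable rather than identical.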

Result

Speed

We compared speed using only the training task, without any test or metric output, and we did not count the time spent on IO. For the ranking tasks, XGBoost and LightGBM implement different ranking objective functions, so we used the regression objective in the speed benchmark to keep the comparison fair.

The following table is the comparison of time cost:

Data	xgboost	xgboost_hist	LightGBM
Higgs	3794.34 s	165.575 s	130.094 s
Yahoo LTR	674.322 s	131.462 s	76.229 s
MS LTR	1251.27 s	98.386 s	70.417 s
Expo	1607.35 s	137.65 s	62.607 s
Allstate	2867.22 s	315.256 s	148.231 s

LightGBM ran faster than xgboost on all experiment data sets.
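To put the timings in relative terms, the short sketch below computes how many times faster LightGBM is than each xgboost setting. The numbers are copied from the table above; the names TIMES and speedup are hypothetical and used only for this illustration.

```python
# Training times in seconds, copied from the speed table above.
TIMES = {
    "Higgs": {"xgboost": 3794.34, "xgboost_hist": 165.575, "LightGBM": 130.094},
    "Yahoo LTR": {"xgboost": 674.322, "xgboost_hist": 131.462, "LightGBM": 76.229},
    "MS LTR": {"xgboost": 1251.27, "xgboost_hist": 98.386, "LightGBM": 70.417},
    "Expo": {"xgboost": 1607.35, "xgboost_hist": 137.65, "LightGBM": 62.607},
    "Allstate": {"xgboost": 2867.22, "xgboost_hist": 315.256, "LightGBM": 148.231},
}


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


for data, t in TIMES.items():
    vs_exact = speedup(t["xgboost"], t["LightGBM"])
    vs_hist = speedup(t["xgboost_hist"], t["LightGBM"])
    print(f"{data:10s} {vs_exact:6.2f}x vs xgboost, {vs_hist:5.2f}x vs xgboost_hist")
```

A ratio above 1 means LightGBM finished training sooner on that dataset.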

Accuracy

We computed all accuracy metrics only on the test data set.

Data	Metric	xgboost	xgboost_hist	LightGBM
Higgs	AUC	0.839593	0.845314	0.845724
Yahoo LTR	NDCG₁	0.719748	0.720049	0.732981
	NDCG₃	0.717813	0.722573	0.735689
	NDCG₅	0.737849	0.740899	0.75352
	NDCG₁₀	0.78089	0.782957	0.793498
MS LTR	NDCG₁	0.483956	0.485115	0.517767
	NDCG₃	0.467951	0.47313	0.501063
	NDCG₅	0.472476	0.476375	0.504648
	NDCG₁₀	0.492429	0.496553	0.524252
Expo	AUC	0.756713	0.776224	0.776935
Allstate	AUC	0.607201	0.609465	0.609072

Memory Consumption

We monitored RES (resident memory) while running the training task. We set two_round=true in LightGBM to reduce peak memory usage; this increases data-loading time but does not affect training speed or accuracy.

Data	xgboost	xgboost_hist	LightGBM (col-wise)	LightGBM (row-wise)
Higgs	4.853GB	7.335GB	0.897GB	1.401GB
Yahoo LTR	1.907GB	4.023GB	1.741GB	2.161GB
MS LTR	5.469GB	7.491GB	0.940GB	1.296GB
Expo	1.553GB	2.606GB	0.555GB	0.711GB
Allstate	6.237GB	12.090GB	1.116GB	1.755GB

Parallel Experiment

History

27 Feb, 2017: first version.

Data

We used a terabyte click log dataset to conduct parallel experiments. Details are listed in the following table:

Data	Task	Link	#Data	#Feature
Criteo	Binary classification	link	1,700,000,000	67

This data contains 13 integer features and 26 categorical features for 24 days of click logs. We computed the click-through rate (CTR) and count for each of the 26 categorical features from the first ten days. We then used the next ten days' data as training data, after replacing the categorical features with the corresponding CTR and count. The processed training data has a total of 1.7 billion records and 67 features.
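The CTR-and-count replacement described above can be sketched for a single categorical feature as follows. This is an illustrative sketch, not the preprocessing code used for the experiment: the function names fit_ctr_stats and encode are hypothetical, and the fallback of (0.0, 0) for values not seen in the statistics period is an assumption, since the handling of unseen values is not described here.

```python
from collections import defaultdict


def fit_ctr_stats(rows):
    """Compute (CTR, count) per category value.

    rows: iterable of (category_value, clicked) pairs taken from the
    statistics period (the first ten days in the text above).
    """
    clicks = defaultdict(int)
    counts = defaultdict(int)
    for value, clicked in rows:
        counts[value] += 1
        clicks[value] += int(clicked)
    return {value: (clicks[value] / counts[value], counts[value]) for value in counts}


def encode(value, stats, default=(0.0, 0)):
    """Replace a category value with its (CTR, count) pair.

    Unseen values fall back to `default`, which is an assumption of this sketch.
    """
    return stats.get(value, default)


# Toy statistics period: value "a" is shown 3 times and clicked twice.
history = [("a", 1), ("a", 0), ("b", 0), ("a", 1)]
stats = fit_ctr_stats(history)

# Toy training period: each categorical value becomes two numeric features.
training_values = ["a", "b", "z"]
print([encode(v, stats) for v in training_values])
```

Computing the statistics on an earlier period than the training data keeps the label of a training record from leaking into its own CTR feature.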

Environment

We ran our experiments on 16 Windows servers with the following specifications:

OS	CPU	Memory	Network Adapter
Windows Server 2012	2 * E5-2670 v2	DDR3 1600MHz, 256GB	Mellanox ConnectX-3, 54Gbps, RDMA support

Settings

learning_rate = 0.1
num_leaves = 255
num_trees = 100
num_thread = 16
tree_learner = data

We used data-parallel learning here because this dataset has a very large number of records (#data) but few features (#feature). Other parameters were default values.

Results

#Machine	Time per Tree	Memory Usage(per Machine)
1	627.8 s	176GB
2	311 s	87GB
4	156 s	43GB
8	80 s	22GB
16	42 s	11GB

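The results show that LightGBM achieves a near-linear speedup with distributed learning: time per tree drops from 627.8 s on 1 machine to 42 s on 16 machines, roughly a 15x speedup, while memory usage per machine falls in proportion.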
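The scaling in the table can be checked with a few lines of arithmetic. The sketch below computes the speedup relative to one machine and the parallel efficiency (speedup divided by machine count) from the times listed above; the names TIME_PER_TREE and scaling are hypothetical.

```python
# Time per tree in seconds for each machine count, copied from the table above.
TIME_PER_TREE = {1: 627.8, 2: 311.0, 4: 156.0, 8: 80.0, 16: 42.0}


def scaling(times):
    """Return {machines: (speedup, efficiency)} relative to the single-machine run."""
    base = times[1]
    result = {}
    for machines, seconds in times.items():
        speedup = base / seconds
        result[machines] = (speedup, speedup / machines)
    return result


for machines, (speedup, efficiency) in scaling(TIME_PER_TREE).items():
    print(f"{machines:2d} machines: {speedup:5.2f}x speedup, {efficiency:.1%} efficiency")
```

An efficiency of 100% corresponds to perfectly linear scaling; values slightly below that at higher machine counts reflect communication overhead.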

GPU Experiments

Refer to GPU Performance.