
Hist 10x slower than Exact #5405

Closed
shenkev opened this issue Mar 11, 2020 · 14 comments

@shenkev

shenkev commented Mar 11, 2020

XGBoost version: 0.90
System: linux
CPU Cores: 40
Language: Python

I’m training with nthreads=40 on a dataset of size 12M and 48 features. “Exact” mode boosts trees at a rate of 1 tree per 12 seconds. With the same hyperparameters, “hist” mode (I’ve only changed “tree_method”) boosts trees at a rate of 1 per 2 minutes (10x slower). I am loading train and val data from libsvm files.

Furthermore, "hist" has a much longer startup time than "exact".

When I inspect the CPU usage, both "exact" and "hist" use all 40 cores. The CPU usage of "exact" oscillates between 20% and 100%, while the CPU usage of "hist" stays saturated around 100%.
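
For reference, a minimal sketch of the comparison being described; the file names, round count, and evaluation list below are illustrative assumptions, not taken from this report:

import time
import xgboost as xgb

# Hypothetical libsvm files standing in for the 12M x 48 train/val data.
dtrain = xgb.DMatrix("train.libsvm")
dval = xgb.DMatrix("val.libsvm")

base = {"nthread": 40, "objective": "binary:logistic"}

for tree_method in ("exact", "hist"):
    params = dict(base, tree_method=tree_method)
    start = time.time()
    xgb.train(params, dtrain, num_boost_round=20, evals=[(dval, "val")])
    print(tree_method, ":", (time.time() - start) / 20, "seconds per boosting round")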

@hcho3
Collaborator

hcho3 commented Mar 11, 2020

@shenkev Can you try 1.0.2? We made lots of performance improvements in 'hist'.

@trivialfis
Member

ping @SmirnovEgorRu here.

@shenkev
Author

shenkev commented Mar 11, 2020

Thanks for getting back to this; yes, let me try the newest version.

@SmirnovEgorRu
Contributor

@shenkev, thank you for reporting the issue.
How did you obtain the 20-100% CPU usage? If you used the "top" tool, for example, that would mean your CPU utilized only 1 core out of the 40 available (it should be 4000% in the ideal case).
If it's not private information, could you please send the full list of parameters you used for training? I can try to reproduce the numbers.

@shenkev
Author

shenkev commented Mar 12, 2020

Sorry for the slow reply. I've tried the new stable release 1.0.0. "Hist" is no longer 10x slower than "exact"; however, it's still a bit slower.

Given my dataset size, I'm boosting 1 tree per 13 seconds in "exact" and 1 tree per 17 seconds in "hist".

The parameters I'm using for both algorithms are:

{
    'eta': 0.01,
    'colsample_bytree': 0.7,
    'max_depth': 10,
    'objective': 'binary:logistic'
}

Is "hist" expected to be slightly slower than "exact"? I've noticed from previous experience that hist doesn't have as much benefit over "exact" for small max_depth.

@shenkev
Author

shenkev commented Mar 12, 2020

@SmirnovEgorRu I'm using the Gnome System Monitor app, which lets me see the usage of each CPU. By oscillating between 20% and 100%, I mean each CPU oscillates in that range.

@trivialfis
Member

@shenkev How many boosting rounds did you run?

@SmirnovEgorRu
Contributor

@shenkev, I tested XGBoost 1.0.2 with your data dimensions and your parameters:
Performance depending on tree method:

  • not set (auto-selected as 'approx'): 136.856 sec
  • 'exact': 141.309 sec
  • 'hist': 23.859 sec

My reproducer:

import timeit
import xgboost as xgb
from sklearn.datasets import make_classification

print("XGBoost version: ", xgb.__version__)

print("Data generation...")
trainX, trainY = make_classification(n_samples=12000000, n_features=48)

param = {
    'n_estimators': 10,
    'eta': 0.01,
    'colsample_bytree': 0.7,
    'max_depth': 10,
    'objective': 'binary:logistic',
    'verbosity': 3,
    'tree_method': 'hist',
}

print("XGB Training...")
dtrain = xgb.DMatrix(trainX, label=trainY)
t1 = timeit.default_timer()
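# Note: 'n_estimators' in the param dict is not consumed by xgb.train itself;
# the number of boosting rounds is passed explicitly as the third argument below.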
model_xgb = xgb.train(param, dtrain, param['n_estimators'])
t2 = timeit.default_timer()

print("Time =", (t2-t1)*1000, "ms")

HW: Xeon 5120 @ 2.20GHz, 14 cores/socket, 2 sockets, HT: on

Do you see similar numbers on your hardware for this benchmark?

P.S. The current master contains even stronger optimizations of the 'hist' method versus the 1.0 version, due to PR #5244, so you can try it and obtain even better results.

@SmirnovEgorRu
Contributor

@shenkev,

P.S. The current master contains even stronger optimizations of the 'hist' method versus the 1.0 version, due to PR #5244, so you can try it and obtain even better results.

For example, for 100 iterations on the same dataset and parameters with the 'hist' method I see:
XGB 1.0 - 116.053 sec
XGB master - 92.691 sec

@shenkev
Author

shenkev commented Mar 13, 2020

@SmirnovEgorRu Thanks for reproducing this. I'll try again with the new 1.0.2 version. Maybe the problem is with our particular dataset or environment.

@trivialfis I only ran 20 rounds to time the algorithm, but our full model requires hundreds of rounds.

@shenkev
Author

shenkev commented Mar 13, 2020

I tried training in a different environment and the performance of "hist" was much better; it's now ~1.7x faster than "exact".

My original environment was inside a Docker image, using Python. My other environment used xgboost4j, not inside a Docker image.

In both environments, "exact" runs at about the same speed. "Hist" is slower only in the Docker + Python environment.

Any thoughts as to why I'm seeing a difference in "hist" runtime between the two environments? Please close the issue otherwise.
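
(Not from the thread: one hedged way to narrow this down would be to print what each environment actually exposes to the process, since a container CPU quota, affinity mask, or thread limit would show up in a quick check like the sketch below; this is an assumption, not a known cause.)

import os
import multiprocessing
import xgboost as xgb

print("XGBoost version:", xgb.__version__)
print("multiprocessing.cpu_count():", multiprocessing.cpu_count())
# On Linux, this reflects the CPU affinity mask actually granted to the process.
print("usable cores:", len(os.sched_getaffinity(0)))
print("OMP_NUM_THREADS:", os.environ.get("OMP_NUM_THREADS"))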

@SmirnovEgorRu
Contributor

@shenkev, just for my understanding: do you use the Spark APIs?
I'm asking because I haven't checked the performance of that path. It looks like the performance of 'hist' with the single-node Python API is much better than with the Spark APIs. If so, that's a good reason to invest in optimizing the Java side.

@trivialfis
Member

If the data is extremely sparse, the distributed algorithm can be much slower. I optimized quantile building for sparse data, but that optimization doesn't work in a distributed environment.

@shenkev
Author

shenkev commented Mar 13, 2020

No, we don't use Spark or parallel computing (I'm 95% sure).

@hcho3 hcho3 closed this as completed Jun 17, 2020