Hist 10x slower than Exact #5405
Comments
@shenkev Can you try 1.0.2? We made lots of performance improvements in 'hist'.
ping @SmirnovEgorRu here.
Thanks for getting back to this, yes let me try the newest version.
@shenkev, thank you for reporting the issue.
Sorry for the slow reply, I've tried the new stable release 1.0.0. "hist" is no longer 10x slower than "exact", but it's still a bit slower: given my dataset size, I'm boosting 1 tree per 13 seconds in "exact" and 1 tree per 17 seconds in "hist". The parameters I'm using for both algorithms are: { Is "hist" expected to be slightly slower than "exact"? I've noticed from previous experience that "hist" doesn't have as much benefit over "exact" for small max_depth.
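When timing "hist" against "exact", it helps to guarantee that the two runs differ only in `tree_method`. A minimal sketch of that setup (the parameter values below are placeholders, not the reporter's actual settings):

```python
# Placeholder parameters; shared by both runs except for tree_method.
base_param = {
    'eta': 0.01,
    'max_depth': 6,  # hist tends to gain less over exact at small depths
    'objective': 'binary:logistic',
    'nthread': 40,
}

exact_param = dict(base_param, tree_method='exact')
hist_param = dict(base_param, tree_method='hist')

# Sanity check: the only key that differs between the two runs.
diff = {k for k in exact_param if exact_param[k] != hist_param[k]}
assert diff == {'tree_method'}
```

Each dict can then be passed to `xgb.train` with the same `DMatrix` and number of rounds, so any timing gap is attributable to the tree method alone.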
@SmirnovEgorRu I'm using the Gnome system monitor app that lets me see the CPU usage for each CPU. By oscillating between 20% and 100% I mean each CPU oscillates in that range.
@shenkev How many boosting rounds did you run?
@shenkev, I tested XGBoost 1.0.2 on your dimensions + your parameters:
My reproducer:

```python
import timeit

import xgboost as xgb
from sklearn.datasets import make_classification

print("XGBoost version: ", xgb.__version__)

print("Data generation...")
trainX, trainY = make_classification(n_samples=12000000, n_features=48)

param = {
    'n_estimators': 10,
    'eta': 0.01,
    'colsample_bytree': 0.7,
    'max_depth': 10,
    'objective': 'binary:logistic',
    'verbosity': 3,
    'tree_method': 'hist',
}

print("XGB Training...")
dtrain = xgb.DMatrix(trainX, label=trainY)
t1 = timeit.default_timer()
model_xgb = xgb.train(param, dtrain, param['n_estimators'])
t2 = timeit.default_timer()
print("Time =", (t2 - t1) * 1000, "ms")
```

HW: Xeon 5120 @ 2.20GHz, 14 cores/socket, 2 sockets, HT: on. Do you see similar numbers on your HW for the bench? P.S. The current master contains even stronger optimizations of the 'hist' method vs. the 1.0 version due to PR #5244, so you can try that and obtain even better results.
For example, for 100 iterations on the same dataset and parameters with the 'hist' method I see:
@SmirnovEgorRu Thanks for reproducing this. I'll try again with the new 1.0.2 version. Maybe the problem is with our particular dataset or environment. @trivialfis I only ran 20 rounds to time the algorithm but our full model requires hundreds of rounds.
I tried training in a different environment and the performance of "hist" was much better; it's now ~1.7x faster than "exact". My original environment was a Docker image using Python; my other environment used xgboost4j outside a Docker image. In both environments, "exact" runs at about the same speed; "hist" is slower only in the Docker + Python environment. Any thoughts as to why I'm seeing a difference in "hist" runtime between the two environments? Otherwise, please close the issue.
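One thing worth checking in the Docker environment is whether the container's CPU quota is lower than the host's core count: OpenMP-heavy code such as 'hist' can oversubscribe threads badly when `nthread` exceeds the quota. A minimal sketch of such a check (the cgroup v1 file paths are an assumption; they exist only on Linux hosts with cgroup v1 CPU controllers):

```python
import os

def effective_cpu_count():
    """Return the number of CPUs this process can actually use.

    Reads the cgroup v1 CPU quota if present (typical inside Docker
    containers with --cpus set); otherwise falls back to os.cpu_count().
    """
    quota_path = "/sys/fs/cgroup/cpu/cpu.cfs_quota_us"
    period_path = "/sys/fs/cgroup/cpu/cpu.cfs_period_us"
    try:
        with open(quota_path) as f:
            quota = int(f.read())
        with open(period_path) as f:
            period = int(f.read())
        if quota > 0 and period > 0:
            return max(1, quota // period)
    except (OSError, ValueError):
        pass  # no cgroup quota available; use the host count
    return os.cpu_count() or 1

print(effective_cpu_count())
```

If this prints fewer cores than the `nthread` value passed to XGBoost, the threads will contend for CPU time, and the more aggressively parallel 'hist' method may suffer more than 'exact'.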
@shenkev, just for my understanding - do you use Spark APIs?
If the data is extremely sparse, the distributed algorithm can be much slower. I optimized quantile building for sparse data, but it doesn't work in a distributed environment.
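To check whether a dataset falls into the "extremely sparse" category, the density of the loaded matrix is easy to compute, since data loaded from a libsvm file (e.g. via `sklearn.datasets.load_svmlight_file`) comes back as a SciPy sparse matrix. A minimal sketch using synthetic data as a stand-in:

```python
from scipy.sparse import random as sparse_random

# Synthetic stand-in for a matrix loaded from a libsvm file.
X = sparse_random(1000, 48, density=0.05, format='csr', random_state=0)

# Fraction of stored (nonzero) entries.
density = X.nnz / (X.shape[0] * X.shape[1])
print(f"density = {density:.3f}")
```

A density of a few percent or less is where sparse-specific code paths start to matter; the same two lines applied to the real training matrix would show which regime the reported dataset is in.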
No, we don't use Spark nor parallel computing (I'm 95% sure).
Original issue description:
XGBoost version: 0.90
System: Linux
CPU Cores: 40
Language: Python
I’m training with nthreads=40 on a dataset of size 12M and 48 features. “Exact” mode boosts trees at a rate of 1 tree per 12 seconds. With the same hyperparameters, “hist” mode (I’ve only changed “tree_method”) boosts trees at a rate of 1 per 2 minutes (10x slower). I am loading train and val data from libsvm files.
Furthermore, "hist" has a much longer startup time than "exact".
When I inspect the CPU usage, both “exact” and “hist” use all 40 cores. The CPU usage of “exact” oscillates around 20-100% while the CPU usage of “hist” stays saturated around 100%.