[jvm-packages] bridge the gaps between jvm package and native xgboost #7802
Comments
Related #4793 |
Since this is checked off does this mean xgboost4j-spark-gpu supports multi-worker training? I have not been able to get anything other than 1 worker to work. Is there a particular configuration that needs to be applied to enable multi-worker training? FYI I'm using XGBoost 1.6.1 and Spark 3.2.1. |
@mallman, thanks for testing xgboost4j-spark-gpu. Please note that each XGBoost worker requires 1 GPU per process, so if you are trying multi-worker training, please make sure you have multiple GPUs. You should also configure your Spark cluster with GPU support; please refer to https://nvidia.github.io/spark-rapids/Getting-Started/ for that. As for how to submit the XGBoost job, please follow https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_gpu_tutorial.html#submit-the-application. Please feel free to send feedback. Thanks very much. |
BTW, @mallman, have you seen an obvious speedup? |
Hi @wbo4958. I think there's some ambiguity in my question. Let me clarify. What I want to do is run distributed training with a single worker per executor, like we can do in CPU mode. I have been able to make it work if I configure my Spark job with I'm starting to think that what I want is not achievable, at least not with ordinary Spark configuration. I think that maybe what I need is to use Spark's stage-level scheduling, introduced in Spark 3.1. We're using the standalone scheduler, which does not support this capability yet. So we may be stuck unless we switch to YARN or Kubernetes. So my question is, is it possible to run distributed-mode training in GPU mode without limiting the number of running tasks per executor to 1? Cheers. |
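For readers following the task-parallelism question, here is a rough sketch of how the number of concurrently running tasks per executor falls out of Spark's resource settings. It assumes the standard Spark keys `spark.executor.cores`, `spark.task.cpus`, `spark.executor.resource.gpu.amount` and `spark.task.resource.gpu.amount`; the class and method names are hypothetical, and the rounding rule for fractional GPU amounts reflects my understanding of Spark 3.x rather than anything stated in this thread. It only illustrates the arithmetic and says nothing about whether a given setting lets XGBoost training make progress.

```java
// Hypothetical helper, not part of Spark or XGBoost: estimates how many tasks
// one executor can run at once under Spark's resource-based scheduling.
final class GpuTaskSlots {

    // Slots allowed by the GPU resource alone.
    // Assumption: a fractional per-task amount (below 1) lets floor(1 / amount)
    // tasks share each GPU; a whole amount gives executorGpus / amount slots.
    public static int slotsFromGpus(int executorGpus, double taskGpuAmount) {
        if (executorGpus < 0 || taskGpuAmount <= 0) {
            throw new IllegalArgumentException("GPU amounts must be positive");
        }
        if (taskGpuAmount < 1.0) {
            return executorGpus * (int) Math.floor(1.0 / taskGpuAmount);
        }
        return (int) (executorGpus / taskGpuAmount);
    }

    // Concurrent tasks are capped by the scarcest resource: CPU cores or GPUs.
    public static int concurrentTasks(int executorCores, int taskCpus,
                                      int executorGpus, double taskGpuAmount) {
        if (taskCpus <= 0) {
            throw new IllegalArgumentException("taskCpus must be positive");
        }
        int cpuSlots = executorCores / taskCpus;
        return Math.min(cpuSlots, slotsFromGpus(executorGpus, taskGpuAmount));
    }

    public static void main(String[] args) {
        // 12 cores, 1 GPU, each task claims the whole GPU: prints 1
        System.out.println(concurrentTasks(12, 1, 1, 1.0));
        // Same executor, each task claims a quarter of the GPU: prints 4
        System.out.println(concurrentTasks(12, 1, 1, 0.25));
    }
}
```

Under these assumptions, a per-task GPU amount of 1 on a single-GPU executor caps every stage at one running task, which matches the limitation described in the comment above.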
@mallman, I got you. Hmm. If, at any time, there is only 1 xgboost application running on your cluster (without any other spark application), then it's okay to set |
Hi @wbo4958. If I do that, all of the xgboost tasks run on a single executor, but no progress is made. I don't get an error either. It just waits. |
@mallman Could you file an issue describing the problem, including your environment, script, and so on? |
@wbo4958 I'm sorry, but I don't know when I'll return to this effort. But basically the question is whether one can run distributed xgboost with gpus without sacrificing task-parallelism in non-xgboost stages. |
The answer is yes, as described in #7802 (comment). If you can't get it to work, you can file an issue with detailed information so we can figure out why it doesn't run successfully. |
We have a strong appetite for categorical feature support in the jvm package and are willing to contribute, but it would help to get a more granular overview of what still needs to happen and which components we can contribute to in order to get this feature in. @wbo4958 any chance that we could extend the list of action points to clarify what is done and what still needs to happen? "Support categorical data in jvm" is too vaguely defined for me, as a new contributor, to see where I can help. |
Hi @shadyelgewily-slimstock, according to #8727 (comment), it seems you'd like to use the Java APIs to handle categorical data instead of Spark? If so, I think the current xgboost4j package already covers your requirement; please see https://github.com/dmlc/xgboost/pull/7966/files#diff-303feb16c30765909c132d10a2a38788c0a5e6cce038eed115e58322c0016f2fR268-R270 and https://github.com/dmlc/xgboost/pull/7966/files#diff-303feb16c30765909c132d10a2a38788c0a5e6cce038eed115e58322c0016f2fR286-R288. You can refer to this test for the usage: https://github.com/dmlc/xgboost/pull/7966/files#diff-350a33aa9a66e2d51e745c5dc6a190113d2f0a2853a5974878686a30a2b0e47cR408-R430. Currently, the item to support categorical data in xgboost4j-spark has not been implemented; you're welcome to contribute it. Thanks. |
Closing this task; it is tracked by #10415. |
The JVM packages are far behind native XGBoost. I would like to file this issue to track missing features and bugs that should be fixed in the upcoming 2.0.0 release. Please feel free to add some.
New Features
getNumClasses in XGBoostClassifier for XGBoost4j-Spark-GPU.
Bugs