This repository has been archived by the owner on Feb 1, 2022. It is now read-only.
When I started deploying xgboost-operator on my Kubeflow cluster, I referred to https://github.com/kubeflow/xgboost-operator/blob/master/config/samples/xgboost-dist/utils.py#L47 to implement my own function for reading my data. When following that function, it is natural to manually read only part of the data according to the worker's rank.
However, I found that DMatrix already has internal logic to read only part of the data when it detects distributed mode. Combined with my manual sharding, this means each rank reads only 1/(N*N) of the data instead of 1/N, where N is the number of workers.
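To make the arithmetic concrete, here is a minimal sketch in plain Python. It assumes strided sharding (`rows[rank::world_size]`); `shard_rows` is a hypothetical helper, not the function in `utils.py`, and the second call only models DMatrix's internal split as another strided split, which is an assumption about how that split is performed.

```python
def shard_rows(rows, rank, world_size):
    """Hypothetical helper: keep every world_size-th row, starting at `rank`."""
    return rows[rank::world_size]


world_size = 4
rows = list(range(16))  # stand-in for a 16-row dataset

for rank in range(world_size):
    # The user's manual split, as in the sample reader.
    manual = shard_rows(rows, rank, world_size)
    # Assumed second split applied by DMatrix in distributed mode.
    double = shard_rows(manual, rank, world_size)
    print(f"rank {rank}: manual={len(manual)} rows, after both splits={len(double)} rows")
```

With 16 rows and 4 workers, each rank ends up with 1 row (16/4²) instead of 4 (16/4), and only 4 of the 16 rows are used across the whole job.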
I think it would help to add a comment in that function guiding users on how to adapt it.
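A sketch of the kind of guidance comment suggested here, attached to a hypothetical `read_rank_shard` helper (the name and signature are illustrative and do not come from `utils.py`). It also shows that sharding exactly once is a proper partition: every row lands on exactly one rank.

```python
def read_rank_shard(rows, rank, world_size):
    """Return the rows this worker should train on.

    NOTE: Only shard manually if whatever consumes the result does not shard
    again. If xgboost's DMatrix loads the data itself in distributed mode, it
    may already split the input by rank; sharding here as well leaves each
    worker with roughly 1/world_size**2 of the data instead of 1/world_size.
    """
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} out of range for world_size {world_size}")
    # Strided split: rank r keeps rows r, r + world_size, r + 2 * world_size, ...
    return rows[rank::world_size]


rows = list(range(10))
shards = [read_rank_shard(rows, r, 3) for r in range(3)]
# Sharding once covers every row exactly once across all ranks.
print(sorted(x for shard in shards for x in shard) == rows)
```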