This repository has been archived by the owner on Jul 16, 2021. It is now read-only.

Migrate to XGBoost mainline repository #39

Closed
mrocklin opened this issue May 2, 2019 · 23 comments

Comments

@mrocklin
Member

mrocklin commented May 2, 2019

@RAMitchell (an xgboost maintainer) was mentioning that it might be possible to migrate the whole of the dask-xgboost codebase into xgboost itself.

Any thoughts or concerns about this?

I think that the proposed API change would be to add a dask_client= keyword (or something similar) to the official train and predict methods.
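A minimal sketch of how such a keyword might dispatch between a local and a distributed code path. Every name here (`train`, `_train_local`, `_train_distributed`, and the `dask_client` parameter itself) is a hypothetical stand-in for illustration, not xgboost's actual API, and the function bodies are placeholders rather than real training logic.

```python
from typing import Any, Optional


def _train_local(params: dict, data: Any) -> str:
    # Hypothetical stand-in for the existing single-machine training path.
    return "local"


def _train_distributed(client: Any, params: dict, data: Any) -> str:
    # Hypothetical stand-in for a path that would use a Dask client
    # to coordinate training across workers.
    return "distributed"


def train(params: dict, data: Any, dask_client: Optional[Any] = None) -> str:
    """Hypothetical train() that dispatches on a dask_client= keyword."""
    if dask_client is None:
        # No client supplied: behave exactly as before.
        return _train_local(params, data)
    return _train_distributed(dask_client, params, data)
```

With a design like this, existing callers that never pass `dask_client` would see no change in behavior, while Dask users would opt in with one extra argument.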

@TomAugspurger
Member

No objections here, though I think the CI is failing with the latest xgboost. I have a half-written tutorial copying https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html. I’ll try to finish that up in a couple weeks.

@javabrett
Contributor

Someone should verify that this code, which is licensed under BSD 3-Clause, can be absorbed into xgboost under their license (Apache License 2.0) without changes.

I expect this is fine, as BSD 3-Clause is permissive, but it is best to make sure they have checked, before they accept the code, that no relicensing is required.

@jakirkham
Member

As xgboost now has native Dask support in mainline ( dmlc/xgboost#4473 ), what would we like to do with this repo?

@TomAugspurger
Member

TomAugspurger commented May 31, 2019 via email

@TomAugspurger
Member

TomAugspurger commented May 31, 2019 via email

@jakirkham
Member

That sounds reasonable. How do we handle new issues that are reported here?

@mrocklin
Member Author

mrocklin commented May 31, 2019 via email

@ksangeek
Contributor

@mrocklin Could you please elaborate a bit on how the implementation in this repository differs from the one included in dmlc/xgboost? And, if you have some more time, under what circumstances this one might be preferred?

@mrocklin
Member Author

mrocklin commented Jun 18, 2019 via email

@RAMitchell

@ksangeek See demos here: https://github.com/dmlc/xgboost/tree/master/demo/dask

The xgboost dask integration is currently slightly more low-level than what is here. We are considering whether we can extend the API to provide more of the high-level functionality currently available in dask-xgboost without duplicating existing xgboost APIs. If this can be achieved, we should be able to definitively move users from dask-xgboost to the native xgboost integration and deprecate this repository without loss of user experience.

@ksangeek
Contributor

Thank you @mrocklin and @RAMitchell for the useful information.

@kylejn27
Contributor

Has this been thought about further? There's functionality I'd like to implement in this library that's already been done in dmlc/xgboost's native dask integration.

@TomAugspurger
Member

From what I understand, the implementation in xgboost is closer to what we have here now.

Personally, I'd like to see this handled within xgboost itself, to reduce the maintenance load here :)

Are there any features within dask-xgboost not handled within xgboost itself?

@kylejn27
Contributor

kylejn27 commented Nov 26, 2019

> the implementation in xgboost is closer to what we have here now.

I've been bouncing between the two codebases over the past few weeks. Their implementation is extremely similar to what's in this repo, but it supports more features of the underlying xgboost library.

> Are there any features within dask-xgboost not handled within xgboost itself?

Nope. The peers that I work with are requesting features for this library that have already been implemented in the native xgboost dask implementation

@TomAugspurger
Member

This topic came up in the monthly Dask meeting today. On the call @kylejn27 repeated that dmlc/xgboost handles everything dask-xgboost handles and more. If that is the case then maintaining dask-xgboost is wasted effort.

XGBoost hasn't made a release since the refactored dask handling went in. So we'll need to wait until their next release before making any concrete decisions. But my preference is to gather feedback from users (here and from followers of the dask_dev Twitter account) on whether the implementation in dmlc/xgboost suffices after the next release. If so, then we'll archive this repository and direct people there.

@hcho3

hcho3 commented Feb 19, 2020

FYI, I am currently working on the upcoming release for XGBoost (1.0.0). ETA is tomorrow (2/19).

@jakirkham
Member

@mmccarty @kylejn27 @gforsyth, would it be possible to get a summary of the current outstanding issues for migrating to xgboost for Dask usage?

cc @JohnZed (for vis)

@jakirkham
Member

On a different note, there is currently an RC for xgboost 1.2.0. It would be great if people could try it and share feedback here ( dmlc/xgboost#5970 ).

@trivialfis

Please loop me in on related issues around migration.

@mmccarty
Member

These are the issues that I'm aware of

@jameslamb
Member

Now that release 1.3.0 is out (congrats!!!) and it includes fixes for all the issues in #39 (comment), can this be closed?

@jakirkham
Member

> Now that release 1.3.0 is out (congrats!!!) and it includes fixes for all the issues in #39 (comment), can this be closed?

@mmccarty? 🙂

@mmccarty
Member

Yes! Let's close this issue. Thank you everyone for the hard work!
