[RFC] 4.0.0 Release #5153

Closed
31 of 60 tasks
jameslamb opened this issue Apr 14, 2022 · 36 comments · Fixed by #5952

@jameslamb
Collaborator

jameslamb commented Apr 14, 2022

Summary

@StrikerRUS @guolinke @shiyu1994 @jmoralez @btrotta @Laurae2 could you please try to list out what you all feel is required before a v4.0.0 release of LightGBM is prepared?

Please edit this description and add issues that you feel should be fixed prior to a v4.0.0 release. I've proposed an initial list below.

Python package

R package

CUDA

GPU (non-CUDA)

Quantized Training

Other

Motivation

The most recent release of LightGBM, v3.3.1, was about 6 months ago (October 27, 2021).

There was a v3.3.2 release on January 7, 2022, but it just contained a single small patch to satisfy CRAN.

We have been talking for even longer than that about putting out a 4.0.0 release of LightGBM. Many many fixes, features, and breaking changes have been merged since v3.3.1, and the state of this project on master is now significantly different from what users will get running pip install lightgbm.

I'd like to try to list out everything that we feel needs to be done before the 4.0.0 release, to focus our efforts and hopefully get that release out to users soon. I think an issue like this was successful with the v3.3.0 release (#4310).

Other Items for Release Checklist

  • enable building stable tag on readthedocs
@microsoft microsoft locked and limited conversation to collaborators Apr 14, 2022
@jameslamb
Collaborator Author

For now, I've locked the conversation on this issue so that only collaborators in the repo are able to comment, to keep the conversation focused on what maintainers are comfortable committing to for v4.0.0.

Users and other outside interested parties can open other issues referencing this one with questions or concerns.

@StrikerRUS
Collaborator

StrikerRUS commented Apr 15, 2022

Great plan!
But sorry, I don't believe we'll be able to release v4.0.0 in the near future with the current project activity level.

I strongly believe we should focus on bug fixes and reviewing PRs from outside contributors to keep users' loyalty.

@jameslamb
Collaborator Author

I don't believe we'll be able to release v4.0.0 in the near future with the current project activity level

I also feel that it's probably far away given the current project activity level, but I wanted to at least try to push v4.0.0 forward. I hope maybe the conversation on this issue will encourage Microsoft and other maintainers here to devote more time and resources to making it happen.

I expect that many (maybe even most) LightGBM users won't rely on specific commits, either via git clone or nightly builds, that aren't tagged as releases or published to package repositories like CRAN, PyPI, and conda-forge (which just re-bundles the PyPI package). For them, improvements to this project aren't "real" until they make it into a release.

And I can say that for me personally, it's difficult to stay motivated to spend time on challenging bug fixes, pull request reviews, and user questions in issues if it seems like that work might not ever be released 😞

@guolinke
Collaborator

For the GPU (non-CUDA) bugs, I think it is hard to fix them, as we cannot reach the original developer.
I think we can focus on the cuda_exp version and deprecate the old GPU code.

@guolinke
Collaborator

How about we make two lists? One for necessary changes, another for optional ones.
We can focus on the bug fixes and on breaking but necessary new features/changes.
We can also mark the assignees in the list (you can also include me), and I will try my best to finish them.

@jameslamb
Collaborator Author

How about we make two lists?

My goal with this issue was to define the list of "what is required for v4.0.0", and I think that's enough. The implication of that would be that anything not listed here is optional.

@guolinke
Collaborator

@jameslamb haha, there are 52 items, so I think maybe the list is too large, and some "hard" items may block us from releasing. So I propose to have a "core" list that must be done for v4.0.0.

@StrikerRUS
Collaborator

@guolinke

I think we can focus on the cuda_exp version and deprecate the old GPU code.

I'm very disappointed in this decision 🙁

As I said earlier, the OpenCL-based version that is able to run on AMD and Intel GPUs is a competitive advantage of LightGBM.

For the GPU (non-CUDA) bugs, I think it is hard to fix them, as we cannot reach the original developer.

Have you tried to reach him via email?

@guolinke
Collaborator

guolinke commented Apr 17, 2022

@StrikerRUS for the AMD GPU, we can use ROCm, and there are several tools that can convert CUDA code to ROCm code.
As for Intel GPU, I am not sure, are there any powerful GPUs that are widely used?

What I mean is to deprecate the old GPU algorithm and move to the new one. We can then adapt the new GPU algorithm to more platforms.

@shiyu1994
Collaborator

shiyu1994 commented Apr 17, 2022

@jameslamb Thanks for opening this! I'm glad and excited to work towards 4.0.0. I'll commit more time from now on to guarantee the progress.

I think we can focus on the cuda_exp version and deprecate the old GPU code.

We will make the cuda_exp version support as many platforms as LightGBM supports now. However, until cuda_exp is stable enough, I think we can still maintain the current CUDA and GPU versions, including bug fixing.

@StrikerRUS
Collaborator

StrikerRUS commented Apr 17, 2022

for the AMD GPU, we can use ROCm, and there are several tools that can convert CUDA code to ROCm code.

Oh, interesting idea, I've never heard about such tools! I was sure that CUDA code could run exclusively on NVIDIA cards. However, do you think code transpiling will be convenient for LightGBM users? Honestly, I don't believe that ordinary users will do such things. In contrast, the OpenCL version can be used out of the box on Windows and possibly on Linux in the future.

As for Intel GPU, I am not sure, are there any powerful GPUs that are widely used?

Probably, they will:

Currently we know Intel plans to launch the first Arc GPUs for desktops as soon as the summer of 2022, and at the end of its promo video for the first Arc 3 laptop GPUs the company included a teaser image (above) of a full-size Arc desktop GPU. It sure looks like to be as big as an Nvidia GeForce RTX 3060 — will it perform as well too? We'll have to wait and see.
https://www.tomsguide.com/news/intel-arc-gpu-specs-release-date-features-and-more

@guolinke
Collaborator

guolinke commented Apr 17, 2022

@StrikerRUS the code conversion should be done manually by us and committed to the repo.
Refer to https://rocmdocs.amd.com/en/latest/Programming_Guides/HIP-porting-guide.html

@guolinke guolinke reopened this Apr 17, 2022
@StrikerRUS
Collaborator

@guolinke
Ah, I see now. That changes a lot!

@jameslamb jameslamb pinned this issue May 8, 2022
@jameslamb
Collaborator Author

jameslamb commented Jul 15, 2022

Since the previous release, {lightgbm} is now depended on by several more packages.


https://cran.r-project.org/web/packages/lightgbm/index.html

Including a fairly high-profile one, {bonsai}, with support from RStudio: https://www.tidyverse.org/blog/2022/06/bonsai-0-1-0/.

I'm planning to try to do some work on those projects to make them compatible with v3.3.2 AND v4.0.0.

@jameslamb
Collaborator Author

I've opened issues with offers of support for all of {lightgbm}'s reverse dependencies.

I also want to document this small function in R (proposed in tidymodels/bonsai#42) that can be used to check for the installed version of {lightgbm}:

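using_newer_lightgbm_version <- function(){
    utils::packageVersion("lightgbm") > package_version("3.3.2")
}

# example
if (using_newer_lightgbm_version()) {
    # releases after v3.3.2 select the prediction kind with `type`
    predict(bst, X, type = "raw")
} else {
    # v3.3.2 and older use the boolean `rawscore` argument
    predict(bst, X, rawscore = TRUE)
}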

I hope we can get v4.0.0 out soon, and that we'll start releasing more frequently in the future so that we can just use a cycle of deprecation warnings to communicate such changes, instead of needing to go submit patches.
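
For downstream projects that wrap the Python package instead, a similar version gate might look like the sketch below. It is only an illustration: `parse_version`, `is_newer_than`, and `installed_lightgbm_version` are hypothetical helper names, the parser assumes purely numeric dot-separated versions, and nothing here implies that the Python `predict()` signature changed the way the R one did.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a plain numeric version such as "3.3.2" into (3, 3, 2)."""
    return tuple(int(part) for part in text.split("."))


def is_newer_than(installed: str, threshold: str = "3.3.2") -> bool:
    """Return True when `installed` is strictly newer than `threshold`."""
    return parse_version(installed) > parse_version(threshold)


def installed_lightgbm_version() -> Optional[str]:
    """Look up the installed lightgbm distribution, or None if it is absent."""
    try:
        return metadata.version("lightgbm")
    except metadata.PackageNotFoundError:
        return None


version = installed_lightgbm_version()
if version is not None and is_newer_than(version):
    print("use the code path written for releases after 3.3.2")
else:
    print("use the 3.3.2-compatible code path (or lightgbm is not installed)")
```

Comparing tuples rather than strings avoids the usual trap where "3.3.10" sorts before "3.3.2". Versions with pre-release or local suffixes would need a real version parser.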

@jameslamb
Collaborator Author

@shiyu1994 @guolinke can you please devote some time to LightGBM over the next few weeks, and can we try to get a 4.0 release out?

At this point, it's been more than a year since the last release with substantive changes (v3.3.1, in October 2021). We really need your help.

@shiyu1994
Collaborator

@jameslamb Sure. Personally I feel very anxious whenever I think about our goal not being finished. Sorry for being blocked by another project in the past few weeks. I'm coming back and will devote more time. Hopefully we can get 4.0.0 done before the new year.

@shiyu1994
Collaborator

@jameslamb And thanks for your support all the time.

@jameslamb
Collaborator Author

Of course! At this point, since it's been more than a year since the last release, I think we need to accept that all the things we'd hoped to get into v4.0.0 aren't going to make it there. It would be better to get a v4.0.0 out and then plan other breaking changes for a v5.0.0 in the future, in my opinion, than to delay v4.0.0 another 6+ months.

@shiyu1994
Collaborator

It would be better to get a v4.0.0 out and then plan other breaking changes for a v5.0.0 in the future, in my opinion, than to delay v4.0.0 another 6+ months.

Strongly agree with that.

@jameslamb
Collaborator Author

jameslamb commented Nov 18, 2022

I'm focusing most of my attention right now on fixing the issues related to Python packaging and CI as soon as possible, in preparation for this release.

It has gotten really complicated, so I'm typing out here the sequence in which things need to happen.

  1. get CI working again
  2. merge integrated aarch64 wheels PR
  3. switch to manylinux_2_28 images for building Linux wheels
  4. merge all the new guolinke/lightgbm-ci-docker stuff to master in that repo
  5. update to macOS-11 images on Azure DevOps and build macOS wheels with newer Xcode
  6. switch from setup.py to pyproject.toml for wheel building or put a version ceiling on pip in CI

I'm happy to try to do most of the work, but really hope @StrikerRUS will be available to provide his expertise.

If he isn't, then @jmoralez @shiyu1994 @guolinke I will really need your help with reviews on these. If we are really struggling I can also try to recruit a volunteer from outside of LightGBM who's very familiar with Python packaging.

@guolinke
Collaborator

@jameslamb Thank you! Ping me when you need reviews.

@shiyu1994
Collaborator

Just added quantized training (#5606) in the TODO list.

@jameslamb
Collaborator Author

jameslamb commented Dec 29, 2022

I'd like to revisit this comment from April (#5153 (comment))

We will make the cuda_exp version support as many platforms as LightGBM supports now. However, until cuda_exp is stable enough, I think we can still maintain the current CUDA and GPU versions, including bug fixing.

I'm happy to see the progress on implementing more objective functions in the cuda_exp version.

@shiyu1994 @guolinke Do you think we should just remove the version that's currently called cuda and just focus on cuda_exp? Then in LightGBM v4.0.0, users who are installing with device_type = "cuda" will get what we're now calling "cuda_exp".

I think we should do that at this point, to:

  • reduce maintenance burden in the repo
  • ensure that we get bug reports only for the new CUDA version
  • reduce the time it takes CI to run

I don't think this project is healthy enough right now (e.g. releasing frequently enough) to do something like release v4.0.0 with both cuda and cuda_exp, then support both for a long time, then eventually remove the old cuda code.

@shiyu1994
Collaborator

Do you think we should just remove the version that's currently called cuda and just focus on cuda_exp? Then in LightGBM v4.0.0, users who are installing with device_type = "cuda" will get what we're now calling "cuda_exp".

Yes. We can do that in v4.0.0.
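
To make the effect of that rename concrete for user code, here is a minimal sketch. It assumes the proposal is adopted as described, so that from v4.0.0 on the name "cuda" selects the implementation currently called "cuda_exp" and the "cuda_exp" name goes away. `resolve_device_type` is a hypothetical helper for illustration, not part of LightGBM.

```python
def resolve_device_type(requested: str, lightgbm_major: int) -> str:
    """Map a requested device_type to the name expected by that major version.

    Assumption: starting with major version 4, "cuda" refers to the
    implementation that earlier development builds called "cuda_exp".
    """
    if lightgbm_major >= 4 and requested == "cuda_exp":
        return "cuda"
    return requested


# Build a parameter dict the way a training script might.
params = {
    "objective": "regression",
    "device_type": resolve_device_type("cuda_exp", lightgbm_major=4),
}
print(params["device_type"])
```

Other values such as "cpu" or "gpu" pass through unchanged, so the helper only touches the one name affected by the proposal.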

@jameslamb
Collaborator Author

Excellent, thanks! I just opened #5677 for your consideration.

@jameslamb
Collaborator Author

jameslamb commented Mar 3, 2023

The last thing I absolutely want to get done for v4.0.0 is fix #5061, updating the way lightgbm's wheels and sdists are built.

I'm working on that in #5759, and have started opening up some other small PRs to help with it (e.g. #5761).

@shiyu1994 @guolinke what else do you think absolutely needs to be done before we can put up a v4.0 release?

@shiyu1994
Collaborator

@jameslamb For objective functions and metrics on the new CUDA version, we will make most of them available on GPU. We may leave some of them (those not used very frequently) falling back to CPU. This part is almost done.

In addition, I think we should include quantized training. The code is ready in https://github.com/Quantized-GBDT/Quantized-GBDT. We just need to merge it into the official repo. (1~2 weeks ETA)

Finally, if possible, we may add initial support for multi-GPU training on CUDA. Initial support for multi-GPU training may again take 1~2 weeks. Depending on development progress and the date we want to release 4.0.0, we may adjust this goal.

@jameslamb
Collaborator Author

Ok works for me! I'm happy to help with reviews, docs, tests, whatever you need.

@jameslamb
Collaborator Author

There are now 1.5+ years worth of fixes and improvements sitting on master which haven't yet made it into a release.

@shiyu1994 @guolinke @jmoralez I think now that #5061 has been addressed and CPU-based quantized training has been added (#5800) we should finally do a v4.0.0 release and not wait any longer.

Do you agree? If yes, I'll create the release PR and we can talk more there.

@guolinke
Collaborator

@shiyu1994 Do we have any other breaking changes?

@shiyu1994
Collaborator

@jameslamb @guolinke How about we at least get #5933 merged into v4.0.0?

@jameslamb
Collaborator Author

That PR doesn't contain breaking changes, so I don't think we should wait for it. It could be part of a 4.1 or 4.0.1 release.

If you can't think of any other breaking changes, I think we should move forward with v4.0.0.
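
The reasoning above follows the usual semantic-versioning convention: only breaking changes need a new major version, so a non-breaking PR never has to block one. A rough sketch of that rule, assuming a plain MAJOR.MINOR.PATCH scheme (`earliest_release` is a hypothetical helper, not project tooling):

```python
def earliest_release(current: str, breaking: bool, adds_feature: bool) -> str:
    """Return the smallest version bump that can carry a change.

    Assumes `current` is a plain MAJOR.MINOR.PATCH string.
    """
    major, minor, patch = (int(part) for part in current.split("."))
    if breaking:
        # Breaking changes have to wait for the next major release.
        return f"{major + 1}.0.0"
    if adds_feature:
        # Backwards-compatible features fit in a minor release.
        return f"{major}.{minor + 1}.0"
    # Pure fixes fit in a patch release.
    return f"{major}.{minor}.{patch + 1}"


# A non-breaking change merged after 4.0.0 can ship in 4.1.0 or 4.0.1.
print(earliest_release("4.0.0", breaking=False, adds_feature=True))
print(earliest_release("4.0.0", breaking=False, adds_feature=False))
```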

@shiyu1994
Collaborator

That's OK. We can move forward.

@jameslamb
Collaborator Author

I have one more small breaking change that is ready to be reviewed:

#5947

After that, I'll make a PR for the 4.0 release. Exciting!!!

@jameslamb
Collaborator Author

For anyone subscribed to notifications and wondering why v4.0.0 of the R package isn't on CRAN yet... it won't be until at least mid-August.

See #5952 (comment)

@jameslamb jameslamb unpinned this issue Oct 8, 2023

4 participants