[RFC] version changes + more frequent releases #3210

jameslamb · 2020-07-07T04:53:07Z

I'd like to open this request for comment to discuss a proposal.

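After releasing v3.0.0 (#3071), I'd like to propose that we use 4-part version numbers for language wrappers: the first three parts are the version of LightGBM that the wrapper wraps, and the fourth part is the release number of the wrapper itself.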

So, for example, if you see version 3.1.0.8 of the R package, that means "the 8th released version of the R package which wraps LightGBM version 3.1.0".

Example

The examples below don't propose that every new merge to master becomes a release, but the changes below are examples used to show what might cause different components of a 4-part version number to change.

Event 1: 3.0.0 is released

LightGBM version set to 3.0.0
lightgbm (Python) 3.0.0.0 released to PyPI
{lightgbm} (R) 3.0.0.0 released to CRAN
LightGBM (lib for .NET extensions) 3.0.0.0 released to NuGet

Event 2: bug fix to LightGBM, like fixing #3209

LightGBM version set to 3.0.1
lightgbm (Python) 3.0.1.0 released to PyPI
{lightgbm} (R) 3.0.1.0 released to CRAN
LightGBM (lib for .NET extensions) 3.0.1.0 released to NuGet

Event 3: bug fix in {lightgbm} (R), like #3117

{lightgbm} (R) 3.0.1.1 released to CRAN

Event 4: LightGBM adds a new type of boosting, like #2644

LightGBM version set to 3.1.0
lightgbm (Python) 3.1.0.0 released to PyPI
{lightgbm} (R) 3.1.0.0 released to CRAN
LightGBM (lib for .NET extensions) 3.1.0.0 released to NuGet
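
For illustration, the events above can be replayed with a small Python sketch. `WrapperVersion` and its methods are hypothetical names made up for this example and are not part of any LightGBM package; the sketch only assumes that a wrapper version is four dot-separated integers.

```python
from typing import NamedTuple


class WrapperVersion(NamedTuple):
    """Hypothetical 4-part version: LightGBM version + wrapper release."""

    major: int
    minor: int
    patch: int
    wrapper_release: int

    @classmethod
    def parse(cls, text: str) -> "WrapperVersion":
        parts = text.split(".")
        if len(parts) != 4:
            raise ValueError(f"expected 4 dot-separated parts, got {text!r}")
        return cls(*(int(part) for part in parts))

    @property
    def core(self) -> str:
        """Version of LightGBM itself (the first three parts)."""
        return f"{self.major}.{self.minor}.{self.patch}"

    def bump_wrapper(self) -> "WrapperVersion":
        """Wrapper-only release (like Event 3): only the fourth part changes."""
        return self._replace(wrapper_release=self.wrapper_release + 1)

    def __str__(self) -> str:
        return ".".join(str(part) for part in self)


r_pkg = WrapperVersion.parse("3.0.1.0")  # Event 2: core bug fix released
r_fix = r_pkg.bump_wrapper()             # Event 3: R-only bug fix
print(r_fix, "wraps LightGBM", r_fix.core)
```

Because the class is a tuple, versions compare part by part, so `3.1.0.0` sorts after `3.0.1.1` without any extra code.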

How this makes LightGBM better

This approach would allow us to release fixes to individual components of LightGBM more frequently.

This would allow us to avoid the current situation, where the PyPI package (for example) has not had an update in 7 months: https://pypi.org/project/lightgbm/#history. More frequent updates allow our users to rely on package managers more, instead of building from GitHub, which I think is a better user experience.

Releasing more frequently would also reduce the gap between the current state of this repo and the documentation at https://lightgbm.readthedocs.io/en/latest/, so that the documentation is more likely to answer a user's questions accurately.

Allowing the version numbers to differ between R and Python (for example) is important, since these two libraries are at very different stages in their development. The R package is still somewhat immature and there is a lot of work ahead for it, while the Python package is fairly mature and stable by comparison. A 4-part version number would allow the R package to be updated more frequently than the Python package, while preserving the use of the first three version components for LightGBM itself.


guolinke · 2020-07-07T09:52:34Z

will this conflict with semantic versioning? https://semver.org/
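
For reference, semver.org defines a normal version number as exactly three numeric parts (X.Y.Z), so a 4-part string is not a valid semantic version. A rough way to check this is sketched below; it uses a simplified pattern rather than the full regular expression suggested on semver.org, and `SEMVER_LIKE` and `looks_like_semver` are names made up for this sketch.

```python
import re

# Simplified approximation of the semver grammar: MAJOR.MINOR.PATCH with
# optional "-prerelease" and "+build" suffixes. Not the official regex.
SEMVER_LIKE = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?"
)


def looks_like_semver(version: str) -> bool:
    """True if the whole string fits the simplified X.Y.Z pattern."""
    return SEMVER_LIKE.fullmatch(version) is not None


for candidate in ("3.1.0", "3.0.0-rc.1", "3.1.0.8"):
    print(candidate, looks_like_semver(candidate))
```

The 4-part candidate is rejected because nothing but a `-` or `+` suffix may follow the patch number.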

StrikerRUS · 2020-07-07T12:50:10Z

Good suggestion!
If I'm not mistaken, @imatiach-msft uses something similar for the Java binding in MMLSpark: #3041 (comment).

However, I vote for a consistent version number across all official LightGBM components. 4-part version numbers will greatly increase the maintenance burden. Also, it will be very hard to keep separate changelogs for each component, because you will need to list all commits multiple times and keep track of them per component.

> This approach would allow us to release fixes to individual components of LightGBM more frequently.

I'm not sure we are able to do that, given the lack of time and other resources. Instead, I suggest we go back to bi-monthly releases and try to stick to them. I believe that will be enough for most of our users.

imatiach-msft · 2020-07-07T14:17:51Z

@StrikerRUS yes, I do almost exactly this, except that instead of adding a fourth part I extend the third one, e.g. 2.3.150 corresponds to 2.3.1. I like the proposed versioning scheme and can migrate to it for the Java wrapper; I'm open to any new ideas. I can't really stay on 2.3.1 because the Java releases are separate, and I sometimes have blocking issues that span both the Java side (JNI + native jar) and the MMLSpark Scala wrapper code. Waiting for the next official LightGBM release to create the jar would be an extra burden, especially since MMLSpark is not as stable as LightGBM and users often hit new blocking issues. I kept it to 3 parts (*.*.*) when I originally released because that seems to be the standard for semantic versioning.

mirekphd · 2020-08-06T22:29:50Z

> will this conflict with semantic versioning? https://semver.org/

Even if it is permissible under every convention in the world, it is rare enough that most CI/CD systems have not been tested against it. You need a sizeable collection of packages installed in your environment to encounter the first case of this kind. We have one, so I can confirm that the incidence of 4-part version numbers among Python packages used for data science and machine learning is around 1.5%. Three of these packages are even very well known (at least in the ML community).

Here's the list of such packages (among 850 we have installed in our largest container heavily influenced by Kaggle Kernels):

dill
ephem                     
gettext                   
h2o                       
lime                      
mkl-random
msgpack-numpy
opencv-python 
pkginfo                   
ppft                      
pystan                    
singledispatch            
typing
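
A count like this could be reproduced roughly as follows. This is only a sketch: it assumes "4-part" means exactly four dot-separated integers (the criterion used for the list above is not stated), and `is_four_part` and `installed_four_part` are hypothetical helper names. It relies on the standard-library `importlib.metadata` module (Python 3.8+).

```python
import re
from importlib import metadata

# Assumption: a "4-part" version is exactly four dot-separated integers.
FOUR_PART = re.compile(r"\d+\.\d+\.\d+\.\d+")


def is_four_part(version: str) -> bool:
    """True for versions such as '1.2.3.4', False for '1.2.3' or '1.2.3rc1'."""
    return FOUR_PART.fullmatch(version) is not None


def installed_four_part() -> list:
    """List 'name==version' for installed distributions with 4-part versions."""
    found = []
    for dist in metadata.distributions():
        name, version = dist.metadata["Name"], dist.version
        # Skip distributions whose metadata is incomplete.
        if name and version and is_four_part(version):
            found.append(f"{name}=={version}")
    return sorted(found)
```

Calling `installed_four_part()` inside the environment of interest returns the matching packages; dividing its length by the number of installed distributions gives the incidence.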

mirekphd · 2020-08-06T22:35:40Z

Can we please try to separate the red herring of 4-part versions from the urgent bug of no releases having been made for 8 months, which was raised e.g. in #3274?

guolinke · 2020-08-06T22:55:15Z

@mirekphd
I think the delay of the current release is due to the many new changes in version 3.0.
3.0 provides about a 2x speed-up on CPU and many new (breaking) features. Some work is still ongoing, so we will publish a pre-release now and continue working on the remaining items.
This is not the usual case; normally we release monthly or bi-monthly.

BTW, the release process is currently manual. It would be better to fully automate it, so that we can release more frequently.

mirekphd · 2020-08-07T09:23:33Z

> 3.0 provides about a 2x speed-up on CPU and many new (breaking) features. Some work is still ongoing, so we will publish a pre-release now and continue working on the remaining items.

Excellent news! I did not know that such large improvements were still possible! It means that in v3.0.0 CPU training will most likely overtake GPU training :) The advantage of GPU is already so small, even for huge datasets and under the new CUDA implementation, as we saw in #3160.

By the way, I happen to know that there is still room for substantial improvement in your CPU implementation for a very frequent use case, but I will wait for your 3.0.0 release to see whether my ideas still work in that version before making them public.

guolinke · 2020-08-07T09:53:59Z

@mirekphd
The remaining work for 3.0 is mostly new features; the CPU efficiency part is almost done.
You can give it a try: we just released 3.0.0rc1.

AlbertoEAF · 2020-11-07T18:28:11Z

Hello, just to be sure, are we migrating to 4-part versioning or not? We're already at 3.0.0.99 after all.

But yes, having more releases would be nice, maybe it would be a good time to launch a new one :)

Should we close this issue?

StrikerRUS · 2020-11-07T18:52:01Z

@AlbertoEAF

> We're already at 3.0.0.99 after all.

I believe the current 4-part versioning has slightly different semantics: #3344 (comment)

> maybe it would be a good time to launch a new one :)

It's already in progress: #3484! 🙂

StrikerRUS · 2021-03-23T11:00:00Z

@jameslamb I think we can close this. It seems the maintenance burden isn't worth it. One synced release for all components is better, I believe. WDYT?

jameslamb · 2021-03-23T13:40:34Z

Seems I was outvoted on this, yes.

mirekphd · 2022-07-10T15:22:07Z

The question of infrequent releases has returned. There has been no new tag or release for half a year.

If manual tagging is an excessive burden, maybe automated daily tags would work for you, with the date replacing the patch version, e.g. in this format:

<major>.<minor>.YYYYMMDD

This can probably be automated easily, though at the cost of violating some semantic versioning rules, e.g. running the risk of introducing breaking changes without proper warning to users (via an increase in the major version).

For such auto-tagged releases no release notes are expected either, so it's enough to ensure that code committed to the master branch gets covered by build tests before it gets auto-tagged and auto-released.

Of course you can make the auto-release frequency lower than daily (e.g. monthly) and use version increments, but then users would start expecting some release notes.

@jameslamb it seems this is still an unresolved issue - why not reopen it here or create a new one to address the low update frequency part?

jameslamb · 2022-08-16T02:51:05Z

Thanks @mirekphd . I promise, I understand the frustration with how long this project has gone without a release. I've described some of that pain in #5153.

Operational concerns like "manual tagging ... burden" are not the main reasons LightGBM has gone so long without a new release.

Some projects started 18+ months ago (e.g. #3234, items under "CUDA" at #5153) promised to introduce significant breaking changes on master, so many other breaking changes have accumulated on master in anticipation of a 4.0.0 release that would include them. Until those projects are merged and in a releasable state (or until something significant changes about the direction of the project), there won't be a new release.

cc @shiyu1994 @StrikerRUS @jmoralez @guolinke if you want to add anything else

guolinke · 2022-08-16T06:48:50Z

@jameslamb we can focus on "breaking" changes first, and make the next release faster.

jameslamb · 2022-08-19T02:51:59Z

That would be great. I really hope we can do a release soon.

github-actions · 2023-08-15T20:34:00Z

This issue has been automatically locked since there has not been any recent activity since it was closed.
To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues
including a reference to this.

jameslamb added the question label Jul 7, 2020

StrikerRUS mentioned this issue Aug 5, 2020

[bug] No release for 8 months #3274

Closed

StrikerRUS mentioned this issue Nov 22, 2020

WIP: [R-package] new release to fix CRAN error on 32-bit Windows #3586

Closed

jameslamb closed this as completed Mar 23, 2021

jameslamb mentioned this issue Apr 9, 2021

[ci] Add debian-clang-devel CI job for the R package #4164

Merged

jameslamb mentioned this issue Apr 30, 2021

[RFC] [R-package] Remove support for passing parameters through '...' #4226

Closed

jameslamb changed the title ~~RFC: version changes + more frequent releases~~ [RFC] version changes + more frequent releases Jan 3, 2022

jameslamb mentioned this issue Jan 4, 2022

[R-package] Apply patch for R4.2 on Windows #4923

Merged

jameslamb mentioned this issue Sep 14, 2022

Question: 3.4 release schedule #5483

Closed

github-actions bot locked as resolved and limited conversation to collaborators Aug 15, 2023
