[RFC] version changes + more frequent releases #3210
Will this conflict with semantic versioning? https://semver.org/ |
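To make the question concrete, here is a minimal sketch of a check against the `MAJOR.MINOR.PATCH` shape that semver.org describes. The regular expression is a simplified approximation (an assumption for illustration, not the full grammar from the specification), and `is_semver` is a hypothetical helper name.

```python
import re

# Simplified approximation of the SemVer 2.0.0 shape:
# MAJOR.MINOR.PATCH with optional "-prerelease" and "+build" suffixes.
# This is an illustrative assumption, not the official grammar.
_SEMVER_RE = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-[0-9A-Za-z.-]+)?"
    r"(?:\+[0-9A-Za-z.-]+)?"
)


def is_semver(text):
    """Return True if `text` has the three-part SemVer shape."""
    return _SEMVER_RE.fullmatch(text) is not None


print(is_semver("3.1.0"))    # three numeric parts
print(is_semver("3.1.0.8"))  # four numeric parts
```

Under this check a three-part version such as 3.1.0 passes, while a four-part version such as 3.1.0.8 does not, which is the conflict being asked about.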
Good suggestion! However, I vote for a consistent version number across all official LightGBM components. 4-part version numbers will greatly increase the maintenance burden. It will also be very hard to maintain separate changelogs for each component, because you will need to list all commits multiple times and keep track of them per component.
I'm not sure that we are able to do that, given our lack of time and other resources. Instead, I suggest we go back to bi-monthly releases and try to stick to them. I believe that will be enough for most of our users. |
@StrikerRUS yes, I do almost exactly this, except that instead of adding a fourth version component I extend the third one, e.g. 2.3.150 corresponds to 2.3.1. I like the proposed versioning scheme and can migrate to it for the JAVA wrapper; I'm open to any new ideas. I can't really keep to 2.3.1 because the JAVA releases are separate, and I sometimes have blocking issues that span both the JAVA JNI + native jar and the mmlspark Scala wrapper code. Waiting for the next official LightGBM release to create the jar would be an extra burden, especially since MMLSpark is not as stable as LightGBM and users often hit new blocking issues. I kept it to 3 versions ..* when I originally released because that seems to be the standard way for semantic versioning. |
Even if it is permissible under all world conventions, it is rare enough that most CI/CD systems have not been tested for it. One needs a sizeable collection of packages installed in their environment to encounter the first case of this kind. We do, so I can confirm that the incidence of 4-part version numbers among Python packages used for data science and machine learning is around 1.5%. Three of these packages are even very well known (at least in the ML community). Here's the list of such packages (among the 850 installed in our largest container, which is heavily influenced by Kaggle Kernels):
|
Can we please try to separate the red herring of 4-part versions from the urgent bug of no releases having been made for 8 months, which was raised e.g. in #3274? |
@mirekphd BTW, the release process is currently manual. It would be better to fully automate it, so that we can release more frequently. |
Excellent news! I did not know that such large improvements were still possible! It means that in By the way, I happen to know that there is still room for substantial improvement in your CPU implementation for a very frequent use case, but I will now wait for your 3.0.0 release to see whether my ideas still work in that version before making them public. |
@mirekphd |
Hello, just to be sure, are we migrating to 4-part versioning or not? We're already at 3.0.0.99 after all. But yes, having more releases would be nice; maybe it would be a good time to launch a new one :) Should we close this issue? |
I believe that the current 4-part versioning has slightly different semantics. #3344 (comment)
Already is in progress: #3484! 🙂 |
@jameslamb I think we can close this. It seems this maintenance burden isn't worth it. One synced release for all components is better, I believe. WDYT? |
Seems I was outvoted on this, yes. |
The question of infrequent releases has returned. Currently there has been no new tag or release added for half a year. If manual tagging is an excessive burden, then maybe adding automated daily tags instead of the patch version, e.g. in the format:
would work for you? This can probably be automated easily, at the cost of violating some semantic versioning rules, e.g. running the risk of introducing breaking changes without proper warning to users (via an increase in the major version). For such auto-tagged releases no release notes are expected either, so it's enough to ensure that code committed to the Of course you can make the auto-release frequency lower than daily (e.g. monthly) and use version increments, but then users would start expecting some release notes. @jameslamb it seems this is still an unresolved issue. Why not reopen it here, or create a new one to address the low update frequency? |
Thanks @mirekphd . I promise, I understand the frustration with how long this project has gone without a release. I've described some of that pain in #5153. Operational concerns like "manual tagging ... burden" are not the main reasons LightGBM has gone so long without a new release. Some projects started 18+ months ago (e.g. #3234, items under "CUDA" at #5153) promised to introduce significant breaking changes on cc @shiyu1994 @StrikerRUS @jmoralez @guolinke if you want to add anything else |
@jameslamb we can focus on "breaking" changes first, and make the next release faster. |
That would be great. I really hope we can do a release soon. |
This issue has been automatically locked since there has not been any recent activity since it was closed. |
I'd like to open this request for comment to discuss a proposal.
After releasing v3.0.0 (#3071), I'd like to propose that we use 4-part version numbers for language wrappers, broken down like this:
So, for example, if you see version 3.1.0.8 of the R package, that means "the 8th released version of the R package which wraps LightGBM version 3.1.0".
Example
The examples below don't propose that every new merge to master becomes a release, but the changes below are examples used to show what might cause different components of a 4-part version number to change.
Event 1: 3.0.0 is released
LightGBM version set to 3.0.0
lightgbm (Python) 3.0.0.0 released to PyPI
{lightgbm} (R) 3.0.0.0 released to CRAN
LightGBM (lib for .NET extensions) 3.0.0.0 released to NuGet
Event 2: bug fix to LightGBM, like fixing #3209
LightGBM version set to 3.0.1
lightgbm (Python) 3.0.1.0 released to PyPI
{lightgbm} (R) 3.0.1.0 released to CRAN
LightGBM (lib for .NET extensions) 3.0.1.0 released to NuGet
Event 3: bug fix in {lightgbm} (R), like #3117
{lightgbm} (R) 3.0.1.1 released to CRAN
Event 4: LightGBM adds a new type of boosting, like #2644
LightGBM version set to 3.1.0
lightgbm (Python) 3.1.0.0 released to PyPI
{lightgbm} (R) 3.1.0.0 released to CRAN
LightGBM (lib for .NET extensions) 3.1.0.0 released to NuGet
How this makes LightGBM better
This approach would allow us to release fixes to individual components of LightGBM more frequently.
This would allow us to avoid the current situation, where the PyPI package (for example) has not had an update in 7 months: https://pypi.org/project/lightgbm/#history. More frequent updates allow our users to rely on package managers more, instead of building from GitHub, which I think is a better user experience.
Releasing more frequently would also reduce the gap between the current state of this repo and the documentation at https://lightgbm.readthedocs.io/en/latest/, so that that documentation is more likely to answer a user's questions accurately.
Allowing the version numbers to differ between R and Python (for example) is important, since these two libraries are at very different stages in their development. The R package is still somewhat immature and there is a lot of work ahead for it, while the Python package is fairly mature and stable by comparison. A 4-part version number would allow the R package to be updated more frequently than the Python package, while preserving the use of the first three version components for LightGBM itself.
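As a rough illustration of the scheme proposed above, the sketch below models a 4-part wrapper version in Python. The names `WrapperVersion`, `bump_wrapper`, and `rebase_on_core` are hypothetical and not part of LightGBM or any existing tool. The assumption that the fourth component resets to 0 when the core version changes is taken from Events 2 and 4 in the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class WrapperVersion:
    """Hypothetical model of a 4-part version: core major.minor.patch + wrapper release."""

    major: int
    minor: int
    patch: int
    wrapper: int

    @classmethod
    def parse(cls, text):
        parts = text.split(".")
        if len(parts) != 4:
            raise ValueError(f"expected 4 dot-separated parts, got {text!r}")
        return cls(*(int(p) for p in parts))

    @property
    def core(self):
        # The first three components identify the wrapped LightGBM version.
        return f"{self.major}.{self.minor}.{self.patch}"

    def bump_wrapper(self):
        # Wrapper-only fix (Event 3): only the fourth component changes.
        return WrapperVersion(self.major, self.minor, self.patch, self.wrapper + 1)

    def __str__(self):
        return f"{self.core}.{self.wrapper}"


def rebase_on_core(core):
    # New core release (Events 1, 2, 4): the wrapper counter starts again at 0.
    major, minor, patch = (int(p) for p in core.split("."))
    return WrapperVersion(major, minor, patch, 0)


r_pkg = rebase_on_core("3.0.1")   # Event 2
r_pkg = r_pkg.bump_wrapper()      # Event 3
print(r_pkg, "wraps LightGBM", r_pkg.core)
```

Because the dataclass is declared with `order=True`, versions compare component by component, so a wrapper-only release such as 3.0.1.1 still sorts before the next core release 3.1.0.0.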