
[CUDA] CUDA Quantized Training (fixes #5606) #5933

Merged
52 commits merged into master from cuda-quantized-training on Oct 8, 2023
Conversation

shiyu1994 (Collaborator) commented on Jun 16, 2023

Fixes #5606.

Adds quantized training for the CUDA version.
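For context, here is a minimal sketch of how quantized training on the CUDA device can be enabled from the Python package (assuming LightGBM was built with CUDA support); use_quantized_grad matches the config field checked in the diff below, while the synthetic data, num_grad_quant_bins, and the other values are illustrative assumptions rather than recommended settings:

import numpy as np
import lightgbm as lgb

# Illustrative synthetic regression data.
rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 20))
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(10_000)

params = {
    "objective": "regression",
    "device_type": "cuda",        # use the CUDA tree learner
    "use_quantized_grad": True,   # build histograms from quantized (integer) gradients
    "num_grad_quant_bins": 4,     # assumed name/value for the number of gradient bins
    "verbose": -1,
}

booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=50)
print(booster.num_trees())

Quantizing gradients to a few integer bins lets histogram accumulation run on low-bit integers instead of floats, which is where most of the expected speedup comes from.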

jameslamb mentioned this pull request on Sep 8, 2023
jameslamb changed the title from "[CUDA] CUDA Quantized Training" to "[CUDA] CUDA Quantized Training (fixes #5606)" on Sep 8, 2023
shiyu1994 (Collaborator, Author) commented:

@guolinke This is ready. Please check.

@@ -40,6 +40,9 @@ CUDABestSplitFinder::CUDABestSplitFinder(
  select_features_by_node_(select_features_by_node),
  cuda_hist_(cuda_hist) {
  InitFeatureMetaInfo(train_data);
  if (has_categorical_feature_ && config->use_quantized_grad) {
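    // Categorical features combined with quantized gradients are not yet supported on CUDA (tracked in #6119).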
shiyu1994 (Collaborator, Author) commented on this diff:

Link #6119

shiyu1994 (Collaborator, Author) commented:

@jameslamb I've enlarged the size limit for the distributed package to 100 MB, because this PR adds a few more templates, which increases the size of the compiled files. Do you think it is OK?

jameslamb (Collaborator) replied:

> I've enlarged the size limit for the distributed package to 100 MB, because this PR adds a few more templates, which increases the size of the compiled files. Do you think it is OK?

Thanks for the @.

For now, since we're not distributing these CUDA wheels on PyPI, I think it's ok. Let's not let it block this PR.

But if we pursue shipping a fat wheel in the future with CUDA support precompiled (like we talked about in Slack), 100MB will be a problem.

There are limits on PyPI for both individual file size and cumulative project size. I don't know the exact numbers, but shipping 100 MB wheels would put us in the range of hitting them, I think.

See these discussions:

There are also other concerns with such large wheels, e.g. for people using function-as-a-service things like AWS Lambda. See for example:

I'll open a new issue in the next few days to discuss publishing wheels with CUDA support.

jameslamb (Collaborator) commented:

I removed the feature label from this PR and left efficiency. For release-drafter, I think a PR can only match one of the labels specified here, not multiple:

categories:
  - title: '💡 New Features'
    label: 'feature'
  - title: '🔨 Breaking'
    label: 'breaking'
  - title: '🚀 Efficiency Improvement'
    label: 'efficiency'
  - title: '🐛 Bug Fixes'
    label: 'fix'
  - title: '📖 Documentation'
    label: 'doc'
  - title: '🧰 Maintenance'
    label: 'maintenance'

guolinke (Collaborator) left a comment:

Thank you!

shiyu1994 merged commit f901f47 into master on Oct 8, 2023
39 checks passed
shiyu1994 deleted the cuda-quantized-training branch on October 8, 2023 at 15:25
Ten0 pushed a commit to Ten0/LightGBM that referenced this pull request on Jan 12, 2024:
* add quantized training (first stage)

* add histogram construction functions for integer gradients

* add stochastic rounding

* update docs

* fix compilation errors by adding template instantiations

* update files for compilation

* fix compilation of gpu version

* initialize gradient discretizer before share states

* add a test case for quantized training

* add quantized training for data distributed training

* Delete origin.pred

* Delete ifelse.pred

* Delete LightGBM_model.txt

* remove useless changes

* fix lint error

* remove debug loggings

* fix mismatch of vector and allocator types

* remove changes in main.cpp

* fix bugs with uninitialized gradient discretizer

* initialize ordered gradients in gradient discretizer

* disable quantized training with gpu and cuda

fix msvc compilation errors and warnings

* fix bug in data parallel tree learner

* make quantized training test deterministic

* make quantized training in test case more accurate

* refactor test_quantized_training

* fix leaf splits initialization with quantized training

* check distributed quantized training result

* add cuda gradient discretizer

* add quantized training for CUDA version in tree learner

* remove cuda compute capability 6.1 and 6.2

* fix parts of gpu quantized training errors and warnings

* fix build-python.sh to install locally built version

* fix memory access bugs

* fix lint errors

* mark cuda quantized training on cuda with categorical features as unsupported

* rename cuda_utils.h to cuda_utils.hu

* enable quantized training with cuda

* fix cuda quantized training with sparse row data

* allow using global memory buffer in histogram construction with cuda quantized training

* recover build-python.sh

enlarge allowed package size to 100M
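
For readers unfamiliar with the gradient discretizer and stochastic rounding steps listed above, the following NumPy sketch illustrates the general idea: gradients are scaled into a small signed integer range and rounded up or down at random so that the rounding is unbiased in expectation. The function name, scaling scheme, and bin count are illustrative assumptions, not LightGBM's actual implementation.

import numpy as np

def quantize_gradients(grad, num_bins=4, rng=None):
    # Scale gradients into [-num_bins, num_bins] and round stochastically to integers.
    rng = np.random.default_rng() if rng is None else rng
    max_abs = np.max(np.abs(grad))
    scale = max_abs / num_bins if max_abs > 0 else 1.0
    scaled = grad / scale
    low = np.floor(scaled)
    frac = scaled - low
    # Round up with probability equal to the fractional part, down otherwise,
    # so the quantized gradient equals the original in expectation.
    quantized = low + (rng.random(grad.shape) < frac)
    return quantized.astype(np.int8), scale

g = np.array([0.03, -0.41, 0.98, -0.07])
q, s = quantize_gradients(g)
print(q)      # small integers, e.g. [0, -2, 4, 0], depending on the random draws
print(q * s)  # dequantized values approximate the original gradients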
Successfully merging this pull request may close these issues: Add quantized training (#5606).