clear memory allocated for sampled data when constructing Dataset from text file #4890
…ave memory Sample data is useless after BinMapper is constructed, but the corresponding memory is still there before feature extraction is finished.
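To illustrate the idea, here is a minimal sketch of releasing a buffer's memory as soon as it is no longer needed. The helper names `ReleaseVector` and `TotalCapacity` are hypothetical and are not taken from LightGBM; the sketch only assumes that the sampled data lives in `std::vector` containers. Note that `clear()` alone keeps the allocated capacity, and `shrink_to_fit()` is a non-binding request, so swapping with an empty temporary is used here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of LightGBM: destroys the elements of *vec
// and hands its heap buffer to a temporary that is freed at the end of the
// statement.
template <typename T>
void ReleaseVector(std::vector<T>* vec) {
  std::vector<T>().swap(*vec);
}

// Hypothetical helper: sums the capacity (in elements) reserved by a set of
// per-feature sample columns, as a rough proxy for their memory footprint.
std::size_t TotalCapacity(const std::vector<std::vector<double>>& columns) {
  std::size_t total = 0;
  for (const auto& col : columns) {
    total += col.capacity();
  }
  return total;
}
```

Calling `ReleaseVector(&sample_values)` right after the bin mappers are built would, under these assumptions, return the sampled values' memory before the feature extraction pass starts.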
excellent, thanks for this!
I support this change.
Since users can configure how much data is sampled during the calculation of bins by setting the parameter bin_construct_sample_cnt, it's definitely possible for the sampled data to have a noticeable memory footprint.
However, a reviewer more familiar with C++, like @shiyu1994 or @guolinke , should probably review this before it's merged.
LightGBM/src/io/dataset_loader.cpp
Lines 926 to 927 in 9f13a9c
std::vector<std::string> DatasetLoader::SampleTextDataFromMemory(const std::vector<std::string>& data) {
  int sample_cnt = config_.bin_construct_sample_cnt;
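For readers unfamiliar with this step, the following is a rough sketch of what sampling up to `sample_cnt` lines from an in-memory text dataset could look like. It is an illustrative stand-in, not LightGBM's implementation: the function name `SampleLines`, the `seed` parameter, and the shuffle-based selection are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <string>
#include <vector>

// Illustrative stand-in, not LightGBM's code: returns up to sample_cnt lines
// chosen uniformly without replacement, kept in their original order.
std::vector<std::string> SampleLines(const std::vector<std::string>& data,
                                     int sample_cnt, unsigned seed) {
  std::vector<std::string> out;
  if (sample_cnt <= 0 || data.empty()) {
    return out;
  }
  const std::size_t n =
      std::min(data.size(), static_cast<std::size_t>(sample_cnt));

  // Shuffle all row indices, keep the first n, then restore file order.
  std::vector<std::size_t> idx(data.size());
  std::iota(idx.begin(), idx.end(), std::size_t{0});
  std::mt19937 rng(seed);
  std::shuffle(idx.begin(), idx.end(), rng);
  idx.resize(n);
  std::sort(idx.begin(), idx.end());

  out.reserve(n);
  for (std::size_t i : idx) {
    out.push_back(data[i]);
  }
  return out;
}
```

The returned vector holds copies of the selected lines, so its size grows with `sample_cnt`. That is why a large `bin_construct_sample_cnt` can make the sampled data a noticeable part of peak memory.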
@xuchuanyin I hope you don't mind, I've proposed a re-wording of this pull request's title. Since pull request titles become items in this project's release notes, I've proposed a title that I think will be a bit easier to understand for users of LightGBM who aren't familiar with its internals.
@jameslamb yeah, the optimized title is much better
/gha run r-valgrind
Workflow R valgrind tests has been triggered! 🚀 Status: success ✔️.
Thanks for your contribution!
Sorry for the late response. I was busy with another project over the past few days, and now I'm back.
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.