Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clear memory allocated for sampled data when constructing Dataset from text file #4890

Merged
merged 1 commit into from
Dec 23, 2021

Conversation

xuchuanyin
Copy link
Contributor

Sample data is useless after BinMapper is constructed, but the corresponding memory is still there before feature extraction is finished.

…ave memory

Sample data is useless after BinMapper is constructed, but the corresponding memory is still there before feature extraction is finished.
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excellent, thanks for this!

I support this change.

Since users can configure how much data is sampled during the calculation of bins by setting parameter bin_construct_sample_cnt, it's definitely possible for the sampled data to have a noticable memory footprint.

However, a reviewer more familiar with C++, like @shiyu1994 or @guolinke , should probably review this before it's merged.

std::vector<std::string> DatasetLoader::SampleTextDataFromMemory(const std::vector<std::string>& data) {
int sample_cnt = config_.bin_construct_sample_cnt;

@jameslamb jameslamb changed the title clear memory of sample data right after BinMapper is constructed to save memory clear memory allocated for sampled data when constructing Dataset from text file Dec 14, 2021
@jameslamb
Copy link
Collaborator

@xuchuanyin I hope you don't mind, I've proposed a re-wording of this pull request's title. Since pull request titles become items in this project's release notes, I've proposed a title that I think will be a bit easier to understand for users of LightGBM who aren't familiar with its internals.

@xuchuanyin
Copy link
Contributor Author

@jameslamb yeah, the optimized title is much better

@jameslamb
Copy link
Collaborator

jameslamb commented Dec 22, 2021

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/1612905185

Status: success ✔️.

Copy link
Collaborator

@shiyu1994 shiyu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution!

Sorry for the late response. I was busy with another project in previous days. And now I'm back.

@shiyu1994 shiyu1994 merged commit 2ef3cb8 into microsoft:master Dec 23, 2021
@StrikerRUS StrikerRUS mentioned this pull request Jan 6, 2022
13 tasks
@jameslamb jameslamb mentioned this pull request Oct 7, 2022
40 tasks
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants