[BUG] Higher memory footprint when writing strings to orc #7661
Comments
Narrowed it down to the size of the allocation at cudf/cpp/src/io/orc/writer_impl.cu, line 606 in 34cccfe.
It is 1.72 GB but it shouldn't be.
I messed with this code in a recent PR, can take a look today.
It looks like the code hasn't changed in function. The part that is supposed to calculate the amount of memory to allocate is unchanged.
Fixes #7661. Corrects the field order in `std::accumulate` that computes the string column size w.r.t. encoding.
Authors: Vukasin Milovanovic (@vuule)
Approvers: Kumar Aatish (@kaatish), Ram (Ramakrishna Prabhu) (@rgsl888prabhu)
URL: #7737
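For readers unfamiliar with this class of bug, here is a minimal sketch of how a swapped field in a `std::accumulate` lambda can inflate a size estimate. It is not the cuDF code: `StringChunk`, its fields, and both functions are hypothetical names made up for illustration, and the assumption is only that each chunk carries one size per encoding and the sum should pick the size matching the encoding in use.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical per-chunk statistics; names are illustrative, not cuDF's.
struct StringChunk {
  bool use_dictionary;       // true if the chunk is dictionary-encoded
  std::size_t dict_bytes;    // bytes needed when only unique strings are stored
  std::size_t direct_bytes;  // bytes needed when every string is stored
};

// Swapped fields: each chunk contributes the size of the encoding it does NOT use.
std::size_t total_bytes_swapped(const std::vector<StringChunk>& chunks) {
  return std::accumulate(
      chunks.begin(), chunks.end(), std::size_t{0},
      [](std::size_t sum, const StringChunk& c) {
        return sum + (c.use_dictionary ? c.direct_bytes : c.dict_bytes);
      });
}

// Intended order: each chunk contributes the size of the encoding it uses.
std::size_t total_bytes(const std::vector<StringChunk>& chunks) {
  return std::accumulate(
      chunks.begin(), chunks.end(), std::size_t{0},
      [](std::size_t sum, const StringChunk& c) {
        return sum + (c.use_dictionary ? c.dict_bytes : c.direct_bytes);
      });
}
```

With one dictionary-encoded chunk (`dict_bytes` much smaller than `direct_bytes`), the swapped version counts the larger direct size for it, so a buffer sized from that total would be bigger than needed.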
Addresses #7661. Dictionary-related `device_uvector`s were released after use.
Authors: Kumar Aatish (https://github.com/kaatish), Vukasin Milovanovic (https://github.com/vuule)
Approvers: Devavret Makkar (https://github.com/devavret), Vukasin Milovanovic (https://github.com/vuule)
URL: #7719
Describe the bug
`df.to_orc` has a higher memory footprint in 0.19 nightlies vs 0.18 when writing string columns.
Steps/Code to reproduce bug
Peak memory usage in 0.18: 5864 MB
Peak memory usage in 0.19 @ a568432: 8432 MB
Expected behavior
Similar memory usage
Environment overview (please complete the following information)
`docker pull` & `docker run` commands used
Additional Context
Script used to measure memory usage.
cc: @randerzander