
[FEA] Improve performance of loading long strings from Parquet #7545

Closed
jlowe opened this issue Mar 10, 2021 · 3 comments
Labels: feature request (New feature or request), libcudf (Affects libcudf (C++/CUDA) code), Performance (Performance related issue), strings (strings issues, C++ and Python)

Comments

@jlowe (Member) commented Mar 10, 2021

Is your feature request related to a problem? Please describe.
Loading a Parquet dataset containing relatively long strings per row (e.g.: 500+ characters per row) takes quite a bit of time due to the time spent in make_strings_column as shown in this Nsight Systems trace:
[Screenshot: Nsight Systems trace showing the time spent in make_strings_column]

It looks like make_strings_column may be using a row-level parallelism algorithm (one thread per row), which will not perform well when there is a large number of characters per row.

Describe the solution you'd like
Ideally Parquet string decoding for long strings should be fast, whether that be via optimizing make_strings_column or using a different approach to string decoding altogether. Updating make_strings_column to use a char-parallel algorithm may be appropriate.
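The row-parallel vs. char-parallel distinction can be illustrated with a CPU sketch (not cudf's actual implementation; the helper name and shapes are invented for illustration). Each loop iteration below stands in for one GPU thread: a worker handles a single output character and finds its source row by binary-searching the offsets, so the work stays balanced even when a few rows hold most of the characters:

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Character-parallel copy sketch: one "thread" (iteration) per output
// character. The row owning character i is found by binary search on the
// offsets array, where offsets[r]..offsets[r+1] is row r's character range.
std::vector<char> char_parallel_copy(std::vector<std::string> const& rows,
                                     std::vector<int32_t> const& offsets) {
  std::vector<char> out(offsets.back());
  for (int32_t i = 0; i < static_cast<int32_t>(out.size()); ++i) {
    // First offset strictly greater than i marks the row boundary after i.
    auto it  = std::upper_bound(offsets.begin(), offsets.end(), i);
    auto row = static_cast<size_t>(it - offsets.begin()) - 1;
    out[i]   = rows[row][i - offsets[row]];
  }
  return out;
}
```

Here `offsets` is the exclusive scan of the row lengths, as in a strings column; empty rows are naturally skipped because no character index falls in their (empty) range.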

@jlowe jlowe added feature request New feature or request Needs Triage Need team to review and classify libcudf Affects libcudf (C++/CUDA) code. strings strings issues (C++ and Python) labels Mar 10, 2021
@jrhemstad jrhemstad removed the Needs Triage Need team to review and classify label Mar 10, 2021
@jrhemstad (Contributor) commented:
I believe this is the code that is suffering on long strings:

auto copy_chars = [d_chars] __device__(auto item) {
  string_index_pair str = thrust::get<0>(item);
  size_type offset      = thrust::get<1>(item);
  if (str.first != nullptr) memcpy(d_chars + offset, str.first, str.second);
};
thrust::for_each_n(rmm::exec_policy(stream),
                   thrust::make_zip_iterator(thrust::make_tuple(
                     begin, offsets_column->view().template begin<int32_t>())),
                   strings_count,
                   copy_chars);

This could use the same treatment as your optimization to gather. In fact, I wonder if there's a way to cast this factory as a gather in order to take advantage of the optimization that is already there.

@davidwendt (Contributor) commented:

This seems similar to, if not the same as, #7571.
make_strings_column was improved for long strings in #7576.
Can this be closed?

@jlowe (Member, Author) commented Mar 23, 2021

Yes, this is much improved.

@jlowe jlowe closed this as completed Mar 23, 2021