Add multi-threaded writing to GDS writes #9372

Merged: 8 commits into rapidsai:branch-21.12 on Oct 8, 2021

@devavret devavret commented Oct 5, 2021

Closes #9260

@devavret devavret requested a review from a team as a code owner October 5, 2021 12:38
@github-actions github-actions bot added the libcudf (Affects libcudf (C++/CUDA) code) label Oct 5, 2021
@devavret devavret added the cuIO (cuIO issue), improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) labels Oct 5, 2021
codecov bot commented Oct 5, 2021

Codecov Report

Merging #9372 (e04a4a6) into branch-21.12 (ab4bfaa) will decrease coverage by 0.04%.
The diff coverage is 0.00%.

❗ Current head e04a4a6 differs from pull request most recent head 7f78657. Consider uploading reports for the commit 7f78657 to get more accurate results

@@               Coverage Diff                @@
##           branch-21.12    #9372      +/-   ##
================================================
- Coverage         10.79%   10.74%   -0.05%     
================================================
  Files               116      116              
  Lines             18869    19081     +212     
================================================
+ Hits               2036     2051      +15     
- Misses            16833    17030     +197     
Impacted Files Coverage Δ
python/cudf/cudf/__init__.py 0.00% <0.00%> (ø)
python/cudf/cudf/_lib/__init__.py 0.00% <ø> (ø)
python/cudf/cudf/core/_base_index.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/categorical.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/column.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/datetime.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/lists.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/numerical.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/string.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/struct.py 0.00% <0.00%> (ø)
... and 26 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@devavret devavret requested review from bdice and vuule October 6, 2021 20:55
@bdice bdice left a comment

I have a few more minor comments. Thanks!

@bdice bdice left a comment

Nice work @devavret! LGTM. Are the CI failures related or not?

devavret commented Oct 7, 2021

> Nice work @devavret! LGTM. Are the CI failures related or not?

The Java CI failure has been showing up on other PRs for a while too.

@vuule vuule left a comment

Looks good, just one question before ✔️

Comment on lines +1443 to +1445
for (auto const& task : write_tasks) {
  task.wait();
}

Can we move this out of the loop so that we only wait after all encoding is done? Hard to tell if buffers are reused between batches.

devavret (Contributor Author) replied:

> Hard to tell if buffers are reused between batches.

It is indeed.
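
The trade-off raised in this thread can be illustrated with a small standalone sketch. It is not the cuDF code: `toy_sink`, `write_async`, and both writer functions are hypothetical names invented for illustration, and `std::async` stands in for whatever mechanism actually schedules the writes. The assumption being shown is only this: if a buffer is reused by the next batch, the pending writes that read from it have to be waited on inside the batch loop; if every batch keeps its own buffer alive, the wait can move after the loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical in-memory sink; each write runs on its own thread via std::async.
struct toy_sink {
  std::vector<char> storage;
  explicit toy_sink(std::size_t size) : storage(size) {}

  // The caller must keep `data` valid and unmodified until the future is ready.
  std::future<void> write_async(char const* data, std::size_t size, std::size_t offset) {
    return std::async(std::launch::async, [this, data, size, offset] {
      assert(offset + size <= storage.size());
      std::copy(data, data + size, storage.data() + offset);
    });
  }
};

inline std::size_t total_size(std::vector<std::string> const& batches) {
  std::size_t total = 0;
  for (auto const& b : batches) { total += b.size(); }
  return total;
}

// Variant A: one scratch buffer is reused by every batch, so the writes that
// read from it must finish before the next batch overwrites it.
std::string write_with_reused_buffer(std::vector<std::string> const& batches) {
  toy_sink sink(total_size(batches));
  std::string scratch;
  std::size_t offset = 0;
  for (auto const& batch : batches) {
    scratch = batch;  // stands in for encoding into the reused buffer
    std::size_t const half = scratch.size() / 2;
    std::vector<std::future<void>> write_tasks;
    write_tasks.push_back(sink.write_async(scratch.data(), half, offset));
    write_tasks.push_back(
      sink.write_async(scratch.data() + half, scratch.size() - half, offset + half));
    for (auto const& task : write_tasks) {
      task.wait();  // wait inside the batch loop
    }
    offset += scratch.size();
  }
  return std::string(sink.storage.begin(), sink.storage.end());
}

// Variant B: every batch owns its buffer for the whole function, so waiting
// can be deferred until all writes have been queued.
std::string write_with_per_batch_buffers(std::vector<std::string> const& batches) {
  toy_sink sink(total_size(batches));
  std::vector<std::string> const buffers(batches.begin(), batches.end());
  std::vector<std::future<void>> write_tasks;
  std::size_t offset = 0;
  for (auto const& buf : buffers) {
    write_tasks.push_back(sink.write_async(buf.data(), buf.size(), offset));
    offset += buf.size();
  }
  for (auto const& task : write_tasks) {
    task.wait();  // wait once, after the loop
  }
  return std::string(sink.storage.begin(), sink.storage.end());
}
```

Both variants produce the same bytes; they differ only in how long the source buffers must stay untouched, which is the property the reviewer was asking about.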

@devavret devavret requested a review from vuule October 8, 2021 17:55
devavret commented Oct 8, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 8bb1e86 into rapidsai:branch-21.12 Oct 8, 2021
Linked issue: Perform GDS writes in parallel