Refactor pinned memory vector and ORC+Parquet writers #13206
This reverts commit 01d35ea.
Approving ops-codeowner file changes
IIUC, pinned memory is always on the host side, so I'm not sure this renaming is really needed.
IMO the name `*_host_vector` expresses its purpose better, similar to having `thrust::host_vector` instead of just `thrust::vector`.
I read the name as host_vector in pinned memory, so the name looks good.
 * @brief Helper for pinned host memory
 */
template <typename T>
using pinned_buffer = std::unique_ptr<T, decltype(&cudaFreeHost)>;
This looks good, with one minor concern. For places that already used `host_vector`, there's no change. But for this use case, we are introducing initialization of memory that does not need to be initialized (and previously wasn't). If you don't mind, please run the ORC or Parquet writer benchmarks.
Sorry, I don't see where the initialization you mentioned happens. The new `pinned_host_vector` uses an allocator that also doesn't initialize the internal buffer:
__host__ inline pointer allocate(size_type cnt, const_pointer /*hint*/ = 0)
{
  if (cnt > this->max_size()) { throw std::bad_alloc(); }  // end if
  pointer result(0);
  CUDF_CUDA_TRY(cudaMallocHost(reinterpret_cast<void**>(&result), cnt * sizeof(value_type)));
  return result;
}
Please see benchmarks here: #13206 (comment)
I thought that `host_vector` has its own initialization outside of the allocator (same as `std::vector`). Either way, it does not seem to impact the overall performance.
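For readers following the initialization question above, here is a minimal sketch of the `std::vector` behavior the last comment refers to. It uses plain `std::malloc` in place of `cudaMallocHost` so it runs without CUDA, and `pattern_allocator` and `zero_elements_after_sized_construction` are hypothetical names introduced only for this illustration. It does not verify how `thrust::host_vector` behaves with cudf's pinned allocator; it only shows that, for `std::vector`, a non-initializing `allocate()` does not prevent the container from value-initializing its elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <vector>

// Hypothetical allocator for illustration only. allocate() fills the raw
// memory with a 0xAB byte pattern so we can tell whether the container
// overwrote it afterwards. It defines no construct() member.
template <typename T>
struct pattern_allocator {
  using value_type = T;

  pattern_allocator() = default;
  template <typename U>
  pattern_allocator(pattern_allocator<U> const&) noexcept {}

  T* allocate(std::size_t n)
  {
    void* p = std::malloc(n * sizeof(T));
    if (p == nullptr) { throw std::bad_alloc(); }
    std::memset(p, 0xAB, n * sizeof(T));  // stand-in for "uninitialized" content
    return static_cast<T*>(p);
  }

  void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

template <typename T, typename U>
bool operator==(pattern_allocator<T> const&, pattern_allocator<U> const&) noexcept
{
  return true;
}

template <typename T, typename U>
bool operator!=(pattern_allocator<T> const&, pattern_allocator<U> const&) noexcept
{
  return false;
}

// Builds a vector of n ints (n > 0) and counts how many elements are zero.
// The sized constructor value-initializes each element through
// std::allocator_traits::construct, independently of what allocate() did.
inline std::size_t zero_elements_after_sized_construction(std::size_t n)
{
  std::vector<int, pattern_allocator<int>> v(n);
  std::size_t zeros = 0;
  for (int x : v) {
    if (x == 0) { ++zeros; }
  }
  return zeros;
}
```

Calling `zero_elements_after_sized_construction(8)` counts eight zeros even though the allocator handed back memory filled with `0xAB`, which is the extra per-element write the review comment was concerned about.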
@vuule Here are some benchmarks for the ORC writer:
Thank you for taking the time to make these improvements.
/merge
Similar to #12949, this refactors Parquet writer to support retry mechanism. The internal `writer::impl::write()` function is rewritten such that it is separated into multiple pieces:
* A free function that performs compressing/encoding the input table into intermediate results. These intermediate results are totally independent of the writer.
* After having the intermediate results in the previous step, these results will be actually applied to the output data sink to start the actual data writing.

Closes:
* #13042

Depends on:
* #13206

Authors:
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- Vukasin Milovanovic (https://github.com/vuule)
- https://github.com/nvdbaranec

URL: #13076
Currently, `cudf::detail::pinned_allocator` is used in various places to implement a pinned host vector. This standardizes such usage, removing `cudf::detail::pinned_allocator` from the usage sites and replacing its usage by a standard `cudf::detail::pinned_host_vector` instead.

Some small changes are also made for ORC/Parquet writer classes, replacing `bool _single_write_mode` by `SingleWriteMode _single_write_mode`.

This is needed for #13076.
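As a rough illustration of the `bool` to `SingleWriteMode` change described above, the sketch below shows why a scoped enum can be preferable to a bare `bool` member. The enumerator names, the `writer_sketch` type, and its `is_chunked()` helper are assumptions made for this example and are not taken from the cudf sources.

```cpp
#include <cassert>
#include <type_traits>

// Assumed definition for illustration; the real enumerators may differ.
enum class SingleWriteMode { NO, YES };

// Hypothetical stand-in for a writer class holding the mode as a member.
struct writer_sketch {
  explicit writer_sketch(SingleWriteMode mode) : _single_write_mode{mode} {}

  // True when the writer is expected to receive multiple write calls.
  bool is_chunked() const { return _single_write_mode == SingleWriteMode::NO; }

  SingleWriteMode _single_write_mode;
};

// A scoped enum does not convert implicitly from bool, so a call site must
// spell out SingleWriteMode::YES or SingleWriteMode::NO instead of true/false.
static_assert(!std::is_convertible<bool, SingleWriteMode>::value,
              "bool must not convert implicitly to SingleWriteMode");
```

With this shape, `writer_sketch(SingleWriteMode::YES)` documents its intent at the call site, whereas `writer_sketch(true)` would not compile.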