-
Notifications
You must be signed in to change notification settings - Fork 902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add column constructor from device_uvector&& #11356
Add column constructor from device_uvector&& #11356
Conversation
This seems unnecessary since we have a constructor that accepts moving a |
Oh yeah, the existing constructor seems to be enough, right:
|
I think this would be a helpful feature for simplifying code in libcudf. In a lot of places, it would reduce the number of explicitly passed constructor arguments from three ( This move constructor benefits correctness: it prevents users from providing a While it's true that the snippet from @ttnghia quoted below is equivalent, I think there is a significant benefit in brevity if we can directly construct the
We've previously settled in favor of adding these types of conversions for non-owning types (#10302, #9656). Previous input from @jrhemstad and others have also indicated support for this: #9656 (comment) I think this is a straightforward extension of work we've already done for ensuring semantically safe, accurate, and short code in libcudf. Could we measure the build time changes to address @davidwendt's concern about includes? I don't expect a significant change because |
Perhaps we should remove the device-buffer constructor then? |
My concern for the extra include goes beyond compile time. More include files increases the burden on building applications using libcudf functions. |
cpp/include/cudf/column/column.hpp
Outdated
template <typename T> | ||
column(rmm::device_uvector<T>&& other) noexcept | ||
: _type{cudf::type_to_id<T>()}, | ||
_size{static_cast<cudf::size_type>(other.size())}, | ||
_data{other.release()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be constrained in some way to prevent construction from a device_uvector<string_view>
or device_uvector<list_view>
.
It should probably be:
template <typename T, CUDF_ENABLE_IF(is_fixed_width<T>())>
Double check if fixed-point types are considered fixed-width, because I don't think you want this for a device_uvector<decimal32/64>
either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed point types are fixed width, and we don't want those. We probably want the same condition as what we use for the non-owning column_view
from device_span
construction: cudf::is_numeric<T>() or cudf::is_chrono<T>()
cudf/cpp/include/cudf/column/column_view.hpp
Line 428 in e7e5f45
template <typename T, CUDF_ENABLE_IF(cudf::is_numeric<T>() or cudf::is_chrono<T>())> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another reason this should probably just be a make_fixed_width_column
factory function instead of a constructor.
This factory: https://docs.rapids.ai/api/libcudf/stable/group__column__factories.html#ga595aed5d1e6f3d4b3d8da9eeffed916d
takes a buffer for a null mask. It seems reasonable to add one that accepts a device-uvector.
Here is one we use for strings that takes 2 uvectors with move semantics: https://docs.rapids.ai/api/libcudf/stable/group__column__factories.html#ga86f7623f0d230c96491ef88d665385cc
Another argument for adopting this as a constructor: it's easy to write a significant bug with the current constructor from Consider the following: rmm::device_uvector<int> v{...};
column c{type_to_id<int>(), v.size(), v.release()}; This contains a bug: there is no guaranteed ordering in which
A constructor from
On top of that, there are a couple smaller issues:
Credit to @jrhemstad for mentioning this to me. |
This is a very good point. I would still like to consider a |
Codecov Report
@@ Coverage Diff @@
## branch-22.10 #11356 +/- ##
===============================================
Coverage ? 86.47%
===============================================
Files ? 144
Lines ? 22856
Branches ? 0
===============================================
Hits ? 19764
Misses ? 3092
Partials ? 0 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
I totally support this.
If using the current implementation of this PR, I will have to do this:
If we implement a
|
This is a very easy-to-step-on bug. I just got it myself, when implementing the code above ^^. |
The ctor in this PR just needs to be updated to optionally accept a null mask and null count.
|
I spoke with @jrhemstad and @davidwendt over the last few days and I have a few final notes from those discussions that I'll share here. My understanding of the historical evolution of constructors-vs-factories (see #2207, #3252, #3461 (comment)) is that all factories except strings factories construct columns with uninitialized buffers. Strings factories (#2811, #7397) are the primary exception, by accepting initialized data buffers of characters and offsets. The proposal in this issue is to make a constructor. I think this is the right path because the other existing In some cases, the addition of this constructor will mean that developers will follow a pattern like "create The possibility of a "slippery slope" is that we could make |
Co-authored-by: Nghia Truong <[email protected]>
…thub.com/SrikarVanavasam/cudf into column_constructor_from_device_uvector&&
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @SrikarVanavasam!
Co-authored-by: Nghia Truong <[email protected]>
@gpucibot merge |
This extends the `cudf::contains` API to support nested types (lists + structs) with arbitrarily nested levels. As such, `cudf::contains` will work with literally any type of input data. In addition, this fixes null handling of `cudf::contains` with structs column + struct scalar input when the structs column contains null rows at the top level while the scalar key is valid but all nulls at children levels. Closes: #8965 Depends on: * #10730 * #10883 * #10802 * #10997 * NVIDIA/cuCollections#172 * NVIDIA/cuCollections#173 * #11037 * #11356 Authors: - Nghia Truong (https://github.com/ttnghia) - Devavret Makkar (https://github.com/devavret) - Bradley Dice (https://github.com/bdice) - Karthikeyan (https://github.com/karthikeyann) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) - Bradley Dice (https://github.com/bdice) - Yunsong Wang (https://github.com/PointKernel) URL: #10656
Closes #11115
This PR adds a
column
constructor to be constructible from adevice_uvector&&
using move semantics.