Add column type tests #8505
Co-authored-by: David Wendt <[email protected]>
cpp/src/utilities/type_checks.cpp
Outdated
return lhs.num_children() > 0 and rhs.num_children() > 0
         ? lhs.child(kidx).type() == rhs.child(kidx).type()
         : true;
I think this would return true if either lhs or rhs were empty but the other was not.
Suggested change:
- return lhs.num_children() > 0 and rhs.num_children() > 0
-          ? lhs.child(kidx).type() == rhs.child(kidx).type()
-          : true;
+ return lhs.num_children() > 0 and rhs.num_children() > 0
+          ? lhs.child(kidx).type() == rhs.child(kidx).type()
+          : lhs.is_empty() == rhs.is_empty();
I see there is a test for this case, so was that intended?
Thanks for pointing this out. From arrow:
In [16]: empty = pa.array(pd.Series([], dtype="category"), from_pandas=True)
In [18]: arr = pa.array(pd.Series([1, 1, 2], dtype="category"), from_pandas=True)
In [19]: empty.type == empty.type
Out[19]: True
In [20]: arr.type == empty.type
Out[20]: False
So what you suggested above is correct.
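To make the difference between the two variants concrete, here is a minimal sketch. It does not use the libcudf API: `mock_column`, `type_id`, and both helper functions are hypothetical stand-ins invented for illustration, and the children are modeled as a plain vector of type ids.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; NOT the libcudf column_view API.
enum class type_id { INT32, STRING };

struct mock_column {
  std::vector<type_id> child_types;  // types of the child columns, if any
  std::size_t size;                  // number of rows

  std::size_t num_children() const { return child_types.size(); }
  bool is_empty() const { return size == 0; }
};

// Original logic: falls back to `true` whenever either side has no children.
inline bool child_types_equal_original(mock_column const& lhs,
                                       mock_column const& rhs,
                                       std::size_t kidx)
{
  return lhs.num_children() > 0 && rhs.num_children() > 0
           ? lhs.child_types[kidx] == rhs.child_types[kidx]
           : true;
}

// Suggested logic: without children on both sides, the columns only match
// if both are empty or both are non-empty.
inline bool child_types_equal_suggested(mock_column const& lhs,
                                        mock_column const& rhs,
                                        std::size_t kidx)
{
  return lhs.num_children() > 0 && rhs.num_children() > 0
           ? lhs.child_types[kidx] == rhs.child_types[kidx]
           : lhs.is_empty() == rhs.is_empty();
}

// Small demonstration of the case discussed in the review.
inline void demo_empty_vs_populated()
{
  mock_column empty_col{{}, 0};
  mock_column populated{{type_id::INT32, type_id::INT32}, 3};
  assert(child_types_equal_original(empty_col, populated, 1));    // reported problem
  assert(!child_types_equal_suggested(empty_col, populated, 1));  // matches Arrow
}
```

Under these assumptions, the suggested variant treats an empty column and a populated one as different, which lines up with the Arrow session above, while two empty columns still compare equal.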
 * @return false `T` is not structs-type
 */
template <typename T>
constexpr inline bool is_structs()
I'm not sure about the plural "s" here and with is_lists. All the other traits are singular, like is_fixed_width or is_chrono.
Right, it probably makes more sense to align with the other names in traits.hpp.
I got the plural form idea from lists_column_view and structs_column_view and was under the impression they were named that way.
Why not is_list_view?
The line below reveals the fact that we use list_view as an alias for the LIST type during type dispatching, which is too detailed IMO:
    template <typename T, CUDF_ENABLE_IF(is_list_view<T>())>
while
    template <typename T, CUDF_ENABLE_IF(is_list<T>())>
seems clear enough.
But LIST is not a C++ type, and is_X() is a type trait. After all, is_list<T>() is just syntactic sugar for std::is_same<T, list_view>, so it's barely needed at all.
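As a rough illustration of that point, the sketch below shows a trait of this shape. The empty `list_view` struct is a placeholder rather than the real libcudf type, `dispatch_kind` is a hypothetical function, and `std::enable_if_t` is used here as a stand-in for what the `CUDF_ENABLE_IF` macro is assumed to do.

```cpp
#include <cassert>
#include <type_traits>

// Placeholder type for illustration; NOT the real cudf::list_view.
struct list_view {};

// The trait is a thin wrapper around std::is_same.
template <typename T>
constexpr bool is_list()
{
  return std::is_same<T, list_view>::value;
}

// Hypothetical overload pair selected via SFINAE, standing in for the
// CUDF_ENABLE_IF(...) usage quoted in the discussion.
template <typename T, std::enable_if_t<(is_list<T>())>* = nullptr>
constexpr int dispatch_kind()
{
  return 1;  // chosen when T is list_view
}

template <typename T, std::enable_if_t<(!is_list<T>())>* = nullptr>
constexpr int dispatch_kind()
{
  return 0;  // chosen for every other T
}

static_assert(is_list<list_view>(), "list_view satisfies the trait");
static_assert(!is_list<int>(), "int does not satisfy the trait");

// Runtime check of the overload selection.
inline void demo_dispatch_kind()
{
  assert(dispatch_kind<list_view>() == 1);
  assert(dispatch_kind<int>() == 0);
}
```

Replacing `is_list<T>()` with `std::is_same<T, list_view>::value` in the two overloads would select exactly the same functions, which is the sense in which the trait is only sugar.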
Previously the code was doing rhs.type().id()==type_id::LIST and rhs.type().id()==type_id::STRUCT, so the thought was to make is_list(data_type) and is_struct(data_type) functions. Since the code is now using the type-dispatcher, these are no longer required because we can just use std::is_same.
We might be able to get rid of is_same altogether if we use explicit specialization:
template <>
bool columns_equal_fn::operator()<list_view>(column_view const& lhs, column_view const& rhs)
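A self-contained sketch of that idea follows. Only the name `columns_equal_fn` and the specialization syntax come from the comment above; `column_stub`, its `child_type_id` field, and the comparison logic are hypothetical simplifications, not the code in this PR.

```cpp
#include <cassert>

// Placeholder types for illustration; NOT the real libcudf types.
struct list_view {};
struct column_stub {
  int child_type_id;  // hypothetical: identifies the type of the child column
};

struct columns_equal_fn {
  // Primary template, declared here and defined below.
  template <typename T>
  bool operator()(column_stub const& lhs, column_stub const& rhs);
};

// Generic case: in this sketch, non-nested types need no further checks.
template <typename T>
bool columns_equal_fn::operator()(column_stub const&, column_stub const&)
{
  return true;
}

// Explicit specialization for list_view: no is_same or enable_if is needed,
// because the compiler picks this definition whenever T is list_view.
template <>
inline bool columns_equal_fn::operator()<list_view>(column_stub const& lhs,
                                                    column_stub const& rhs)
{
  return lhs.child_type_id == rhs.child_type_id;
}

// Demonstrates which definition each explicit template argument selects.
inline void demo_specialization()
{
  columns_equal_fn fn;
  column_stub a{1};
  column_stub b{2};
  assert(fn.operator()<int>(a, b));         // generic path ignores children
  assert(!fn.operator()<list_view>(a, b));  // specialization compares them
}
```

One design point to keep in mind: an explicit specialization has to be declared at namespace scope before its first use, and it should be marked `inline` if it is defined in a header.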
Codecov Report
@@ Coverage Diff @@
## branch-21.08 #8505 +/- ##
===============================================
Coverage ? 83.01%
===============================================
Files ? 109
Lines ? 18225
Branches ? 0
===============================================
Hits ? 15129
Misses ? 3096
Partials ? 0
rerun tests
Just wondering if this function is something that will be used outside of libcudf, for example from Python cudf or Spark Java? Issue #8357 only mentions internal libcudf use.
Nice work.
Approving ops-codeowner file changes
CMake changes LGTM
@gpucibot merge
Addresses column requests for #8357
This PR adds nested type checks for cudf::column.