Apply metadata to keys before returning in Frame._encode
#8560
The tests are failing when comparing:
>       assert_eq(
            pdf.unstack(level=level), gdf.unstack(level=level), check_dtype=False,
        )
cudf/tests/test_reshape.py:425:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cudf/tests/utils.py:99: in assert_eq
tm.assert_frame_equal(left, right, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
left = CategoricalIndex(['A', 'B', 'C', 'A', 'B', 'C'], categories=['A', 'B', 'C'], ordered=False, name='bar', dtype='category')
right = Index(['A', 'B', 'C', 'A', 'B', 'C'], dtype='object', name='bar'), obj = 'MultiIndex level [2]'
    def _check_types(left, right, obj="Index"):
        if exact:
>           assert_class_equal(left, right, exact=exact, obj=obj)
E AssertionError: MultiIndex level [2] are different
E
E MultiIndex level [2] classes are not equivalent
E [left]: CategoricalIndex(['A', 'B', 'C', 'A', 'B', 'C'], categories=['A', 'B', 'C'], ordered=False, name='bar', dtype='category')
E [right]: Index(['A', 'B', 'C', 'A', 'B', 'C'], dtype='object', name='bar')
../../../compose/etc/conda/cuda_11.2/envs/rapids/lib/python3.8/site-packages/pandas/_testing.py:740: AssertionError
It looks like this is because when we construct a pandas MultiIndex from the cuDF DataFrame's column accessor, we set a dtype in cudf/python/cudf/cudf/core/column_accessor.py (lines 246 to 261 in d183d50).
Is there anything we could do here to recreate the exact pandas MultiIndex, or should we just create a different test case for this change that doesn't check equality with pandas?
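The class mismatch in the traceback can be reproduced with pandas alone. The sketch below rebuilds the two indexes from the error message by hand (it does not call cuDF, and the variable names are only illustrative) and shows that pandas' index comparison rejects a CategoricalIndex against an object-dtype Index even though the values and names match.

```python
import pandas as pd

# The two objects from the traceback, rebuilt by hand for illustration.
values = ["A", "B", "C", "A", "B", "C"]
left = pd.CategoricalIndex(values, categories=["A", "B", "C"], name="bar")
right = pd.Index(values, dtype="object", name="bar")

# Values and names agree, but the index classes differ, so the
# strict comparison raises an AssertionError.
try:
    pd.testing.assert_index_equal(left, right, exact=True)
    classes_match = True
except AssertionError:
    classes_match = False

# Dropping the categorical dtype on the pandas side makes the two
# indexes comparable again (this line does not raise).
pd.testing.assert_index_equal(left.astype(object), right, exact=True)
```

Casting away the categorical dtype only hides the difference; it is shown here to make clear that the failure is about index metadata, not about the unstacked values.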
Ended up xfailing the tests, since the underlying issue seems to be that we don't fully support categorical column indexes. Opened an issue to discuss this: #8743
Codecov Report
@@            Coverage Diff             @@
##    branch-21.08    #8560     +/-    ##
==========================================
+ Coverage    10.66%   10.67%   +0.01%
==========================================
  Files          109      109
  Lines        18302    18669     +367
==========================================
+ Hits          1951     1993      +42
- Misses       16351    16676     +325
@gpucibot merge
Fixes #7365
Applies column metadata to the output columns of keys in Frame._encode; skipping this step meant that the output of DataFrame.unstack would not have the expected metadata for index columns.
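For reference, the pandas behaviour that the test compares against can be sketched as follows. The data and the column names foo, bar and val are made up for illustration and are not taken from the cuDF test suite. When a categorical index level is unstacked into the columns, pandas keeps the categorical dtype on that column level.

```python
import pandas as pd

# Small frame with a two-level index; the "bar" level is categorical.
pdf = pd.DataFrame(
    {
        "foo": [1, 1, 2, 2],
        "bar": pd.Categorical(["A", "B", "A", "B"]),
        "val": [10, 20, 30, 40],
    }
).set_index(["foo", "bar"])

# Move the categorical level from the row index into the columns.
wide = pdf.unstack(level="bar")

# Inspect the metadata of the column level that came from "bar".
bar_level = wide.columns.get_level_values("bar")
print(type(bar_level).__name__, bar_level.dtype)
```

The same check on the cuDF side would be the natural way to confirm that the metadata survives the round trip, assuming categorical column indexes are supported in that code path.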