Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add struct accessor to dask-cudf #8874

Merged
merged 25 commits into from
Aug 24, 2021

Conversation

NV-jpt
Copy link
Contributor

@NV-jpt NV-jpt commented Jul 27, 2021

This PR implements 'Struct Accessor' requested feature in dask-cudf (Issue #8658)

StructMethod class implemented to expose 'field(key)' method in dask-cudf

    Examples
    --------
    >>> s = cudf.Series([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}])
    >>> ds = dask_cudf.from_cudf(s, 2)
    >>> ds.struct.field(0).compute()
    0    1
    1    3
    dtype: int64
    >>> ds.struct.field('a').compute()
    0    1
    1    3
    dtype: int64

@NV-jpt NV-jpt requested review from a team as code owners July 27, 2021 21:34
@github-actions github-actions bot added the Python Affects Python cuDF API. label Jul 27, 2021
@rjzamora rjzamora added 2 - In Progress Currently a work in progress dask Dask issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jul 27, 2021
Copy link
Member

@rjzamora rjzamora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @NV-jpt - Looking good so far! It looks like you need to resolve some conflicts with branch-21.10 and add pre-commit to your environment to help with formatting.

python/dask_cudf/dask_cudf/accessors.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/column/column.py Outdated Show resolved Hide resolved
python/dask_cudf/dask_cudf/tests/test_accessor.py Outdated Show resolved Hide resolved
@rjzamora rjzamora marked this pull request as draft July 27, 2021 22:02
@rjzamora
Copy link
Member

@sarahyurick - I suspect that the most-natural place for the dask-cudf explode feature (requested in #8660) is the struct accessor being added here. It looks like #8729 added the cudf-level feature within the cudf.Series-struct accessor, and not for Series.explode (is this correct @shwina ?)

Copy link
Member

@rjzamora rjzamora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting very close - Nice work @NV-jpt !

python/dask_cudf/dask_cudf/accessors.py Outdated Show resolved Hide resolved
python/dask_cudf/dask_cudf/accessors.py Outdated Show resolved Hide resolved
python/dask_cudf/dask_cudf/tests/test_accessor.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Jul 29, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.10@0789a81). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head 91125e7 differs from pull request most recent head b552833. Consider uploading reports for the commit b552833 to get more accurate results
Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.10    #8874   +/-   ##
===============================================
  Coverage                ?   10.77%           
===============================================
  Files                   ?      114           
  Lines                   ?    19069           
  Branches                ?        0           
===============================================
  Hits                    ?     2054           
  Misses                  ?    17015           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0789a81...b552833. Read the comment docs.

Copy link
Member

@rjzamora rjzamora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM. My only suggestion is to avoid repeating the same "data" parameters in test_accessor.py (see my comment). Also, could you make the title of the PR slightly more descriptive? For example: "Add struct accessor to dask-cudf"

python/dask_cudf/dask_cudf/tests/test_accessor.py Outdated Show resolved Hide resolved
@NV-jpt NV-jpt changed the title Struct Accessor Add struct accessor to dask-cudf Aug 18, 2021
@NV-jpt
Copy link
Contributor Author

NV-jpt commented Aug 19, 2021

rerun tests

@shwina
Copy link
Contributor

shwina commented Aug 23, 2021

rerun tests

1 similar comment
@beckernick
Copy link
Member

rerun tests

@shwina
Copy link
Contributor

shwina commented Aug 24, 2021

rerun tests

Copy link
Contributor

@shwina shwina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

@shwina
Copy link
Contributor

shwina commented Aug 24, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit abba33f into rapidsai:branch-21.10 Aug 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - In Progress Currently a work in progress dask Dask issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants