Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixed_point + cudf::binary_operation API Changes #7435

Merged
merged 35 commits into from
Mar 12, 2021

Conversation

codereport
Copy link
Contributor

@codereport codereport commented Feb 24, 2021

This resolves #7442

Recently while working with @razajafri on fixed_point binary ops, it became clear that the cudf::binary_operation is breaking the "easy to use, hard to misuse" # 1 design guideline. I knew about this but I slotted it as technical debt to be cleaned up later. Long story short, after discussions with both @razajafri, @jrhemstad and comments on the #7442, we will implement the following:

  • For fixed_point + cudf::binary_operation + DIV always use the cudf::data_type output_type parameter
  • For fixed_point + cudf::binary_operation + TRUE_DIV, require that the columns/scalars provided as arguments (lhs and rhs) will result in the specified data_type/scale
  • Provide a convenience function (something like binary_operation_fixed_point_scale()) that will compute the "expected" scale given two input columns/scalars and a binary_operator
  • Remove TRUE_DIV
  • Add unit tests for different output data_types
  • Update Python/Cython

This will be a breaking change for all fixed_point + cudf::binary_operation.

@codereport codereport added 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. tech debt improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Feb 24, 2021
@codereport codereport self-assigned this Feb 24, 2021
@razajafri
Copy link
Contributor

Thanks for the quick turn around

@codereport
Copy link
Contributor Author

Thanks for the quick turn around

Well this PR is going to be modified based on the proposal laid out here: #7442. Post any comments if you have them.

@codereport codereport changed the title Add error for fixed_point cudf::binary_operation specified output_type parameter fixed_point + cudf::binary_operation API Changes Feb 25, 2021
@codecov
Copy link

codecov bot commented Feb 25, 2021

Codecov Report

Merging #7435 (43c2def) into branch-0.19 (53929eb) will increase coverage by 0.42%.
The diff coverage is 92.85%.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.19    #7435      +/-   ##
===============================================
+ Coverage        81.88%   82.30%   +0.42%     
===============================================
  Files              101      101              
  Lines            16900    17273     +373     
===============================================
+ Hits             13838    14216     +378     
+ Misses            3062     3057       -5     
Impacted Files Coverage Δ
python/cudf/cudf/core/column/decimal.py 94.73% <90.00%> (-1.10%) ⬇️
python/cudf/cudf/core/dtypes.py 91.13% <100.00%> (+0.86%) ⬆️
python/cudf/cudf/utils/dtypes.py 89.51% <100.00%> (ø)
python/cudf/cudf/io/feather.py 100.00% <0.00%> (ø)
python/cudf/cudf/comm/serialize.py 0.00% <0.00%> (ø)
python/cudf/cudf/_fuzz_testing/io.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/struct.py 100.00% <0.00%> (ø)
python/dask_cudf/dask_cudf/_version.py 0.00% <0.00%> (ø)
python/dask_cudf/dask_cudf/io/tests/test_csv.py 100.00% <0.00%> (ø)
python/dask_cudf/dask_cudf/io/tests/test_orc.py 100.00% <0.00%> (ø)
... and 40 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 53929eb...43c2def. Read the comment docs.

@github-actions github-actions bot added the Python Affects Python cuDF API. label Mar 4, 2021
@codereport
Copy link
Contributor Author

I am not able to get a BOOL8 column when comparing Decimal columns with a scale of 0. Other scales are working as expected.

dec64_c1.binaryOp(EQUAL, dec64_c2, BOOL8) => Decimal64 column

I have tried other predicates and they are also returning Decimal64 col. Can you please check if this is a regression in this PR?

Fixed! This was an awesome catch! Thank you!

Copy link
Contributor

@hyperbolic2346 hyperbolic2346 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall here. I think this cleans things up.

cpp/src/binaryop/binaryop.cpp Outdated Show resolved Hide resolved
cpp/src/binaryop/binaryop.cpp Outdated Show resolved Hide resolved
cpp/src/binaryop/binaryop.cpp Outdated Show resolved Hide resolved
cpp/tests/binaryop/binop-integration-test.cpp Show resolved Hide resolved
@codereport
Copy link
Contributor Author

rerun tests

@codereport codereport added breaking Breaking change and removed non-breaking Non-breaking change labels Mar 11, 2021
@razajafri
Copy link
Contributor

@shwina @trxcllnt @brandon-b-miller @jrhemstad Can we get this going as my other PRs are blocked by this.

@shwina
Copy link
Contributor

shwina commented Mar 12, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 04f9021 into rapidsai:branch-0.19 Mar 12, 2021
rapids-bot bot pushed a commit that referenced this pull request Mar 16, 2021
@codereport is making changes to the way `DIV` will behave for fixed-point types #7435. This PR contains Java changes to support those changes. 

Note: This is a draft until #7435 is merged

Authors:
  - Raza Jafri (@razajafri)

Approvers:
  - MithunR (@mythrocks)
  - Jason Lowe (@jlowe)
  - Gera Shegalov (@gerashegalov)

URL: #7527
hyperbolic2346 pushed a commit to hyperbolic2346/cudf that referenced this pull request Mar 25, 2021
This resolves rapidsai#7442

Recently while working with @razajafri on `fixed_point` binary ops, it became clear that the `cudf::binary_operation` is breaking the "easy to use, **hard to misuse**" # 1 design guideline. I knew about this but I slotted it as technical debt to be cleaned up later. Long story short, after discussions with both @razajafri, @jrhemstad and comments on the rapidsai#7442, we will implement the following:

* [x] For `fixed_point` + `cudf::binary_operation` + `DIV` always **use** the `cudf::data_type output_type` parameter
* [x] ~~For `fixed_point` + `cudf::binary_operation` + `TRUE_DIV`, require that the columns/scalars provided as arguments (`lhs` and `rhs`) will result in the specified `data_type`/`scale`~~
* [x] Provide a convenience function (something like `binary_operation_fixed_point_scale()`) that will compute the "expected" scale given two input columns/scalars and a `binary_operator`
* [x] Remove `TRUE_DIV`
* [x] Add unit tests for different output data_types
* [x] Update Python/Cython 

**This will be a breaking change for all `fixed_point` + `cudf::binary_operation`.**

Authors:
  - Conor Hoekstra (@codereport)

Approvers:
  - Keith Kraus (@kkraus14)
  - Mike Wilson (@hyperbolic2346)

URL: rapidsai#7435
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team 4 - Needs Review Waiting for reviewer to review or respond breaking Breaking change improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[IMPR] fixed_point + cudf::binary_operation API Changes
7 participants