Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Python DecimalColumn #6715

Merged
merged 77 commits into from
Jan 22, 2021

Conversation

shwina
Copy link
Contributor

@shwina shwina commented Nov 9, 2020

Resolves #6657.

@GPUtester
Copy link
Collaborator

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

@codecov
Copy link

codecov bot commented Nov 10, 2020

Codecov Report

Merging #6715 (870e040) into branch-0.18 (8860baf) will increase coverage by 0.12%.
The diff coverage is 88.53%.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.18    #6715      +/-   ##
===============================================
+ Coverage        82.09%   82.22%   +0.12%     
===============================================
  Files               97       98       +1     
  Lines            16474    16655     +181     
===============================================
+ Hits             13524    13694     +170     
- Misses            2950     2961      +11     
Impacted Files Coverage Δ
python/cudf/cudf/_lib/__init__.py 100.00% <ø> (ø)
python/cudf/cudf/core/column/lists.py 91.66% <ø> (-0.09%) ⬇️
python/cudf/cudf/utils/ioutils.py 78.71% <ø> (ø)
python/cudf/cudf/io/orc.py 86.89% <63.63%> (-1.51%) ⬇️
python/cudf/cudf/core/dataframe.py 90.49% <72.41%> (-0.22%) ⬇️
python/cudf/cudf/core/series.py 91.10% <80.00%> (-0.06%) ⬇️
python/cudf/cudf/core/column/numerical.py 94.08% <83.33%> (-0.33%) ⬇️
python/cudf/cudf/core/dtypes.py 89.94% <84.00%> (-0.45%) ⬇️
python/cudf/cudf/core/column/column.py 88.12% <85.71%> (-0.02%) ⬇️
python/cudf/cudf/io/csv.py 91.66% <86.66%> (-1.67%) ⬇️
... and 18 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6c116e3...870e040. Read the comment docs.

@harrism
Copy link
Member

harrism commented Dec 2, 2020

This PR needs a description, and a decision on whether to retarget to 0.18.

@shwina
Copy link
Contributor Author

shwina commented Dec 2, 2020

We plan to have this ready for review before code freeze (EOD today).

@codereport codereport added feature request New feature or request non-breaking Non-breaking change labels Dec 2, 2020
python/cudf/cudf/core/dtypes.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/dtypes.py Show resolved Hide resolved
python/cudf/cudf/core/column/decimal.py Outdated Show resolved Hide resolved
Copy link
Contributor

@karthikeyann karthikeyann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cpp approval.
Update Copyright year.

CHANGELOG.md Outdated Show resolved Hide resolved
python/cudf/cudf/core/column/decimal.py Show resolved Hide resolved
python/cudf/cudf/core/column/decimal.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/dtypes.py Show resolved Hide resolved
@codereport
Copy link
Contributor

codereport commented Jan 20, 2021

@shwina @kkraus14 CNL's behavior is that it can't represent numbers outside the bounds of the underlying type regrardless of what the positive scale (negative scale for Python) is. CNL also has a from_rep facility to avoid shifting, but that doesn't seem make a difference. Godbolt: https://godbolt.org/z/rd1xoe

libcudf on the other hand actually does work with values greater than the bounds of the underlying type, as long as you construct from the already shifted values (which is what fixed_point_column_wrapper does). For example, the numbers stored below with scale=6 (read scale=-6 in the equivalent Python code), are outside the bounds of an int32_t, but the operations all still work, and our explicit conversion operator to std::string is able to print the numbers correctly as well.

  using fp_wrapper = cudf::test::fixed_point_column_wrapper<int32_t>;

  auto const a = fp_wrapper{{100000000}, scale_type{6}};
  auto const b = fp_wrapper{{5000000},   scale_type{7}};
  auto const c = fp_wrapper{{2},         scale_type{0}};

  auto const expected1 = fp_wrapper{{150000000}, scale_type{6}};
  auto const expected2 = fp_wrapper{{50000000},  scale_type{6}};

  auto const result1 = cudf::binary_operation(a, b, cudf::binary_operator::ADD, {});
  auto const result2 = cudf::binary_operation(a, c, cudf::binary_operator::DIV, {});

  CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected1, result1->view());
  CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected2, result2->view());

Update:
Just realized to make it even more clear, you can set the precision as high as you like:

  using fp_wrapper = cudf::test::fixed_point_column_wrapper<int64_t>;

  auto a = fp_wrapper{{100000000}, scale_type{100}};
  auto b = fp_wrapper{{5000000}, scale_type{101}};
  auto c = fp_wrapper{{2}, scale_type{0}};

  auto expected1 = fp_wrapper{{150000000}, scale_type{100}};
  auto expected2 = fp_wrapper{{50000000}, scale_type{100}};

  auto result1 = cudf::binary_operation(a, b, cudf::binary_operator::ADD, {});
  auto result2 = cudf::binary_operation(a, c, cudf::binary_operator::DIV, {});

  CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected1, result1->view());
  CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected2, result2->view());

@@ -2200,6 +2200,21 @@ TYPED_TEST(FixedPointTestBothReps, FixedPointBinaryOpMultiplyScalar)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}

TYPED_TEST(FixedPointTestBothReps, FixedPointBinaryOpSimplePlus)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just a missing binop test? I can't tell if it's testing something more specific/related to this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc: @codereport; IIRC it's just a test we wrote because this case wasn't working in Python. Eventually, we traced that to a Python bug, but we decided to keep this test because why would you delete a test? :D

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any test that ever failed is a good test 👍
I asked just to make sure it covers what it's supposed to.

@kkraus14
Copy link
Collaborator

@gpucibot merge

@kkraus14 kkraus14 added 5 - Ready to Merge Testing and reviews complete, ready to merge Python Affects Python cuDF API. and removed 3 - Ready for Review Ready for review by team labels Jan 22, 2021
@rapids-bot rapids-bot bot merged commit 797f004 into rapidsai:branch-0.18 Jan 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge feature request New feature or request non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Introduce DecimalColumn to cuDF
8 participants