[Review] Correct the sampling range when sampling with replacement #6884
Codecov Report
@@ Coverage Diff @@
## branch-0.18 #6884 +/- ##
===============================================
+ Coverage 82.01% 83.12% +1.10%
===============================================
Files 96 96
Lines 16340 17873 +1533
===============================================
+ Hits 13402 14857 +1455
- Misses 2938 3016 +78
Continue to review full report at Codecov.
The C++ fix is fine. Just need clarification on the Python exception messages.
rerun tests
This corrects an issue with the sampling range used when `replace=True`. Before, it sampled the range 0 through `num_rows` inclusive, meaning it could sample `num_rows` even though that is one position out of bounds. This caused `sample` to return values not present in the original DataFrame.

I also created exceptions for sampling on empty DataFrames that match pandas, as well as an exception for sampling when `axis=1` and `replace=True`, as cudf does not support DataFrames with duplicate columns.

This closes #6532 and #6882
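
To illustrate the off-by-one described above, here is a minimal pure-Python sketch. It is not the cudf or libcudf implementation; the function names and exception messages are hypothetical, and the standard-library `random` module stands in for the real sampler. It only shows the difference between drawing positions from the half-open range `[0, num_rows)` and the inclusive range `[0, num_rows]`, plus the two argument checks mentioned in the description.

```python
import random


def validate_sample_args(num_rows, n, axis=0, replace=False):
    """Hypothetical argument checks for the cases described in this PR."""
    if axis == 1 and replace:
        # Sampling columns with replacement could yield duplicate columns.
        raise NotImplementedError(
            "hypothetical message: cannot sample columns with replacement"
        )
    if num_rows == 0 and n > 0:
        raise ValueError("hypothetical message: cannot sample from an empty frame")


def sample_positions(num_rows, n, seed=None):
    """Draw n row positions with replacement from [0, num_rows)."""
    validate_sample_args(num_rows, n, replace=True)
    rng = random.Random(seed)
    # randrange(num_rows) excludes num_rows, so every position is valid.
    return [rng.randrange(num_rows) for _ in range(n)]


def sample_positions_off_by_one(num_rows, n, seed=None):
    """Sketch of the bug: the upper bound num_rows is included."""
    rng = random.Random(seed)
    # randint(0, num_rows) includes num_rows, one past the last valid row.
    return [rng.randint(0, num_rows) for _ in range(n)]
```

With `num_rows = 3`, the valid positions are 0, 1 and 2. `sample_positions` never returns 3, while `sample_positions_off_by_one` can, which corresponds to reading one row past the end of the table.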