This corrects the sampling range used when `replace=True`. Previously, row positions were drawn from 0 through `num_rows` inclusive, so `num_rows` itself could be drawn even though it is one position out of bounds. As a result, `sample` could return values not present in the original DataFrame.
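To illustrate the off-by-one, here is a minimal sketch in plain Python. It is not the cudf implementation; the helper names `sample_positions_buggy` and `sample_positions_fixed` are hypothetical, and the standard-library `random` module stands in for whatever generator cudf uses internally.

```python
import random


def sample_positions_buggy(num_rows, n, seed=0):
    """Draw n positions with replacement from 0..num_rows INCLUSIVE (off by one)."""
    rng = random.Random(seed)
    # randint includes both endpoints, so num_rows itself can be drawn.
    return [rng.randint(0, num_rows) for _ in range(n)]


def sample_positions_fixed(num_rows, n, seed=0):
    """Draw n positions with replacement from 0..num_rows - 1."""
    rng = random.Random(seed)
    # randrange excludes the upper bound, so every position is a valid row.
    return [rng.randrange(num_rows) for _ in range(n)]


num_rows = 3  # valid row positions are 0, 1, 2
buggy = sample_positions_buggy(num_rows, 1000)
fixed = sample_positions_fixed(num_rows, 1000)

print("buggy max position:", max(buggy))  # may equal num_rows (out of bounds)
print("fixed max position:", max(fixed))  # always below num_rows
```

With an inclusive upper bound, a gather on position `num_rows` reads past the last row, which would be consistent with the out-of-range value described in the linked issue.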
I also added exceptions for sampling from empty DataFrames, matching pandas, and an exception for sampling with `axis=1` and `replace=True`, since cudf does not support DataFrames with duplicate column names.
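The argument checks described above could look roughly like the following sketch. The function `validate_sample_args` is hypothetical and is not part of cudf or pandas; the exception type (`ValueError`) and the messages are assumptions for illustration, not the ones this PR necessarily uses.

```python
def validate_sample_args(num_rows, num_cols, n, replace=False, axis=0):
    """Hypothetical pre-checks run before drawing a sample.

    Assumption: ValueError is used for both cases; the real exception
    types and messages may differ.
    """
    # Size of the axis being sampled: rows for axis=0, columns for axis=1.
    size = num_rows if axis == 0 else num_cols

    # Nothing can be drawn from an empty axis.
    if size == 0 and n > 0:
        raise ValueError("Cannot take a sample from an empty axis")

    # Sampling columns with replacement could pick the same column twice,
    # which would require duplicate column names.
    if axis == 1 and replace:
        raise ValueError(
            "Sampling columns with replace=True would produce duplicate column names"
        )


# Sampling rows with replacement from a non-empty frame passes the checks.
validate_sample_args(num_rows=3, num_cols=2, n=5, replace=True, axis=0)
```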
This closes #6532
Authors:
- Chris Jarrett <[email protected]>
- Mark Harris <[email protected]>
- ChrisJar <[email protected]>
Approvers:
- Keith Kraus
- Mark Harris
URL: #6884
Describe the bug
DataFrame.sample() returns values not present in the original dataframe
Steps/Code to reproduce bug
Expected behavior
I expect "2" to be returned rather than "3". Running the same code against a pandas DataFrame returns "2".
Environment overview (please complete the following information)
Environment details