Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Upgrade pandas to 1.2 #7375

Merged
merged 46 commits into from
Feb 26, 2021
Merged

Conversation

galipremsagar
Copy link
Contributor

@galipremsagar galipremsagar commented Feb 12, 2021

Fixes: #7367, #7446

This PR upgrades pandas to 1.2.2 in cudf. Changes include:

  • Bumping up pandas version.
  • Fixing isin behavior which now takes in types into accout: DOC: Undocumented change in .isin behavior from 1.1.5 to 1.2.0 pandas-dev/pandas#38781
  • CategoricalColumn.__setitem__ will now not allow setting of values that are not in existing categories.
  • Introduced cudf.core._compat.PANDAS_GE_120 variable to create back-ward compatibility.
  • Updated usages of pd.core.tools.datetimes._guess_datetime_format to pd.core.tools.datetimes.guess_datetime_format
  • Introduced std & median in DateTimeColumn.
  • Fixed incorrect handling of passing StringMethods as an input to methods in string APIs.
  • Fixed a typo in calling is_valid of Scalar.
  • Removed unnecessary special handling in TimeDeltaColumn.sum logic for empty inputs.
  • Introduced passing dtype='float64' wherever there is an empty series being created since pandas will soon be defaulting to object dtype if no type is passed and we don't have a perfectly resembling object dtype as that of pandas.
  • Fixed deprecation warnings of Index.__or__ and Index.__xor__ by replacing with union & symmetric_difference APIs.
  • Introduced mapping of our float32 & float64 dtypes to pandas Nullable dtypes FLoat32Dtype & Float64Dtype when nullable=True in to_pandas.
  • With introduction of nullable float dtypes, there is an issue in creating MultiIndex from dataframe: BUG: Unable to create a MultiIndex with nan values in nullable Float dtypes pandas-dev/pandas#39984, so introduced a workaround in our MultiIndex.__repr__ code.
  • Removed usages of check_less_precise in our code-base as this is deprecated and is replaced with rtol & atol. Retained its usages in our testing APIs for back-ward compatibility.
  • Removed good number xfail cases which are actually passing right now because of resolved issues in both pandas & cudf.
  • Did some miscellaneous code-cleanup in pytests.
  • Fixed pytests that will fail when run in parallel due to access to shared pytest params being manipulated inplace.
  • Follow a standard import pattern across pytest files, some files do from pandas import Series and some do from cudf.core import Series. So removed both patterns and doing only simple import cudf & import pandas as pd to avoid confusion while debugging test failures across multiple files. (Made this change in all pytest files which I had to touch as part of pandas upgrade, we can make similar changes in future for the files which we touch).
  • Fix issue with assigning np.nan values to a CategoricalColumn and fix related __repr__ code: [BUG] Unable to assign null/nan in categorical column #7446

@galipremsagar galipremsagar added 2 - In Progress Currently a work in progress Python Affects Python cuDF API. labels Feb 12, 2021
@galipremsagar galipremsagar self-assigned this Feb 12, 2021
@github-actions github-actions bot added the conda label Feb 23, 2021
@galipremsagar galipremsagar added 3 - Ready for Review Ready for review by team 4 - Needs cuDF (Python) Reviewer breaking Breaking change improvement Improvement / enhancement to an existing function and removed 2 - In Progress Currently a work in progress labels Feb 23, 2021
@galipremsagar
Copy link
Contributor Author

rerun tests

@kkraus14 kkraus14 added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team 4 - Needs Dask Reviewer labels Feb 26, 2021
@kkraus14
Copy link
Collaborator

@gpucibot merge

@@ -45,6 +45,7 @@ requirements:
- fsspec>=0.6.0
- {{ pin_compatible('cudatoolkit', max_pin='x.x') }}
- nvtx >=0.2.1
- packaging
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be added to the integration repo as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would defer to @ajschmidt8 here, there was a build job-related failure at this line https://github.com/rapidsai/cudf/blob/branch-0.19/ci/cpu/build.sh#L78-L79 so AJ suggested we add it here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, i will open a PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rapidsai/integration#225

Feel free to review. I will wait to merge until this PR is confirmed passing also.

@codecov
Copy link

codecov bot commented Feb 26, 2021

Codecov Report

Merging #7375 (416bc92) into branch-0.19 (43b44e1) will increase coverage by 0.52%.
The diff coverage is 90.33%.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.19    #7375      +/-   ##
===============================================
+ Coverage        81.80%   82.33%   +0.52%     
===============================================
  Files              101      101              
  Lines            16695    17198     +503     
===============================================
+ Hits             13658    14160     +502     
- Misses            3037     3038       +1     
Impacted Files Coverage Δ
python/cudf/cudf/utils/docutils.py 97.36% <50.00%> (-2.64%) ⬇️
python/cudf/cudf/testing/testing.py 80.00% <57.14%> (-1.04%) ⬇️
python/cudf/cudf/core/column/timedelta.py 88.57% <75.00%> (+0.33%) ⬆️
python/cudf/cudf/core/column/categorical.py 91.74% <77.14%> (-0.49%) ⬇️
python/cudf/cudf/core/column/datetime.py 89.63% <78.57%> (+0.58%) ⬆️
python/cudf/cudf/core/multiindex.py 82.90% <90.90%> (+0.73%) ⬆️
python/cudf/cudf/core/tools/datetimes.py 84.53% <93.33%> (+0.89%) ⬆️
python/cudf/cudf/core/column_accessor.py 95.47% <95.65%> (+2.53%) ⬆️
python/cudf/cudf/core/column/column.py 87.86% <96.42%> (+0.43%) ⬆️
python/cudf/cudf/core/_compat.py 100.00% <100.00%> (ø)
... and 62 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7526be7...416bc92. Read the comment docs.

@rapids-bot rapids-bot bot merged commit d0a5ec8 into rapidsai:branch-0.19 Feb 26, 2021
rapids-bot bot pushed a commit that referenced this pull request Mar 2, 2021
Fully resolves: #7412 

Most of the fix to address #7412 were done as part of pandas upgrade in #7375. This PR only includes docstrings update to `isin`.

Authors:
  - GALI PREM SAGAR (@galipremsagar)

Approvers:
  - Keith Kraus (@kkraus14)

URL: #7479
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge breaking Breaking change improvement Improvement / enhancement to an existing function Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Unable to assign null/nan in categorical column [FEA] Add support for pandas 1.2
4 participants