[BUG] fillna forward and backward fill methods fail to replace long ranges of missing values #8673

Closed
Nyrio opened this issue Jul 7, 2021 · 1 comment · Fixed by #8699
Labels
  - bug: Something isn't working
  - libcudf: Affects libcudf (C++/CUDA) code.
  - Python: Affects Python cuDF API.

Nyrio commented Jul 7, 2021

The following code explains the problem better than words:

import cudf
import pandas as pd
import numpy as np

a_pd = pd.DataFrame([[0, np.nan], *([[np.nan, np.nan]]*14), [np.nan, 1]],
                    columns=list("AB"))
a_cudf = cudf.from_pandas(a_pd)

print(a_pd.fillna(method="ffill").fillna(method="bfill"))
print(a_cudf.fillna(method="ffill").fillna(method="bfill"))
input            |   pandas output   |  cudf output
                 |                   |
       A     B   |         A    B    |          A     B
0    0.0  <NA>   |   0   0.0  1.0    |   0    0.0  <NA>
1   <NA>  <NA>   |   1   0.0  1.0    |   1    0.0  <NA>
2   <NA>  <NA>   |   2   0.0  1.0    |   2    0.0   1.0
3   <NA>  <NA>   |   3   0.0  1.0    |   3    0.0   1.0
4   <NA>  <NA>   |   4   0.0  1.0    |   4    0.0   1.0
5   <NA>  <NA>   |   5   0.0  1.0    |   5    0.0   1.0
6   <NA>  <NA>   |   6   0.0  1.0    |   6    0.0   1.0
7   <NA>  <NA>   |   7   0.0  1.0    |   7    0.0   1.0
8   <NA>  <NA>   |   8   0.0  1.0    |   8    0.0   1.0
9   <NA>  <NA>   |   9   0.0  1.0    |   9    0.0   1.0
10  <NA>  <NA>   |   10  0.0  1.0    |   10   0.0   1.0
11  <NA>  <NA>   |   11  0.0  1.0    |   11   0.0   1.0
12  <NA>  <NA>   |   12  0.0  1.0    |   12   0.0   1.0
13  <NA>  <NA>   |   13  0.0  1.0    |   13   0.0   1.0
14  <NA>  <NA>   |   14  0.0  1.0    |   14  <NA>   1.0
15  <NA>   1.0   |   15  0.0  1.0    |   15  <NA>   1.0

The expected behavior is the pandas output: there shouldn't be any <NA> left after an existing value in ffill / before in bfill.
But cudf can only fill a certain range and then fails to fill subsequent missing values.

(Note, however, that filling with a constant value works.)
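The expected behavior stated above can be written down as a small check. This is only a sketch using pandas on the CPU: the helper name `has_unfilled_gaps` is hypothetical, and it uses `DataFrame.ffill()` / `DataFrame.bfill()`, which are equivalent to the `fillna(method=...)` calls in the report. A cuDF result could presumably be passed through `to_pandas()` first and checked the same way.

```python
import numpy as np
import pandas as pd


def has_unfilled_gaps(filled: pd.DataFrame, original: pd.DataFrame) -> bool:
    """Return True if a column that had at least one value still contains nulls.

    After a forward fill followed by a backward fill, any column with at
    least one valid entry should be completely filled.
    """
    has_value = original.notna().any()  # per-column: any valid entry at all?
    return bool(filled.loc[:, has_value].isna().any().any())


# Same input as in the report: one value at the top of A, one at the bottom of B.
df = pd.DataFrame(
    [[0, np.nan], *([[np.nan, np.nan]] * 14), [np.nan, 1]],
    columns=list("AB"),
)
filled = df.ffill().bfill()
print(has_unfilled_gaps(filled, df))
```

With pandas the check reports no remaining gaps; a result shaped like the cuDF output in the table above would make it return `True`.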

Nyrio added the Needs Triage and bug labels Jul 7, 2021

beckernick (Member) commented

I'm able to reproduce this as well. Marking as P0 due to data corruption being a silent failure.

isVoid self-assigned this Jul 8, 2021
beckernick added the Python and libcudf labels and removed the Needs Triage label Jul 12, 2021
rapids-bot bot pushed a commit that referenced this issue Jul 16, 2021
… for gathermap (#8699)

Closes #8673 

This PR fixes a bug in the functor for `replace_nulls(replace_policy)`. The current functor assumes that the second element of the pair is discarded, so it arbitrarily returns `true` for that element. This makes the functor non-associative, which violates the requirement that `inclusive_scan` places on its functors.
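Why associativity matters can be illustrated with a small Python model. This is not the libcudf code: it assumes each scan element is an `(index, is_valid)` pair, that the operator picks the right-hand index when the right-hand element is valid, and that the scan is evaluated in chunks whose partial results are combined afterwards, as parallel scans commonly do. The names `buggy_op`, `fixed_op`, and `chunked_scan` are hypothetical, and `fixed_op` is just one associative choice for the second element.

```python
from itertools import accumulate


def buggy_op(lhs, rhs):
    # Model of the bug: the validity flag of the result is always True.
    return (rhs[0] if rhs[1] else lhs[0], True)


def fixed_op(lhs, rhs):
    # One associative alternative: the result is valid if either side is.
    return (rhs[0] if rhs[1] else lhs[0], lhs[1] or rhs[1])


def chunked_scan(pairs, op, chunk_size):
    """Inclusive scan done chunk by chunk, then combined with a running carry.

    This matches a sequential scan only if `op` is associative.
    """
    out, carry = [], None
    for start in range(0, len(pairs), chunk_size):
        local = list(accumulate(pairs[start:start + chunk_size], op))
        if carry is not None:
            local = [op(carry, p) for p in local]
        out.extend(local)
        carry = local[-1]
    return out


# Row 0 is valid, rows 1..7 are null; a forward fill should gather row 0 everywhere.
pairs = [(i, i == 0) for i in range(8)]

a, b, c = (0, True), (1, False), (2, False)
print(buggy_op(buggy_op(a, b), c), buggy_op(a, buggy_op(b, c)))  # grouping changes the result
print(fixed_op(fixed_op(a, b), c), fixed_op(a, fixed_op(b, c)))  # grouping does not matter

print([idx for idx, _ in chunked_scan(pairs, buggy_op, 4)])
print([idx for idx, _ in chunked_scan(pairs, fixed_op, 4)])
```

In this model the buggy operator makes rows in the second chunk gather from index 4, which is itself a null row, so they stay null. That is consistent with the symptom reported in this issue, where only part of a long run of missing values gets filled.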

Authors:
  - Michael Wang (https://github.com/isVoid)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - https://github.com/nvdbaranec
  - Nghia Truong (https://github.com/ttnghia)

URL: #8699