[BUG] SimpleImputer requires array input (and incorrect docs) #3435

Closed
Tracked by #3483
beckernick opened this issue Jan 29, 2021 · 1 comment
Labels: bug (Something isn't working)

@beckernick (Member)

In the 2021-01-29 nightlies, the SimpleImputer example in the documentation fails in two places. First, the import fails: for now, the namespace appears to be cuml.experimental.preprocessing, not cuml.impute. Second, even with the corrected import, fit fails because it expects the input to be an object with a dtype attribute, so a plain Python list is rejected.

```python
import numpy as np
from cuml.impute import SimpleImputer
imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')
imp_mean.fit([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]])

X = [[np.nan, 2, 3], [4, np.nan, 6], [10, np.nan, 9]]
print(imp_mean.transform(X))
```

```
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-3af8666173bd> in <module>
      1 import numpy as np
----> 2 from cuml.impute import SimpleImputer
      3 imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')
      4 imp_mean.fit([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]])
      5

ModuleNotFoundError: No module named 'cuml.impute'
```

Fixing the namespace, we get the following:

```python
import numpy as np
from cuml.experimental.preprocessing import SimpleImputer
imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')
imp_mean.fit([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]])
```

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-eacf546eda9a> in <module>
      2 from cuml.experimental.preprocessing import SimpleImputer
      3 imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')
----> 4 imp_mean.fit([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]])
      5
      6 X = [[np.nan, 2, 3], [4, np.nan, 6], [10, np.nan, 9]]

/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-20210129/lib/python3.7/site-packages/cuml/_thirdparty/sklearn/preprocessing/_imputation.py in fit(self, X, y)
    287         self : SimpleImputer
    288         """
--> 289         X = self._validate_input(X, in_fit=True)
    290         super()._fit_indicator(X)
    291

/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-20210129/lib/python3.7/site-packages/cuml/_thirdparty/sklearn/preprocessing/_imputation.py in _validate_input(self, X, in_fit)
    254                                     accept_sparse='csc', dtype=dtype,
    255                                     force_all_finite=force_all_finite,
--> 256                                     copy=self.copy)
    257         except ValueError as ve:
    258             if "could not convert" in str(ve):

/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-20210129/lib/python3.7/site-packages/cuml/_thirdparty/sklearn/utils/skl_dependencies.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
    314                     f"requires y to be passed, but the target y is None."
    315                 )
--> 316             X = check_array(X, **check_params)
    317             out = X
    318         else:

/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-20210129/lib/python3.7/site-packages/cuml/thirdparty_adapters/adapters.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    229         dtype = numeric_types
    230
--> 231     correct_dtype = check_dtype(array, dtype)
    232
    233     if copy and not order and hasattr(array, 'flags'):

/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-20210129/lib/python3.7/site-packages/cuml/thirdparty_adapters/adapters.py in check_dtype(array, dtypes)
    116
    117         if not isinstance(array, cuDataFrame):
--> 118             if array.dtype not in dtypes:
    119                 return dtypes[0]
    120         elif any([dt not in dtypes for dt in array.dtypes.tolist()]):

AttributeError: 'list' object has no attribute 'dtype'
```
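The last frame shows the root cause: `check_dtype` reads `array.dtype` directly, so any input without a `dtype` attribute, such as a nested Python list, fails before any imputation happens. Below is a minimal sketch of that failure mode and of one possible fix, which converts list-like input with `np.asarray` before the dtype check. `NUMERIC_DTYPES`, `check_dtype_sketch`, and `coerce_input` are hypothetical names. This is not cuML's code, and not necessarily how the issue was eventually resolved.

```python
import numpy as np

# Hypothetical list of accepted dtypes; cuML's real `numeric_types` may differ.
NUMERIC_DTYPES = [np.dtype(t) for t in (np.float32, np.float64, np.int32, np.int64)]


def check_dtype_sketch(array, dtypes=NUMERIC_DTYPES):
    """Hypothetical stand-in for the failing check: pick a target dtype."""
    # Like line 118 in the traceback, this accesses `array.dtype`,
    # which raises AttributeError when `array` is a plain Python list.
    if array.dtype not in dtypes:
        return dtypes[0]
    return array.dtype


def coerce_input(X):
    """Hypothetical fix: convert list-like input to an ndarray first."""
    if not hasattr(X, "dtype"):
        X = np.asarray(X)  # nested lists containing np.nan become float64
    return X


data = [[7, 2, 3], [4, np.nan, 6], [10, 5, 9]]

try:
    check_dtype_sketch(data)
except AttributeError as err:
    print("list input fails:", err)

X = coerce_input(data)
print("after coercion:", X.dtype, check_dtype_sketch(X))
```

The same idea suggests a user-side workaround on the affected nightly: pass `np.asarray([[7, 2, 3], ...])` to `fit` instead of a list. This has not been verified against that build.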

Scikit-learn's SimpleImputer accepts a list as input:

```python
from sklearn.impute import SimpleImputer
import numpy as np
imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')
imp_mean.fit([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]])
```

```
SimpleImputer()
```
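For reference, here is what `strategy='mean'` computes on the documentation example, written as a plain NumPy sketch. It is not scikit-learn's or cuML's implementation. `fit_mean` and `transform_mean` are hypothetical helpers, and only `np.nan` is treated as missing. The sketch also skips the special case where a column is entirely missing, which scikit-learn handles by dropping that column.

```python
import numpy as np


def fit_mean(X):
    """Per-column means over the non-NaN entries (sketch of strategy='mean')."""
    X = np.asarray(X, dtype=np.float64)  # accepts nested lists, as sklearn does
    return np.nanmean(X, axis=0)


def transform_mean(X, statistics):
    """Replace each NaN with the fitted mean of its column."""
    X = np.array(X, dtype=np.float64)  # copy so the caller's data is untouched
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = statistics[cols]
    return X


stats = fit_mean([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]])
X = [[np.nan, 2, 3], [4, np.nan, 6], [10, np.nan, 9]]
print(stats)
print(transform_mean(X, stats))
```

The column means are 7.0, 3.5 (the mean of 2 and 5, with the NaN ignored), and 6.0. The transformed array is therefore `[[7, 2, 3], [4, 3.5, 6], [10, 3.5, 9]]`, which is the result the documentation example is expected to print once fit accepts lists.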
```
conda list | grep "rapids\|blazing\|dask\|distr\|pandas"
# packages in environment at /raid/nicholasb/miniconda3/envs/rapids-tpcxbb-20210129:
blazingsql                0.18.0a0                 pypi_0    pypi
cudf                      0.18.0a210129   cuda_10.2_py37_gb608832f4f_217    rapidsai-nightly
cuml                      0.18.0a210129   cuda10.2_py37_gd413fe64d_91    rapidsai-nightly
dask                      2021.1.1           pyhd8ed1ab_0    conda-forge
dask-core                 2021.1.1           pyhd8ed1ab_0    conda-forge
dask-cuda                 0.18.0a210129           py37_53    rapidsai-nightly
dask-cudf                 0.18.0a210129   py37_gb608832f4f_217    rapidsai-nightly
distributed               2021.1.1         py37h89c1867_0    conda-forge
faiss-proc                1.0.0                      cuda    rapidsai-nightly
libcudf                   0.18.0a210129   cuda10.2_gb097b5adac_218    rapidsai-nightly
libcuml                   0.18.0a210129   cuda10.2_gd413fe64d_91    rapidsai-nightly
libcumlprims              0.18.0a201203   cuda10.2_gff080f3_0    rapidsai-nightly
librmm                    0.18.0a210129   cuda10.2_g89c560e_31    rapidsai-nightly
pandas                    1.1.5            py37hdc94413_0    conda-forge
rmm                       0.18.0a210129   cuda_10.2_py37_g89c560e_31    rapidsai-nightly
ucx                       1.9.0+gcd9efd3       cuda10.2_0    rapidsai-nightly
ucx-proc                  1.0.0                       gpu    rapidsai-nightly
ucx-py                    0.18.0a210129   py37_gcd9efd3_11    rapidsai-nightly
```
@beckernick added the `bug` and `? - Needs Triage` labels on Jan 29, 2021
@cjnolet removed the `? - Needs Triage` label on Feb 4, 2021
rapids-bot pushed a commit that referenced this issue on Feb 5, 2021
This partially solves #3435 by fixing the documentation of `SimpleImputer`. The next step will be to allow lists as input to this algorithm.

Authors:
  - Micka (@lowener)

Approvers:
  - Michael Demoret (@mdemoret-nv)

URL: #3447
rapids-bot pushed a commit that referenced this issue on Feb 16, 2021
Partially answers #3435

Authors:
  - Victor Lafargue (@viclafargue)

Approvers:
  - Micka (@lowener)
  - Dante Gama Dessavre (@dantegd)

URL: #3489
@JohnZed (Contributor) commented Feb 22, 2021

Closed by #3489

@JohnZed closed this as completed on Feb 22, 2021