[BUG] iloc is not performing exclusively integer based indexing #1459

Closed
beckernick opened this issue Apr 18, 2019 · 2 comments
Labels: bug (Something isn't working), Python (Affects Python cuDF API)

Comments

@beckernick
Member

beckernick commented Apr 18, 2019

Describe the bug
As a user, I expect iloc to perform exclusively integer-based indexing. Currently, passing a column name as the second argument returns that column. The equivalent pandas code fails with a ValueError. This happens because, when two arguments are passed and the row indexer is a slice, the second argument is treated as a key-based column access, which routes the lookup to the DataFrame getitem (noted in #1444).

Steps/Code to reproduce bug

import numpy as np
import cudf

M = 1e3

df = cudf.DataFrame()
df['col1'] = np.random.rand(int(M))
df['col2'] = np.random.rand(int(M))

print(df.head())
# Passing a column name to iloc should raise, but it returns the column instead
X = df.iloc[:, 'col2']
print(X)
0       0.878490475957518
1     0.35049769650203455
2    0.048280145683080145
3      0.3844068665321222
4    0.021345988158677165
5      0.5875369468577619
6      0.5527104075530989
7     0.09191724409680513
8     0.12458085283893261
9     0.37619273446670787
[990 more rows]
Name: col2, dtype: float64

Expected behavior
I expect iloc to fail when passed a column name as the second argument.
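
A minimal sketch of the pandas behavior this report compares against. It uses pandas as a stand-in so it runs without a GPU; the small frame and the variable names are illustrative, and because the exact exception type and message can vary between pandas versions, the sketch catches both ValueError and IndexError.

```python
import numpy as np
import pandas as pd

# Small stand-in frame with the same column names as the reproducer
pdf = pd.DataFrame({
    "col1": np.random.rand(10),
    "col2": np.random.rand(10),
})

# Positional access works: position 1 is the 'col2' column
by_position = pdf.iloc[:, 1]

# A column label is not a valid positional indexer, so pandas raises
raised = False
try:
    pdf.iloc[:, "col2"]
except (ValueError, IndexError) as err:
    raised = True
    print(type(err).__name__, err)
```

The label-based equivalent, pdf.loc[:, "col2"], is the call that is expected to succeed.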

Environment details (please complete the following information):
Built from source cudf 0.7

git
commit f6ad6de (HEAD -> branch-0.7, origin/branch-0.7, origin/HEAD)
Merge: ea880d0 84c3d5b
Author: Mark Harris
Date: Sun Apr 14 08:43:14 2019 +1000
Merge pull request #1389 from eyalroz/fix-issue-1368

[REVIEW] refactored set_null_count()

OS Information
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.6 LTS"
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
Linux cd319eb30a15 4.4.0-134-generic #160-Ubuntu SMP Wed Aug 15 14:58:00 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

GPU Information
Wed Apr 17 18:12:36 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
| N/A   32C    P0    55W / 300W |  20575MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

CPU
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 80
On-line CPU(s) list: 0-79
Thread(s) per core: 2
Core(s) per socket: 20
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
Stepping: 1
CPU MHz: 2737.539
CPU max MHz: 3600.0000
CPU min MHz: 1200.0000
BogoMIPS: 4392.00
Virtualization: VT-x
Hypervisor vendor: vertical
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 51200K
NUMA node0 CPU(s): 0-19,40-59
NUMA node1 CPU(s): 20-39,60-79
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb invpcid_single intel_pt ssbd ibrs ibpb stibp kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts flush_l1d

CMake
/conda/envs/cudf/bin/cmake
cmake version 3.14.2

CMake suite maintained and supported by Kitware (kitware.com/cmake).

g++
/usr/bin/g++
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

nvcc
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:07:04_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148

Python
/conda/envs/cudf/bin/python
Python 3.7.3

Environment Variables
PATH : /conda/envs/cudf/bin:/conda/condabin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/conda/bin
LD_LIBRARY_PATH : /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/lib
NUMBAPRO_NVVM : /usr/local/cuda/nvvm/lib64/libnvvm.so
NUMBAPRO_LIBDEVICE : /usr/local/cuda/nvvm/libdevice/
CONDA_PREFIX : /conda/envs/cudf
PYTHON_PATH :

conda packages
/conda/condabin/conda
WARNING: The conda.compat module is deprecated and will be removed in a future release.

packages in environment at /conda/envs/cudf:

Name Version Build Channel

alabaster 0.7.12 py_0 conda-forge
arrow-cpp 0.12.1 py37h0e61e49_0 conda-forge
asn1crypto 0.24.0 py37_1003 conda-forge
atomicwrites 1.3.0 py_0 conda-forge
attrs 19.1.0 py_0 conda-forge
babel 2.6.0 py_1 conda-forge
backcall 0.1.0 py_0 conda-forge
bleach 3.1.0 py_0 conda-forge
boost 1.68.0 py37h8619c78_1001 conda-forge
boost-cpp 1.68.0 h11c811c_1000 conda-forge
bzip2 1.0.6 h14c3975_1002 conda-forge
ca-certificates 2019.1.23 0
certifi 2019.3.9 py37_0
cffi 1.12.2 py37hf0e25f4_1 conda-forge
chardet 3.0.4 py37_1003 conda-forge
click 7.0 pypi_0 pypi
cloudpickle 0.8.1 py_0 conda-forge
cmake 3.14.2 hf94ab9c_0 conda-forge
commonmark 0.8.1 py_0 conda-forge
cryptography 2.6.1 py37h72c5cf5_0 conda-forge
cudf 0+unknown pypi_0 pypi
curl 7.64.1 hf8cf82a_0 conda-forge
cython 0.29.7 py37he1b5a44_0 conda-forge
cytoolz 0.9.0.1 py37h14c3975_1001 conda-forge
dask 1.2.0+2.g918854d.dirty pypi_0 pypi
dask-core 1.2.0 py_0 conda-forge
dask-cudf 0.0.0.dev0 pypi_0 pypi
decorator 4.4.0 py_0 conda-forge
defusedxml 0.5.0 py_1 conda-forge
distributed 1.27.0 py37_0 conda-forge
docutils 0.14 py37_1001 conda-forge
entrypoints 0.3 py37_1000 conda-forge
expat 2.2.5 hf484d3e_1002 conda-forge
future 0.17.1 py37_1000 conda-forge
gmp 6.1.2 hf484d3e_1000 conda-forge
heapdict 1.0.0 py37_1000 conda-forge
icu 58.2 hf484d3e_1000 conda-forge
idna 2.8 py37_1000 conda-forge
imagesize 1.1.0 py_0 conda-forge
ipykernel 5.1.0 py37h24bf2e0_1002 conda-forge
ipython 7.4.0 py37h24bf2e0_0 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
jedi 0.13.3 py37_0 conda-forge
jinja2 2.10.1 py_0 conda-forge
jsonschema 3.0.1 py37_0 conda-forge
jupyter_client 5.2.4 py_3 conda-forge
jupyter_core 4.4.0 py_0 conda-forge
jupyterlab 0.35.4 py37hf63ae98_0
jupyterlab_server 0.2.0 py37_0
krb5 1.16.3 h05b26f9_1001 conda-forge
libblas 3.8.0 4_openblas conda-forge
libcblas 3.8.0 4_openblas conda-forge
libcurl 7.64.1 hda55be3_0 conda-forge
libedit 3.1.20170329 hf8c457e_1001 conda-forge
libffi 3.2.1 he1b5a44_1006 conda-forge
libgcc-ng 8.2.0 hdf63c60_1
libgdf-cffi 0.6.0 pypi_0 pypi
libgfortran 3.0.0 1 conda-forge
libprotobuf 3.6.1 hdbcaa40_1001 conda-forge
librmm-cffi 0.5.0 pypi_0 pypi
libsodium 1.0.16 h14c3975_1001 conda-forge
libssh2 1.8.2 h22169c7_2 conda-forge
libstdcxx-ng 8.2.0 hdf63c60_1
libuv 1.26.0 h14c3975_0 conda-forge
llvmlite 0.28.0 py37hdbcaa40_0 conda-forge
markdown 2.6.11 pypi_0 pypi
markupsafe 1.1.1 py37h14c3975_0 conda-forge
mistune 0.8.4 py37h14c3975_1000 conda-forge
more-itertools 4.3.0 py37_1000 conda-forge
msgpack-python 0.6.1 py37h6bb024c_0 conda-forge
nbconvert 5.4.1 py_2 conda-forge
nbformat 4.4.0 py_1 conda-forge
nbsphinx 0.4.2 py_0 conda-forge
ncurses 6.1 hf484d3e_1002 conda-forge
notebook 5.7.8 py37_0 conda-forge
numba 0.41.0 pypi_0 pypi
numpy 1.15.4 py37h8b7e671_1001 conda-forge
numpydoc 0.8.0 py_1 conda-forge
nvstrings 0.3.0 cuda9.2_py37_18 rapidsai/label/cuda9.2
openblas 0.3.5 ha44fe06_0 conda-forge
openssl 1.1.1b h7b6447c_1
packaging 19.0 py_0 conda-forge
pandas 0.24.2 py37hf484d3e_0 conda-forge
pandoc 1.19.2 0 conda-forge
pandocfilters 1.4.2 py_1 conda-forge
parquet-cpp 1.5.1 4 conda-forge
parso 0.4.0 py_0 conda-forge
pexpect 4.7.0 py37_0 conda-forge
pickleshare 0.7.5 py37_1000 conda-forge
pip 19.0.3 py37_0 conda-forge
pluggy 0.9.0 py_0 conda-forge
prometheus_client 0.6.0 py_0 conda-forge
prompt_toolkit 2.0.9 py_0 conda-forge
psutil 5.6.1 py37h14c3975_0 conda-forge
ptyprocess 0.6.0 py_1001 conda-forge
py 1.8.0 py_0 conda-forge
pyarrow 0.12.1 py37hbbcf98d_0 conda-forge
pycparser 2.19 py37_1 conda-forge
pygments 2.3.1 py_0 conda-forge
pyopenssl 19.0.0 py37_0 conda-forge
pyparsing 2.4.0 py_0 conda-forge
pyrsistent 0.14.11 py37h14c3975_0 conda-forge
pysocks 1.6.8 py37_1002 conda-forge
pytest 4.4.0 py37_1 conda-forge
python 3.7.3 h5b0a415_0 conda-forge
python-dateutil 2.8.0 py_0 conda-forge
pytz 2019.1 py_0 conda-forge
pyyaml 5.1 py37h14c3975_0 conda-forge
pyzmq 18.0.1 py37h0e1adb2_0 conda-forge
readline 7.0 hf8c457e_1001 conda-forge
recommonmark 0.5.0 py_0 conda-forge
requests 2.21.0 py37_1000 conda-forge
rhash 1.3.6 h14c3975_1001 conda-forge
send2trash 1.5.0 py_0 conda-forge
setuptools 41.0.0 py37_0 conda-forge
six 1.12.0 py37_1000 conda-forge
snowballstemmer 1.2.1 py_1 conda-forge
sortedcontainers 2.1.0 py_0 conda-forge
sphinx 2.0.1 py_0 conda-forge
sphinx-markdown-tables 0.0.9 pypi_0 pypi
sphinx_rtd_theme 0.4.3 py_0 conda-forge
sphinxcontrib-applehelp 1.0.1 py_0 conda-forge
sphinxcontrib-devhelp 1.0.1 py_0 conda-forge
sphinxcontrib-htmlhelp 1.0.2 py_0 conda-forge
sphinxcontrib-jsmath 1.0.1 py_0 conda-forge
sphinxcontrib-qthelp 1.0.2 py_0 conda-forge
sphinxcontrib-serializinghtml 1.1.1 py_0 conda-forge
sphinxcontrib-websupport 1.1.0 py_1 conda-forge
sqlite 3.26.0 h67949de_1001 conda-forge
tblib 1.3.2 pypi_0 pypi
terminado 0.8.2 py37_0 conda-forge
testpath 0.4.2 py_1001 conda-forge
thrift-cpp 0.12.0 h0a07b25_1002 conda-forge
tk 8.6.9 h84994c4_1001 conda-forge
toolz 0.9.0 pypi_0 pypi
tornado 6.0.2 py37h516909a_0 conda-forge
traitlets 4.3.2 py37_1000 conda-forge
urllib3 1.24.1 py37_1000 conda-forge
wcwidth 0.1.7 py_1 conda-forge
webencodings 0.5.1 py_1 conda-forge
wheel 0.33.1 py37_0 conda-forge
xz 5.2.4 h14c3975_1001 conda-forge
yaml 0.1.7 h14c3975_1001 conda-forge
zeromq 4.2.5 hf484d3e_1006 conda-forge
zict 0.1.4 pypi_0 pypi
zlib 1.2.11 h14c3975_1004 conda-forge

beckernick added the bug and Needs Triage labels on Apr 18, 2019
kkraus14 added the Python label and removed the Needs Triage label on Apr 18, 2019
@awthomp
Member

awthomp commented Apr 29, 2019

This is impeding my work with Whitewater, where I need to handle arbitrary data sizes.

The current method of data handling expects a CSV file in which the first N-1 columns form the feature matrix X and the last column is the observation vector y, e.g.

import numpy as np
import cudf

M = 1e3

df = cudf.DataFrame()
df['col1'] = np.random.rand(int(M))
df['col2'] = np.random.rand(int(M))
df['y'] = np.random.rand(int(M))

Where:

X = df.iloc[: , :-1]
y = df.iloc[:, -1]

iloc is currently unable to handle:

  1. grabbing columns by index (it returns the whole dataframe)
  2. grabbing groups of columns with slicing, e.g. df.iloc[:, 0:2]
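
For reference, a small sketch of what these positional column selections return in pandas, which is the behavior being asked for. It uses pandas as a stand-in so it runs without a GPU, and the frame below mirrors the example above with fewer rows.

```python
import numpy as np
import pandas as pd

pdf = pd.DataFrame({
    "col1": np.random.rand(5),
    "col2": np.random.rand(5),
    "y": np.random.rand(5),
})

X = pdf.iloc[:, :-1]          # every column except the last, as a DataFrame
y = pdf.iloc[:, -1]           # the last column only, as a Series
first_two = pdf.iloc[:, 0:2]  # a positional slice of columns

print(list(X.columns), y.name, list(first_two.columns))
```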

My workaround in this example is:

X, y = df.loc[:, ['col1', 'col2']], df.loc[:, 'y']

But this obviously requires a priori knowledge of the column names.
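
One possible way around that limitation, sketched below, is to look the names up by position from df.columns and pass them to loc. The sketch uses pandas so it runs without a GPU; it assumes cudf's df.columns and loc behave the same way for this case, which has not been verified against this build. split_features_target is a hypothetical helper name, not part of either library.

```python
import numpy as np
import pandas as pd

def split_features_target(df):
    """Split a frame into (X, y), treating the last column as the target.

    Hypothetical helper: resolves positions to names via df.columns so the
    caller does not need to know the column names in advance.
    """
    cols = list(df.columns)  # column names in positional order
    return df.loc[:, cols[:-1]], df.loc[:, cols[-1]]

pdf = pd.DataFrame({
    "col1": np.random.rand(5),
    "col2": np.random.rand(5),
    "y": np.random.rand(5),
})

X, y = split_features_target(pdf)
print(X.shape, y.name)
```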

@beckernick
Member Author

Resolved by #1622.
