BUG: read_table raises ValueError when delim_whitespace is set to True #35958

cgmorton · 2020-08-28T17:34:40Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import io
import pandas as pd

# Build a simple space delimited file buffer to read
f = io.StringIO("a  b  c\n1 -2 -3\n4  5   6")

# This raises an error
df = pd.read_table(f, delim_whitespace=True)

# Setting the "sep" parameter as suggested in the docs works
# df = pd.read_table(f, sep=r'\s+')
# print(df)

# Not setting the delim_whitespace parameter, or setting it to False, runs
#   without error, but the columns are not split correctly.
# df = pd.read_table(f)
# print(df)

Problem description

Setting delim_whitespace=True when calling read_table() on a space-delimited file raises the following exception:

ValueError: Specified a delimiter with both sep and delim_whitespace=True; you can only specify one.

Expected Output

I would expect the same output as when sep='\s+' is passed (as suggested in the docs).
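For reference, the expected result can be reproduced with the `sep` form mentioned above. This is a minimal sketch using the same data as the report; the raw-string prefix is an addition here, used only to avoid Python's invalid-escape warning for `\s`.

```python
import io

import pandas as pd

# Same whitespace-delimited data as in the report above.
data = "a  b  c\n1 -2 -3\n4  5   6"

# Passing a whitespace regex as `sep` instead of delim_whitespace=True.
# The raw string keeps "\s" from being treated as a string escape.
df = pd.read_table(io.StringIO(data), sep=r"\s+")
print(df)
```

The frame should have three integer columns `a`, `b`, `c` and two rows, which is what `delim_whitespace=True` is expected to produce as well.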

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : f2ca0a2
python : 3.6.10.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
Version : Darwin Kernel Version 18.7.0: Thu Jun 18 20:50:10 PDT 2020; root:xnu-4903.278.43~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.1
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.6.0.post20200814
Cython : None
pytest : 5.4.3
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None


dsaxton · 2020-08-28T22:18:51Z

Thanks @cgmorton. This worked in 1.0.5 so marking as a regression:

In [1]: from io import StringIO
   ...: import pandas as pd
   ...:
   ...: print(pd.__version__)
   ...:
   ...: f = StringIO("a  b  c\n1  2  3\n4  5  6")
   ...: pd.read_table(f, delim_whitespace=True)
   ...:
   ...:
1.0.5
Out[1]:
   a  b  c
0  1  2  3
1  4  5  6

asishm · 2020-08-28T22:30:23Z

#34976

a7d96fa is the first bad commit
commit a7d96fa
Author: Terji Petersen
Date: Thu Jun 25 18:38:08 2020 +0100
TYP: make the type annotations of read_csv & read_table discoverable (#34976)

dsaxton · 2020-08-28T22:38:03Z

cc @topper-123

tderond · 2020-09-05T01:02:57Z

Simply using read_csv instead of read_table, with the exact same arguments, seems to work as a workaround for me.
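A minimal sketch of that workaround, using the data from the original report. The helper name `read_whitespace_table` is hypothetical, and the `sep=r"\s+"` fallback is an assumption added here in case the installed pandas no longer accepts the `delim_whitespace` keyword; it is not part of the comment above.

```python
import io
import warnings

import pandas as pd


def read_whitespace_table(buf):
    """Hypothetical helper: read whitespace-delimited text via read_csv."""
    try:
        with warnings.catch_warnings():
            # Newer pandas releases may warn that delim_whitespace is deprecated.
            warnings.simplefilter("ignore", FutureWarning)
            return pd.read_csv(buf, delim_whitespace=True)
    except TypeError:
        # Assumed fallback for versions that reject the keyword entirely.
        buf.seek(0)
        return pd.read_csv(buf, sep=r"\s+")


data = "a  b  c\n1 -2 -3\n4  5   6"
df = read_whitespace_table(io.StringIO(data))
print(df)
```

On pandas 1.1.x, where the regression appears, only the first branch should matter: `read_csv` accepts `delim_whitespace=True` where `read_table` raises.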

cgmorton added the Bug and Needs Triage labels Aug 28, 2020

dsaxton added the IO Data and Regression labels and removed the Bug and Needs Triage labels Aug 28, 2020

tderond mentioned this issue Sep 5, 2020

Pandas update to 1.1 breaks load_PfamScan_results_dataframe tderond/CO-ED#1

Closed

simonjayhawkins added this to the 1.1.3 milestone Sep 9, 2020

phofl mentioned this issue Sep 22, 2020

[BUG]: Fix regression in read_table with delim_whitespace=True #36560

Merged


tacaswell mentioned this issue Sep 22, 2020

Strange behaviour of delim_whitespace in pd.read_table #36381

Closed


simonjayhawkins closed this as completed in #36560 Sep 26, 2020
