columns sometimes not added to empty dataframe #28871

innominate227 · 2019-10-09T14:53:06Z

Code Sample, a copy-pastable example if possible

import datetime
import pandas as pd

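# Build a one-row frame and move 'date' into the index, which makes it a DatetimeIndex.
test = pd.DataFrame({'date':[datetime.datetime(2000,1,1)]}).set_index('date')
# Keep the index type but drop every row.
test = test[0:0].copy()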

test['3010'] = None
test['2010'] = None

print(test)

Problem description

The column '3010' is added to the DataFrame as expected, but the column '2010' is not added.

Expected Output

Empty DataFrame
Columns: [3010, 2010]
Index: []

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.5.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 85 Stepping 4, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.1
numpy : 1.17.2
pytz : 2019.3
dateutil : 2.8.0
pip : 9.0.1
setuptools : 36.4.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
openpyxl : None
odfpy : None
gcsfs : None
bottleneck : None
pandas_gbq : None
xarray : None
tables : None
scipy : None
pyarrow : None
xlsxwriter : None
matplotlib : None
sqlalchemy : None
pytables : None
xlwt : None
lxml.etree : None
s3fs : None
xlrd : None
numexpr : None
bs4 : None
fastparquet : None


innominate227 · 2019-10-09T15:00:57Z

The limit seems to be:

test['2262'] = None  # adds a new column
test['2261'] = None  # doesn't add a column

innominate227 · 2019-10-09T15:24:00Z

The various ways of adding a column described at https://stackoverflow.com/a/50372722/1418484 either raise an error or fail to add the column:

test = test.assign(**{'2010':None}) #no column is added
test = test.assign(**{'2010':test['3010']}) #error
test['2010'] = pd.Series(index=test.index) #error

I was able to work around this by creating a new DataFrame that already includes the column:

test = pd.DataFrame(columns=list(test.columns)+['2010'],index=test.index)
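A related way to get the same result is to widen the frame with `reindex`, which takes the full list of column labels explicitly. This is only a minimal sketch: the frame below is a hypothetical stand-in for the empty frame in the report (the names `empty` and `widened` are mine), and I have not checked it against 0.25.1 specifically.

```python
import pandas as pd

# Hypothetical stand-in for the empty frame from the report: a DatetimeIndex
# named 'date' with no rows and one existing column, '3010'.
empty = pd.DataFrame(columns=["3010"], index=pd.DatetimeIndex([], name="date"))

# reindex receives the complete list of column labels, so the string '2010'
# is only ever treated as a column label here.
widened = empty.reindex(columns=list(empty.columns) + ["2010"])

print(list(widened.columns))
```

Like the workaround above, this returns a new frame rather than modifying the existing one in place.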

mroeschke · 2019-10-09T15:59:47Z

Thanks for the report. I'm guessing this is due to some underlying datetime inference as 2262 is the max year we support in pandas. Investigations and PRs welcome.
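For reference, the bounds of the default nanosecond-resolution `Timestamp` can be inspected directly. A small sketch; the variable names are only illustrative:

```python
import pandas as pd

# Nanosecond-resolution timestamps are stored as 64-bit integers,
# which limits the range of dates that can be represented.
earliest = pd.Timestamp.min
latest = pd.Timestamp.max

print(earliest)
print(latest)

max_year = latest.year
min_year = earliest.year
```

Since the upper bound falls partway through 2262, that would be consistent with the '2262' versus '2261' cutoff reported above, although that link is still a guess at this point.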

amas0 · 2019-10-10T04:05:35Z

I dug into this a little bit - looks like the underlying problem is related to datetimes as @mroeschke suggested. The primary issue arises due to the fact that test has a DatetimeIndex as soon as the date column is set as the index.

The DatetimeIndex supports partial string indexing, so it attempts to parse the key as a partial datetime slice. To quote from the pandas docs on partial string indexing:

> To provide convenience for accessing longer time series, you can also pass in the year or year and month as strings:

Since '2010' is a valid date, it slices the rows in the dataframe that fall in 2010 and assigns their values. If you recreate this example scenario with data, it will simply assign all dates in 2010 to NaN. Using an empty dataframe as in the example, it looks as if it does nothing, but it's simply assigning on an empty slice. Setting a key outside of the valid date range causes the partial datetime slicing to throw an exception, which then falls back to normal column assignment. In line with the quote from the docs above, this issue also occurs when you pass a key in YYYY-MM format.
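The row selection described above can be illustrated with `.loc`, which accepts partial date strings on a DatetimeIndex. This is a minimal sketch with made-up data; the frame, the `value` column, and the variable names are assumptions for illustration and do not come from the report.

```python
import pandas as pd

# Five consecutive days straddling the 2009/2010 boundary.
idx = pd.date_range("2009-12-30", periods=5, freq="D")
df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)

# The string '2010' is parsed as a year and selects every row dated in 2010.
rows_2010 = df.loc["2010"]

# Assigning through the same key overwrites those rows instead of
# creating a column named '2010'.
df.loc["2010", "value"] = float("nan")
n_missing = int(df["value"].isna().sum())

print(rows_2010)
print(df)
```

On an empty frame the same selection matches zero rows, which is why the assignment in the report appears to do nothing.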

I'm not sure whether this is truly a bug. However, I think it makes sense to add a check to the convert_to_index_sliceable() function that ensures the index is non-empty before attempting to create a slice.

Welcome anyone's thoughts on this and I'd be happy to whip up a PR if necessary.

phofl · 2023-04-18T09:17:28Z

This works now:

Empty DataFrame
Columns: [3010, 2010]
Index: []
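A regression check along these lines would cover it. This is only a sketch that reuses the reporter's example and assumes a pandas version where the behaviour is fixed; the helper name `build_empty_frame` is hypothetical and this is not necessarily the test that ends up in the pandas test suite.

```python
import datetime

import pandas as pd


def build_empty_frame():
    """Recreate the empty, DatetimeIndex-ed frame from the original report."""
    frame = pd.DataFrame({"date": [datetime.datetime(2000, 1, 1)]}).set_index("date")
    return frame[0:0].copy()


test = build_empty_frame()

# Both keys should be treated as column labels, even though '2010'
# also parses as a valid year.
test["3010"] = None
test["2010"] = None

columns = list(test.columns)
print(test)
```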

mroeschke added the Bug, Constructors, and Indexing labels Oct 9, 2019

phofl added the Needs Tests label and removed the Bug label Apr 18, 2023

phofl mentioned this issue Apr 18, 2023

Add regression tests noatamir/pyladies-workshop#7

Closed

26 tasks

lucaseckes mentioned this issue Apr 18, 2023

Columns not added in empty dataframe #52733

Closed

5 tasks

coco90417 mentioned this issue Jul 15, 2023

add test for GH#28871 #54138

Merged

3 tasks

mroeschke mentioned this issue Jul 17, 2023

move indexslice below multiindex #54144

Merged

6 tasks

phofl closed this as completed in #54138 Jul 17, 2023
