columns sometimes not added to empty dataframe #28871

Closed
Tracked by #7
innominate227 opened this issue Oct 9, 2019 · 5 comments · Fixed by #54138
Labels
Constructors (Series/DataFrame/Index/pd.array Constructors); Indexing (Related to indexing on series/frames, not to indexes themselves); Needs Tests (Unit test(s) needed to prevent regressions)


innominate227 commented Oct 9, 2019

Code Sample, a copy-pastable example if possible

import datetime
import pandas as pd

test = pd.DataFrame({'date':[datetime.datetime(2000,1,1)]}).set_index('date')
test = test[0:0].copy()

test['3010'] = None
test['2010'] = None

print(test)

Problem description

The column '3010' is added to the dataframe as expected, but the column '2010' is not.

Expected Output

Empty DataFrame
Columns: [3010, 2010]
Index: []

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.5.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 85 Stepping 4, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.1
numpy : 1.17.2
pytz : 2019.3
dateutil : 2.8.0
pip : 9.0.1
setuptools : 36.4.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
openpyxl : None
odfpy : None
gcsfs : None
bottleneck : None
pandas_gbq : None
xarray : None
tables : None
scipy : None
pyarrow : None
xlsxwriter : None
matplotlib : None
sqlalchemy : None
pytables : None
xlwt : None
lxml.etree : None
s3fs : None
xlrd : None
numexpr : None
bs4 : None
fastparquet : None

Author

innominate227 commented Oct 9, 2019

The limit seems to be:

test['2262'] = None  # adds a new column
test['2261'] = None  # doesn't add a column

Author

innominate227 commented Oct 9, 2019

The various ways to add the column described here: https://stackoverflow.com/a/50372722/1418484 either throw an error or fail to add the column:

test = test.assign(**{'2010':None}) #no column is added
test = test.assign(**{'2010':test['3010']}) #error
test['2010'] = pd.Series(index=test.index) #error

I was able to work around this by creating a new dataframe that already includes the column:

test = pd.DataFrame(columns=list(test.columns)+['2010'],index=test.index)
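For anyone who wants to try the workaround end to end, here is a minimal runnable sketch. It rebuilds the empty frame from the original report and then applies the same constructor call; the variable name `rebuilt` and the use of `iloc` for the empty slice are my own choices, not part of the report.

```python
import datetime
import pandas as pd

# Rebuild the empty, datetime-indexed frame from the report.
test = pd.DataFrame({'date': [datetime.datetime(2000, 1, 1)]}).set_index('date')
test = test.iloc[0:0].copy()  # positional slice: keeps the DatetimeIndex, drops all rows
test['3010'] = None

# Workaround: construct a fresh frame that already lists the extra column,
# so no setitem call ever has to interpret the key '2010'.
rebuilt = pd.DataFrame(columns=list(test.columns) + ['2010'], index=test.index)

print(list(rebuilt.columns))
```

Because the column labels are handed to the constructor directly, the DatetimeIndex never gets a chance to treat '2010' as a date.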

@mroeschke
Member

Thanks for the report. I'm guessing this is due to some underlying datetime inference as 2262 is the max year we support in pandas. Investigations and PRs welcome.
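As a rough illustration of that guess, the sketch below looks at the nanosecond-resolution bounds exposed as `pd.Timestamp.min` and `pd.Timestamp.max`. The helper `whole_year_fits` is hypothetical and only checks the upper bound; it is not how pandas decides internally, just a way to see why 2261 and 2262 could behave differently.

```python
import pandas as pd

upper = pd.Timestamp.max  # latest timestamp representable at nanosecond resolution
lower = pd.Timestamp.min  # earliest timestamp representable at nanosecond resolution

def whole_year_fits(year: int) -> bool:
    """Hypothetical helper: True if Dec 31 of `year` is not past Timestamp.max."""
    return (year, 12, 31) <= (upper.year, upper.month, upper.day)

print(lower.year, upper.year)
for year in (2010, 2261, 2262, 3010):
    print(year, whole_year_fits(year))
```

Under this reading, '2261' is a year that fits entirely inside the supported range, while '2262' runs past the upper bound partway through the year, which lines up with the cutoff reported above.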

@mroeschke mroeschke added Bug Constructors Series/DataFrame/Index/pd.array Constructors Indexing Related to indexing on series/frames, not to indexes themselves labels Oct 9, 2019

amas0 commented Oct 10, 2019

I dug into this a little bit. It looks like the underlying problem is related to datetimes, as @mroeschke suggested. The root cause is that test has a DatetimeIndex as soon as the date column is set as the index.

The DatetimeIndex supports partial string indexing, so it attempts to parse the key as a partial datetime slice. To quote from the pandas docs on partial string indexing:

To provide convenience for accessing longer time series, you can also pass in the year or year and month as strings:

Since '2010' is a valid date, it slices the rows in the dataframe that fall in 2010 and assigns their values. If you recreate this example scenario with data, it will simply assign all dates in 2010 to NaN. Using an empty dataframe as in the example, it looks as if it does nothing, but it's simply assigning on an empty slice. Setting a key outside of the valid date range causes the partial datetime slicing to throw an exception, which then falls back to normal column assignment. In line with the quote from the docs above, this issue also occurs when you pass a key in YYYY-MM format.
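To make the "with data" scenario concrete, here is a small sketch using made-up dates and values. It goes through `.loc` rather than plain `[]`, since `.loc` is the spelling of partial string indexing I'm confident behaves the same across pandas versions; the frame and its `value` column are assumptions for illustration only.

```python
import pandas as pd

# Made-up data: four rows, two of which fall in 2010.
idx = pd.to_datetime(['2009-12-31', '2010-01-15', '2010-06-30', '2011-01-01'])
df = pd.DataFrame({'value': [1.0, 2.0, 3.0, 4.0]}, index=idx)

rows_2010 = df.loc['2010']        # a year string selects every row in that year
rows_2010_06 = df.loc['2010-06']  # a YYYY-MM string selects every row in that month

# Assigning through the same kind of key overwrites rows instead of adding a column.
df.loc['2010', 'value'] = float('nan')

print(len(rows_2010), len(rows_2010_06))
print(df)
```

On an empty frame the same row selection matches nothing, so the assignment has no visible effect, which is what the original report ran into.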

I'm not sure whether this is truly a bug. However, I think it makes sense to add an additional check to the convert_index_to_sliceable() function that ensures the index is non-empty before attempting to create a slice.

I welcome anyone's thoughts on this, and I'd be happy to whip up a PR if necessary.

@phofl phofl added Needs Tests Unit test(s) needed to prevent regressions and removed Bug labels Apr 18, 2023
Member

phofl commented Apr 18, 2023

This works now:

Empty DataFrame
Columns: [3010, 2010]
Index: []
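Since the issue carries the Needs Tests label, a regression check along these lines might be what is wanted. This is only a sketch: the function name is hypothetical, it assumes a pandas version where the behaviour is already fixed (2.0 or later, as far as I can tell), and it is not taken from the linked PR.

```python
import datetime
import pandas as pd

def check_year_like_column_added_to_empty_frame():
    """Hypothetical regression check for the scenario in the original report."""
    test = pd.DataFrame({'date': [datetime.datetime(2000, 1, 1)]}).set_index('date')
    test = test.iloc[0:0].copy()  # empty frame that still has a DatetimeIndex

    test['3010'] = None  # key outside the supported timestamp range
    test['2010'] = None  # key that also parses as a valid year

    # Both keys should end up as columns, and no rows should appear.
    assert list(test.columns) == ['3010', '2010']
    assert len(test) == 0
    return test

result = check_year_like_column_added_to_empty_frame()
print(result)
```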

4 participants