BUG: Performance regression in to_csv when formatting datetime index #39413
Labels: Bug, Datetime, IO CSV, Performance
Code Sample, a copy-pastable example
All variants give the same output. The issue is not the different execution time of the individual variants but the performance regression of the first variant. The other variants are added just to show that nothing changed for them between versions 1.1.5 and 1.2.0 (versions 1.2.1 and 1.2.0rc0 give the same results as version 1.2.0).
Problem description
Using a date_format for a datetime index in to_csv is almost 3 times slower in version 1.2.0 than in 1.1.5 (see first row in test output: 2.39 s instead of 829 ms for 100,000 rows). For 1,000,000 rows the slowdown is 19-fold.
This might be related to #37484.
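A minimal sketch of how such a comparison can be timed, assuming a frame with a DatetimeIndex and a single float column. The helper names, the date format, and the second variant (formatting the index with strftime before writing) are illustrative assumptions, not the code from the original report.

```python
import time

import numpy as np
import pandas as pd

# Assumed format for illustration; the report does not state which one was used.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def make_frame(n_rows: int) -> pd.DataFrame:
    """Build a frame with a minute-frequency DatetimeIndex and one float column."""
    index = pd.date_range("2020-01-01", periods=n_rows, freq="min")
    return pd.DataFrame({"value": np.arange(n_rows, dtype="float64")}, index=index)


def csv_with_date_format(df: pd.DataFrame) -> str:
    """Variant 1: let to_csv format the datetime index via date_format."""
    return df.to_csv(date_format=DATE_FORMAT)


def csv_with_preformatted_index(df: pd.DataFrame) -> str:
    """Variant 2 (hypothetical comparison): format the index first, then write."""
    out = df.copy()
    out.index = out.index.strftime(DATE_FORMAT)
    return out.to_csv()


def time_call(func, df: pd.DataFrame) -> float:
    """Return the wall-clock seconds taken by a single call of func(df)."""
    start = time.perf_counter()
    func(df)
    return time.perf_counter() - start


df = make_frame(10_000)
print("pandas", pd.__version__)
for func in (csv_with_date_format, csv_with_preformatted_index):
    print(f"{func.__name__}: {time_call(func, df):.3f} s")
```

Running the same script under each pandas version and comparing the first timing is enough to check for the regression; increasing the row count passed to make_frame shows how the gap scales.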
INSTALLED VERSIONS
commit : 3e89b4c
python : 3.8.2.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : de_DE.cp1252
pandas : 1.2.0
numpy : 1.19.0
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 41.2.0
Cython : 0.29.14
pytest : 5.4.1
hypothesis : None
sphinx : 2.4.4
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : None
gcsfs : None
matplotlib : 3.3.3
numexpr : None
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 0.17.0
pyxlsb : None
s3fs : None
scipy : 1.6.0
sqlalchemy : 1.3.16
tables : None
tabulate : None
xarray : 0.15.1
xlrd : 1.2.0
xlwt : None
numba : 0.50.1