Slow autocompletion in python/ipython console for large DataFrame containing strings #37947

flcong · 2020-11-18T22:28:14Z

I'm not sure if this is the right place to ask, but autocompletion in the python or ipython console seems especially slow for a large DataFrame that contains strings (object dtype).

For example, consider the following two DataFrames:

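import numpy as np
import pandas as pd
# 400,000 x 60 frame of random floats
df1 = pd.DataFrame(np.random.rand(400000, 60))
# same shape, but every cell is a Python string (object dtype)
df2 = pd.DataFrame([["asdfasdf"] * 60] * 400000)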

Typing df1.<TAB> in an interactive python/ipython session completes quickly, but df2.<TAB> takes a very long time to finish (it gets stuck for many seconds).

I'm not sure whether this is due to pandas handling DataFrames of numbers and DataFrames of strings (object) differently, or to an issue in the interactive python/ipython session itself.

jreback · 2020-11-19T02:10:42Z

Please post the output of pd.show_versions(). This is not an issue on master.

flcong · 2020-11-19T03:43:44Z

INSTALLED VERSIONS
------------------
commit           : 67a3d4241ab84419856b84fc3ebc9abcbe66c6b3
python           : 3.8.6.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
Version          : 10.0.19041
machine          : AMD64
processor        : Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : English_United States.1252

pandas           : 1.1.4
numpy            : 1.19.4
pytz             : 2020.4
dateutil         : 2.8.1
pip              : 20.2.4
setuptools       : 49.6.0.post20201009
Cython           : 0.29.21
pytest           : 6.1.2
hypothesis       : None
sphinx           : 3.3.1
blosc            : None
feather          : None
xlsxwriter       : 1.3.7
lxml.etree       : 4.6.1
html5lib         : 1.1
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.19.0
pandas_datareader: None
bs4              : 4.9.3
bottleneck       : 1.3.2
fsspec           : 0.8.4
fastparquet      : None
gcsfs            : None
matplotlib       : 3.3.3
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.5
pandas_gbq       : None
pyarrow          : None
pytables         : None
pyxlsb           : None
s3fs             : None
scipy            : 1.5.3
sqlalchemy       : 1.3.20
tables           : 3.6.1
tabulate         : None
xarray           : None
xlrd             : 1.2.0
xlwt             : 1.3.0
numba            : 0.51.2

Actually, it only happens if I type df2.<TAB>. If I give an initial letter, it is okay, e.g. df2.a<TAB>.

jorisvandenbossche · 2020-11-19T08:24:29Z

This is also an issue on master.

The reason is that jedi, the package providing the tab completion info for IPython, executes all attributes while gathering this info. Profiling your example shows that df2.T is the one very slow attribute of the DataFrame, and it is what makes tab completion slow (that's also why df2.a<TAB> doesn't have the issue: T is not among the candidates).
See eg davidhalter/jedi#1383 for some context (although that was about showing deprecation warnings when executing attributes, but it's related to the same root cause).
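
To make the mechanism concrete, here is a minimal sketch under stated assumptions. The `Frame` class and the `complete_by_evaluating` helper are hypothetical stand-ins written for illustration; they are not pandas or jedi code, and jedi's real logic is more involved. The sketch only shows that a completer which evaluates every candidate attribute pays for an expensive property, while a prefix that filters that property out avoids the cost.

```python
import time


class Frame:
    """Hypothetical stand-in for a DataFrame; not pandas code."""

    def __init__(self):
        self.evaluated = []  # records which properties were actually computed

    @property
    def T(self):
        # Pretend this is an expensive transpose of object-dtype data.
        self.evaluated.append("T")
        time.sleep(0.05)
        return self

    @property
    def axes(self):
        # A cheap attribute, for contrast.
        self.evaluated.append("axes")
        return []


def complete_by_evaluating(obj, prefix=""):
    """Toy completer that calls getattr on every public candidate name."""
    names = [
        name
        for name in dir(obj)
        if name.startswith(prefix) and not name.startswith("_")
    ]
    for name in names:
        getattr(obj, name)  # runs property code, including the slow one
    return names


frame = Frame()
complete_by_evaluating(frame)  # like df2.<TAB>: T is evaluated
print(frame.evaluated)

frame = Frame()
complete_by_evaluating(frame, "a")  # like df2.a<TAB>: T is never touched
print(frame.evaluated)
```

With an empty prefix the slow `T` property runs; with the prefix `"a"` only `axes` is evaluated, which mirrors the df2.<TAB> versus df2.a<TAB> behaviour reported above.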

jorisvandenbossche · 2020-11-19T08:30:20Z

Now, in the meantime, jedi has improved (see eg davidhalter/jedi#520 (comment)), and I was myself not using the latest version. After upgrading jedi in my local environment from 0.15 to 0.17, the issue mostly went away.

@flcong can you check the version of jedi that you are using? (import jedi; jedi.__version__)
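
If importing jedi directly is inconvenient, the version can also be read from the installed package metadata. This is a small sketch using only the standard library (Python 3.8+); the helper name `installed_version` is made up for this example.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


jedi_version = installed_version("jedi")
print("jedi:", jedi_version or "not installed")
```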

flcong · 2020-11-20T01:15:19Z

> Now, in the meantime, jedi has improved (see eg davidhalter/jedi#520 (comment)), and I was myself not using the latest version. After upgrading jedi in my local environment from 0.15 to 0.17, the issue mostly went away.
>
> @flcong can you check the version of jedi that you are using? (import jedi; jedi.__version__)

Thank you. It's the latest version, I guess: 0.17.2. df2.T<TAB> still gets stuck for several seconds, but I think that's fine.

hwalinga · 2020-12-05T19:02:04Z

There have been improvements in Jedi, but there are still cases in which it is really slow: davidhalter/jedi#1696

And as it seems this won't improve much in the future: davidhalter/jedi#1059 (comment)

Seems like pandas is a bit too complex to handle, and the current implementation of Jedi isn't designed with that in mind.
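
For anyone who hits this in IPython and cannot wait for Jedi to change, one possible user-side workaround (not something proposed in this thread) is to switch Jedi off for the running session. IPython's completer has a `use_jedi` option for this; the wrapper function `disable_jedi_completion` below is a hypothetical helper written for this sketch, and completions then fall back to IPython's own, simpler completer.

```python
def disable_jedi_completion():
    """Turn off Jedi in the running IPython shell; return True if applied."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False  # IPython is not installed

    shell = get_ipython()
    if shell is None:
        return False  # not running inside an IPython session

    # IPCompleter.use_jedi controls whether Jedi is used for completions.
    shell.Completer.use_jedi = False
    return True


print("jedi disabled:", disable_jedi_completion())
```

Outside an IPython session the function does nothing and returns False, so it is safe to keep in a startup script.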

jbrockmendel · 2021-06-19T03:57:38Z

Is this actionable on our end?

mroeschke · 2021-08-14T05:25:50Z

Sounds like the most recent versions of jedi have somewhat ameliorated this issue, and it's not too evident what pandas could do since auto-completion is handled by jedi. Closing, but happy to reopen if someone can identify what in pandas would need fixing to improve the performance.

flcong mentioned this issue Nov 20, 2020

Slow auto-completion for large DataFrame containing strings jorgenschaefer/elpy#1853

Closed

jbrockmendel added the Dependencies Required and optional dependencies label Jun 19, 2021

mroeschke closed this as completed Aug 14, 2021

jdtsmith mentioned this issue Dec 7, 2021

Jedi is slow for pandas completion for pd.read_csv dataframes davidhalter/jedi#1696

Closed
