Create a Database Index for Jedi #1059

davidhalter · 2018-03-12T20:09:49Z

For a lot of things (especially usages), Jedi's completely lazy approach is not good enough. It is probably better to use a database index cache. The index will basically be a graph that saves all the type inference findings.

This is just an issue for discussion and collection of possible ideas.

hajs · 2018-03-21T15:58:41Z

Recently I wrote my own source indexer using jedi.
I used a SQLite database with four tables: file, name, definition and reference.
Indexing the stdlib took about four minutes using multiprocessing on four cores.

It would be great if the index could be exposed to the public api in some way.
Besides finding all usages of a definition, a database could be used to offer auto imports and fast fuzzy auto-complete.
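For discussion purposes, here is a minimal sketch of what a four-table index like the one described above could look like. hajs's script was never posted in this thread, so every column name, the `open_index` helper and the `find_usages` query are assumptions made for illustration, not the actual indexer.

```python
import sqlite3

# Hypothetical schema: only the four table names come from the comment above;
# all columns are assumptions.
SCHEMA = """
CREATE TABLE file (
    id INTEGER PRIMARY KEY,
    path TEXT UNIQUE NOT NULL,
    mtime REAL NOT NULL          -- lets an indexer detect changed files
);
CREATE TABLE name (
    id INTEGER PRIMARY KEY,
    value TEXT UNIQUE NOT NULL
);
CREATE TABLE definition (
    id INTEGER PRIMARY KEY,
    name_id INTEGER NOT NULL REFERENCES name(id),
    file_id INTEGER NOT NULL REFERENCES file(id),
    line INTEGER NOT NULL,
    col INTEGER NOT NULL
);
CREATE TABLE "reference" (
    id INTEGER PRIMARY KEY,
    definition_id INTEGER NOT NULL REFERENCES definition(id),
    file_id INTEGER NOT NULL REFERENCES file(id),
    line INTEGER NOT NULL,
    col INTEGER NOT NULL
);
CREATE INDEX reference_by_definition ON "reference"(definition_id);
"""


def open_index(path=":memory:"):
    """Create a fresh index database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def find_usages(conn, name):
    """Return (path, line, col) for every stored reference to `name`."""
    rows = conn.execute(
        """
        SELECT f.path, r.line, r.col
        FROM "reference" AS r
        JOIN definition AS d ON d.id = r.definition_id
        JOIN name AS n ON n.id = d.name_id
        JOIN file AS f ON f.id = r.file_id
        WHERE n.value = ?
        ORDER BY f.path, r.line, r.col
        """,
        (name,),
    )
    return rows.fetchall()
```

With a layout like this, finding usages becomes a single indexed join instead of a lazy walk over the project. Auto imports and fuzzy completion would query the name table in a similar way.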

davidhalter · 2018-03-21T22:06:39Z

That's actually pretty fast. Did you index all the subfolders (asyncio, multiprocessing, json, etc)?

Also can you post the script? I wonder if it's "complete".

davidhalter · 2018-12-23T16:08:34Z

@hajs I would still be interested :)

jakubzitny · 2020-07-14T12:25:54Z

Are there any next steps on this issue? Maybe one more friendly ping for @hajs's solution.

CJ-Wright · 2020-12-01T01:56:16Z

@davidhalter are you still interested in this idea? I'm currently planning on building out a database of all potential imports using Jedi for the symbol inspection. Would you be interested in the issues I find? If so, what would be the best format to report errors in?

davidhalter · 2020-12-01T17:21:50Z

> @davidhalter are you still interested in this idea? I'm currently planning on building out a database of all potential imports using Jedi for the symbol inspection. Would you be interested in the issues I find? If so, what would be the best format to report errors in?

I'm definitely interested in your findings, but as I said above, it's pretty unlikely that Jedi's architecture is going to change a lot. There are a lot of underlying issues. I'm currently rewriting parso in Rust and having a great time (it's not open source yet, though).

AlanSwenson · 2021-01-15T15:51:18Z

@davidhalter I'm very interested in contributing to the Rust versions of parso and Jedi when you open them up.

davidhalter · 2021-01-15T21:40:26Z

I will post it here once it's in good shape. However, I want to do a lot of things the right way this time, so I'm keeping it private for now.

I have been working on the parser for the last three months, but I unfortunately don't have a lot of time for it.

krassowski · 2021-02-07T16:17:34Z

Thank you for working on this! In the meantime, would it be appropriate to have get_signatures cached the same way as _get_docstring_signature is being cached? (as in bf446f2)

I profiled some language servers using Jedi, and it appears that the get_signatures call is the major bottleneck. I understand that for an improvement I could patch those to use _get_docstring_signature, but it includes type annotations and is part of a private interface, so it is not ideal. Would adding get_cached_signature or get_cached_signatures be in scope, or should we just wait for the upcoming database index?

davidhalter · 2021-02-07T20:08:39Z

> I profiled some language servers using jedi and it appears that get_signatures call is the major bottleneck

What did you profile? Can you share the results?

krassowski · 2021-02-07T20:15:45Z

Please see palantir/python-language-server#823 (comment)

davidhalter · 2021-02-07T20:23:10Z

This is a tricky one.

Basically it's definitely not possible to do this in a general way, because the Jedi caches need to be invalidated somehow if a library changes. This is exactly what this issue is about.

However, I thought that we could maybe use the cache only if is_big_annoying_library is true (that would probably help) in Jedi and just cache signatures in those cases. But even that is probably a bad idea. Jedi is not built to deal with multiple inference_state instances.

I think I would just argue that get_signatures is not built to be used for every completion. It's something you should use for maybe 10 results or ideally only for one.
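To illustrate the "only for maybe 10 results" advice, here is a small sketch that resolves signatures for the first few completions and falls back to the plain name for the rest. The function and parameter names are hypothetical. In a real client, `resolve_signature` would presumably wrap a call such as `completion.get_signatures()`, while the sketch itself works on any objects.

```python
def build_labels(completions, resolve_signature, name_of=str, limit=10):
    """Return one display label per completion.

    Only the first `limit` completions get the expensive signature lookup;
    the remaining ones are labelled with their plain name.
    """
    labels = []
    for index, completion in enumerate(completions):
        if index < limit:
            # Expensive path: ask for a full signature label.
            labels.append(resolve_signature(completion))
        else:
            # Cheap path: no inference needed.
            labels.append(name_of(completion))
    return labels
```

The remaining labels could still be filled in later, for example when the user highlights a specific item.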

krassowski · 2021-02-07T21:00:59Z

Thank you for getting back to me. I worked around this by deferring the call to get_signatures() to a separate thread and caching the result:

palantir/python-language-server@develop...krassowski:feature/asynchronous/labels-cache

It was tricky, especially with Jedi not being exactly thread-safe, but adding a lock solved the issue. I decided to use a custom cache key instead of the default hash implementation (to avoid inclusion of inference_state) and to re-schedule a refresh at every user action rather than guess when to invalidate.

Your reply will certainly help to plan for the future, and potentially to upstream such an approach. I got down to <<1 second for numpy. It might not be perfect, but it is possibly a good proof of concept of how one could approach this.
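A rough sketch of the pattern described in this comment, not the code from the linked branch: a cache guarded by a lock, keyed by a caller-supplied tuple rather than the object's own hash, with lookups run on worker threads. `SignatureCache`, `resolve_in_background` and the example key shape are all hypothetical names, and `clear()` only stands in for whatever refresh policy the client chooses.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SignatureCache:
    """Memoize an expensive lookup behind a single lock."""

    def __init__(self, compute):
        self._compute = compute          # the expensive call being wrapped
        self._lock = threading.Lock()
        self._entries = {}

    def get(self, key, item):
        # The lock is held during the computation on purpose: it serializes
        # calls into a library that is assumed not to be thread-safe.
        with self._lock:
            if key not in self._entries:
                self._entries[key] = self._compute(item)
            return self._entries[key]

    def clear(self):
        """Drop all entries, e.g. when a refresh is scheduled."""
        with self._lock:
            self._entries.clear()


def resolve_in_background(cache, requests, max_workers=4):
    """Resolve (key, item) pairs on worker threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda pair: cache.get(*pair), requests))
```

A key such as `("numpy", "array")` deliberately leaves out any per-session state, so entries survive across requests. The trade-off is that stale entries are only dropped when `clear()` is called.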

davidhalter · 2021-02-07T21:08:03Z

Note that with such an approach you're also losing some of Jedi's correctness. I would really recommend using something like https://github.com/davidhalter/jedi/blob/master/jedi/inference/helpers.py#L194-L202 and only applying caching to those libraries.

In general almost all other libraries are not an issue, because they do not export a thousand functions in one module. The culprits are always pandas, numpy, tensorflow and matplotlib.
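As a sketch of that recommendation, the following caches results only for modules that belong to the libraries named above and computes everything else fresh. It does not reproduce the linked Jedi helper. `BIG_LIBRARIES`, `is_big_library` and `make_selective_resolver` are hypothetical names, and `compute` stands in for the expensive signature lookup.

```python
# Library names taken from the comment above; the set itself is an assumption.
BIG_LIBRARIES = frozenset({"pandas", "numpy", "tensorflow", "matplotlib"})


def is_big_library(module_name):
    """True if the top-level package of `module_name` is in the list."""
    return module_name.split(".", 1)[0] in BIG_LIBRARIES


def make_selective_resolver(compute):
    """Wrap `compute(module_name, name)` with a cache for big libraries only."""
    cache = {}

    def resolve(module_name, name):
        if not is_big_library(module_name):
            # Small libraries stay uncached so results are always current.
            return compute(module_name, name)
        key = (module_name, name)
        if key not in cache:
            cache[key] = compute(module_name, name)
        return cache[key]

    return resolve
```

Restricting the cache this way keeps the correctness loss confined to libraries that rarely change during an editing session.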

krassowski · 2021-02-08T01:13:03Z

Thank you. I gave up on the asynchronous approach, and followed your advice to treat the likes of numpy differently.

> It's something you should use for maybe 10 results or ideally only for one.

I understand. I will try to nudge the popular language servers in this direction (but it might take time as it is only possible with recent LSP 3.16 and many clients believe that the label - which is what the signature is being used for - should be available from the beginning). Nonetheless, I will be very happy to see any performance improvements to get_signatures().

iustin94 · 2021-08-27T11:39:07Z

This sounds like a very interesting task. I'm not sure what the etiquette is with regard to helping out, but I would be interested in contributing to the Rust re-implementation of Jedi 👍

davidhalter mentioned this issue Mar 12, 2018

No definitions found for certain scenario #880

Closed

davidhalter added the discussion label Mar 13, 2018

davidhalter mentioned this issue Mar 22, 2018

Speed issues with benchmarks #910

Closed

This was referenced Sep 30, 2018

Numpy and Scipy slow on completion. #1218

Closed

Slow performance of Jedi with large libraries (cv2/PIL) imported #1195

Closed

davidhalter mentioned this issue Dec 23, 2018

Scope of the usages #1258

Closed

This was referenced Feb 24, 2019

linter doesn't dig context managers #1280

Closed

Find all references does not work on function definition #1047

Closed

OSError: [Errno 24] Too many open files #1293

Closed

davidhalter mentioned this issue Mar 22, 2019

I found a question, please answer :) #1301

Closed

davidhalter mentioned this issue Apr 5, 2019

AssertionError: speed issue test_os_path_join #1306

Closed

davidhalter mentioned this issue Jun 12, 2019

Accessing completion attributes is very slow (tensorflow) #1116

Closed

davidhalter mentioned this issue Jun 22, 2019

Docs: Static analysis library capabilities #1269

Closed

davidhalter mentioned this issue Aug 11, 2019

Jedi on sublimeText3 too slow. #1251

Closed

davidhalter mentioned this issue Sep 19, 2019

Go to method definition in Python modules fails with "No definition found for '<method_name>'" message #1402

Closed

scarab5q mentioned this issue Oct 4, 2019

Support option for virtual/database filesystems #1419

Closed

kirk86 mentioned this issue Oct 11, 2019

painfully slow auto completions #1422

Closed

davidhalter mentioned this issue Oct 25, 2019

lots of lambda definitions slow things to a crawl davidhalter/jedi-vim#330

Closed

This was referenced Dec 13, 2019

script.usages not working in some circumstances #744

Closed

Discussion: Jedi wrapper module using server-client architecture to deal with multiprocessing, async interface, multiple Python version, cache lock, etc. #385

Closed

davidhalter added the database-index Needs a database index/Rewrite in Rust (#1059) label Dec 14, 2019

This was referenced Jan 5, 2020

Slow pandas completion #520

Closed

Understand Side Effects on Classes/Instances #1056

Closed

davidhalter mentioned this issue Feb 3, 2020

Scikit-learn has autocomplete type "module" when it should seemingly be "function" #1486

Closed

davidhalter mentioned this issue Feb 20, 2020

How to analyze a folder with multiple python files? #1509

Closed

davidhalter mentioned this issue Mar 5, 2020

Show usages (Leader+n) shows only references in the current file davidhalter/jedi-vim#999

Closed

davidhalter mentioned this issue Oct 12, 2020

Keep in memory cache for longer and possibly as long as needed #1679

Closed

pappasam mentioned this issue Oct 26, 2020

Feature proposal: serve completion items from cache pappasam/jedi-language-server#45

Closed

davidhalter mentioned this issue Dec 5, 2020

Jedi is slow for pandas completion for pd.read_csv dataframes #1696

Closed

hwalinga mentioned this issue Dec 5, 2020

Slow autocompletion in python/ipython console for large DataFrame containing strings pandas-dev/pandas#37947

Closed

davidhalter mentioned this issue Jan 9, 2021

Show auto-completion items together with their src path #1728

Closed

krassowski mentioned this issue Feb 7, 2021

Slow completion (maybe jedi cache) palantir/python-language-server#823

Closed

This was referenced Mar 24, 2021

Infer starts returning empty results unexpectedly #1761

Closed

Argument suggestions for some functions in the pytorch module are duplicated when using Jedi microsoft/vscode-python#12503

Closed

krassowski mentioned this issue Apr 28, 2021

Implement cached label resolution and label resolution limit python-lsp/python-lsp-server#26

Merged

davidhalter mentioned this issue May 1, 2022

Can I get a call stack for a variable? #1854

Closed

davidhalter mentioned this issue Jul 7, 2022

autoimport e.g. from os import * #1866

Closed

davidhalter mentioned this issue Aug 27, 2022

too slow for multiple call this api #1872

Closed

davidhalter mentioned this issue Nov 22, 2022

get_signature() for decorated function faulty with manipulated parameter list #1894

Open

davidhalter mentioned this issue Jan 13, 2023

Jedi+ycm doesn't work with boto3-stubs #1904

Closed

tsugumi-sys mentioned this issue Feb 20, 2023

Do you have any idea for auto-completion feature of ruff-lsp? astral-sh/ruff-lsp#47

Closed

davidhalter mentioned this issue Oct 14, 2023

Cache does not seem to be caching. Slow autocomplete. davidhalter/jedi-vim#1116

Closed

davidhalter mentioned this issue Dec 28, 2023

Slow completion for python-igraph depending on current working directory davidhalter/jedi-vim#1118

Open
