Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a Database Index for Jedi #1059

Open
davidhalter opened this issue Mar 12, 2018 · 18 comments
Open

Create a Database Index for Jedi #1059

davidhalter opened this issue Mar 12, 2018 · 18 comments
Labels
database-index Needs a database index/Rewrite in Rust (#1059) discussion

Comments

@davidhalter
Copy link
Owner

For a lot of things (especially usages) jedi's completely lazy approach is not good enough. It is probably better to use a database index cache. The index will basically be a graph that saves all the type inference findings.

This is just an issue for discussion and collection of possible ideas.

@hajs
Copy link

hajs commented Mar 21, 2018

Recently I wrote my own source indexer using jedi.
I used a sqlite databases with four tables: file, name, definition and reference.
Indexing stdlib took about four minutes using multiprocessing on four cores,

It would be great if the index could be exposed to the public api in some way.
Besides finding all usages of a definition, a database could be used to offer auto imports and fast fuzzy auto-complete.

@davidhalter
Copy link
Owner Author

That's actually pretty fast. Did you index all the subfolders (asyncio, multiprocessing, json, etc)?

Also can you post the script? I wonder if it's "complete".

@davidhalter
Copy link
Owner Author

@hajs I would still be interested :)

@davidhalter davidhalter added the database-index Needs a database index/Rewrite in Rust (#1059) label Dec 14, 2019
@jakubzitny
Copy link

Are there any next steps on this issue? Maybe one more friendly ping for @hajs's solution..

@CJ-Wright
Copy link

@davidhalter are you still interested in this idea? I'm currently planning on building out a database of all potential imports using Jedi for the symbol inspection. Would you be interested in the issues I find? If so, what would be the best format to report errors in?

@davidhalter
Copy link
Owner Author

davidhalter commented Dec 1, 2020

@davidhalter are you still interested in this idea? I'm currently planning on building out a database of all potential imports using Jedi for the symbol inspection. Would you be interested in the issues I find? If so, what would be the best format to report errors in?

I'm definitely interested in your findings, but as I said above, it's pretty unlikely that Jedi's architecture is going to change a lot. There are a lot of underlying issues. I'm currently rewriting parso in Rust and having a great time (it's not open source yet, though).

@AlanSwenson
Copy link

@davidhalter very interested in contributing to rust version of parso and Jedi when you open them up.

@davidhalter
Copy link
Owner Author

Will post it here once it's in a good shape. However I want to do a lot of things the right way this time so I'm keeping it private for now.

I have been working on the parser for the last three months, but I unfortunately don't have a lot of time for it.

@krassowski
Copy link

Thank you for working on this! In the meantime, would it be appropriate to have get_signatures cached the same way as _get_docstring_signature is being cached? (as in bf446f2)

I profiled some language servers using jedi and it appears that get_signatures call is the major bottleneck. I understand that for an improvement I could patch those to use _get_docstring_signature, but it includes type annotations and is a part of a private interface so it is not ideal. Would adding get_cached_signature or get_cached_signatures be in scope, or should we just wait for the upcoming database index?

@davidhalter
Copy link
Owner Author

I profiled some language servers using jedi and it appears that get_signatures call is the major bottleneck

What did you profile? Can you share the results?

@krassowski
Copy link

@davidhalter
Copy link
Owner Author

This is a tricky one.

Basically it's definitely not possible to do this in a general way, because the Jedi caches need to be invalidated somehow if a library changes. This is exactly what this issue is about.

However, I thought that we could maybe use the cache just if is_big_annoying_library is true (that would probably help) in Jedi and just cache signatures in those cases. But even that is probably a bad idea. Jedi is not built to deal with multiple inference_state instances.

I think I would just argue that get_signatures is not built to be used for every completion. It's something you should use for maybe 10 results or ideally only for one.

@krassowski
Copy link

Thank you for getting back to me. I worked around this deferring the call to get_signatures() by calling it in a separate thread and caching:

palantir/python-language-server@develop...krassowski:feature/asynchronous/labels-cache

It was tricky, especially with the jedi being not exactly thread-safe but adding a lock solves the issue. I decided to use a custom cache key instead of the default hash implementation (to avoid inclusion of inference_state) and to re-schedule refresh at every user action rather than guess when to invalidate.

Your replay will certainly help to plan for the future, and potentially to upstream such an approach. I got down to <<1 second for numpy. It might not be perfect, but possible a good proof of concept of how one could approach this.

@davidhalter
Copy link
Owner Author

Note that with such an approach you're also losing some of Jedi's correctness. I would really recommend to use something like https://github.com/davidhalter/jedi/blob/master/jedi/inference/helpers.py#L194-L202 and only apply caching to those libraries.

In general almost all other libraries are not an issue, because they do not export a thousand functions in one module. The culprits are always pandas, numpy, tensorflow and matplotlib.

@krassowski
Copy link

Thank you. I gave up on the asynchronous approach, and followed your advice to treat the likes of numpy differently.

It's something you should use for maybe 10 results or ideally only for one.

I understand. I will try to nudge the popular language servers in this direction (but it might take time as it is only possible with recent LSP 3.16 and many clients believe that the label - which is what the signature is being used for - should be available from the beginning). Nonetheless, I will be very happy to see any performance improvements to get_signatures().

@vikigenius

This comment has been minimized.

@iustin94
Copy link

This sounds like a very interesting task, I'm not sure what the etiquette is in regards to helping out but I would be interested in contributing to the rust re-implementation of Jedi 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
database-index Needs a database index/Rewrite in Rust (#1059) discussion
Projects
None yet
Development

No branches or pull requests

8 participants