-
Notifications
You must be signed in to change notification settings - Fork 233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a cache mechanism for the PPX phase #1584
Conversation
Is there a way to manually coontrol this cache or toggle it somehow? I'm concerned that it isn't correct for ppx that relies on |
That's a very good point, @rgrinberg. Thanks! Do you mean toggling the cache from the outside, i.e. by the instance that's calling Merlin? How would that instance know whether there are preprocessor dependencies? Maybe instead of toggling it from the outside, another option could be the following (let me know if that's already what you had in mind). We could extend the So for new versions of What do you think? Is that what you had in mind? |
I meant just being able to turn it off or control it when using merlin as a library. In lsp, I would use dune rpc to tell merlin when a preprocessed AST would be invalidated.
That would be very difficult because it would mean re-implementing dune's DSL specification for deps inside merlin. |
That's interesting! Thanks for explaining. We also need a way to make this work correctly in a different context than lsp, though. I'm not very familiar with the different communication layers that are done in different contexts for Merlin to get all its config information. I'll ask @voodoos about that and come back to you. |
If |
This PR reminded me of alternative method to speed up ppx that works without relying on cache hits (we won't get the cache hits in many cases). One of the reasons why ppx (and any other preprocessor) is slow is the cost of launching an external process every single time the input changes. I bet we could speed up preprocessing materially if we kept the ppx executable persistent, and just dispatched pp requests to it. It would require some changes to ppxlib to allow drivers to run in a Even in this server mode however, we'll need some co-operation from dune to re-run the server whenever the binary changes. On the lsp side, I have no problems with managing such ppx subprocesses. But perhaps it wouldn't work well with merlin? In any case, I think it does make sense to make merlin a little more flexible and allow callers to customize how they would like to run preprocessors. |
A server mode for the driver would be a nice feature. Additionally the ppxes currently write the ast to a file on disk that is then read by merlin. Skipping that intermediary file might also speed up things a little ? |
@ddickstein, yes, I also think that's a good idea, for sure "as a start" and possibly permanently. In fact, that's what I understood @rgrinberg was referring to. About the "speeding up the PPX phase by improving the interaction between Merlin and the PPX driver" idea: there are three different aspects to that point. As you, @rgrinberg, have pointed out, spawning a new process each time is time-consuming. As you, @voodoos, have pointed out, passing the AST between Merlin and the PPX process via disc is time-consuming. And the third one is that Merlin marshalling and the PPX process unmarshalling (and later vice-versa) is time-consuming. For the first, we can have the PPX driver run in server mode. For the second, we could pass the AST via a serial stream or similar. For all three of them, including the third, we could not do those two things and instead embed a PPX driver into Merlin that dynamically links the PPXs (that driver could be similar to the one I've been analyzing some Merlin latency data during the last weeks, though, and I think there are several far more impactful things we can do currently to improve the PPX phase latency. One is that several AST passes could/should be gotten rid of, particularly for any Merlin query different than To give an idea about why I think those things are more impactful than improving the interaction between Merlin and the PPX driver: I had a look at the system time Merlin and its child process spend during the PPX phase, and it's marginal: I haven't found a case in which it was over 20 msecs, yet. In the cases in which the PPX phase is a bottleneck, it spends up to 500 msecs. And that without a cache, so it does so every single time for every single query. |
Btw, @rgrinberg, I'm very curious about your side-note
Quantitatively, what do you have in mind when you say "in many cases"? You have a way better overview of Merlin/LSP user behavior than me, so it would be interesting for me to understand what you have in mind (this is a very simple first step for the cache. improving it is one of the many things I might do next to improve Merlin performance). Simplifying (quite) a bit, I thought that we could distinguish between a user doing code analysis and a user coding. For a user doing code analysis, there will be lots of |
Is that really what utop does? I'm pretty sure it just uses compiler-libs which executes the external processes.
You have the right idea. When the user is actively writing code, we can't rely on cache hits. As you've mentioned, providing completion will be triggered heavily in this phase, so we'll be doing a lot of preprocessing without a lot of cache hits. |
Wouldn't the server mode already cover that? We'd be passing the AST over the socket. This should be faster than using a temporary file. |
I do think that that's what utop does for derivers.
That (or something similar) is what it does for PPXs registered with
Not considering this sounds good to me. I was considering it before you mentioned the idea of launching Ppxlib in server mode. However, I think the server mode idea is better! Also about that:
Good point! But in any case, as I've mentioned before, I think we have lower-hanging fruits for now. So, to come back to the cache:
Ok, providing completion is indeed a problem we'll need to look into separately. That problem also depends on the editor I guess. Btw, that reminds me of a question I have: does the VSCode plug-in or the lsp-server in general have a way to cancel queries? |
The LSP protocol supports the client cancelling a request whenever the client is no longer interested in the response. It is relevant to ppx as well. For example, if the client requests diagnostics and immediately edits the document, the diagnostics request would be cancelled and we'd do well to stop whatever preprocessing is going on. Unfortunately, that means sending a signal to stop the ppx driver process, usually followed by re-running it on subsequent requests. That's all pretty inefficient, and we could do better without all these processes. |
Cool! Does VSCode ever cancel old |
I'm not sure unfortunately. I would hope so. I would also not recommend relying on this cancellation mechanism for performance. It can definitely help us, but unfortunately not all editors (their corresponding lsp clients) have it and their implementations are not really consistent with each other. |
I'm in favor of @ddickstein's suggestion to ignore the cache if This "ppx server" idea seems great too - for files that are being actively edited, it'll be necessary to reduce the cost of running necessary ppxes - but it's not going to be easy; some unknown number of ppxes use global state under the assumption that it's going to be invoked on a single ml/mli file at a time. |
💯
Hmm, this might be important for me to clarify, since I might be missing something. Could you explain this a bit more? I'm not sure what you mean by the "assumption that it's going to be invoked on a single ml/mli file at a time.". Isn't the PPX driver always just invoked on the AST (ml/mli file) of the current buffer? Edit: Nvm my question. It's clear and a very good point! End edit Btw, I also think that if we make the PPX driver run in server mode, we can later improve the performance of the PPX phase significantly: we can make the PPX driver conscious of whether all the PPXs it's linking are "state-less". In the case of a state-less driver, we could invoke the driver on each structure/signature item separately (thanks to the server mode) instead of invoking it on the whole AST. So we could implement a structure/signature item based PPX cache, which keeps track of all expanded items and only re-expands the ones that were modified. That cache would even be more powerful than the typer cache of the current buffer,, since it really only need to re-expand the modified items, whereas the typer cache needs to re-type all items from the first modified item on. I haven't thought this idea though yet, though. It just came to my mind when reading about the server idea. Does it go in the same direction as what you have in mind, @TyOverby? |
I've made the cache togglable now. @Ulysse, could you review when you have a moment? Btw, there seem to be two different workflows to add new config values to merlin: via a new config directive, such as exclude-query-dir; or simply putting it into the generic bucket of flags, such as Also, I've decided to dump the new config value "use-ppx-cache" when someone runs |
Btw, I forgot to mention: I've tested enabling the cache on a project with a |
Could you write a test illustrating the cache behavior in the test suite ? Note about test organization: so far the testsuite has only one test for ppxs named |
With "hidden dependency", are you referring to PPX dependencies? Whether PPX dependencies influence the PPX cache, doesn't depend on Merlin with this implementation, but on the build system. Merlin only knows whether the cache is disabled or enabled, not why: the PPX cache is disabled by default and can be enabled by the merlin config, i.e. by the build system (the build system should enable it if and only if there are no PPX dependencies). What Merlin knows about is:
I can add tests to show that when any of the above points is the case, the parsetree is expanded instead of being pulled from the cache. That would be a different approach of test than what you've suggested (if I've understood you correctly): iiuc, you were suggesting an "add a test showing the limits of the current implementation -> fix the implementation -> see the fix in the test" approach. Is that right? I'm suggesting to just add a "test that the implementation is right" approach. Do you want me to add some test like that? |
Yes, we should have tests that show correct cache invalidation when it is expected that Merlin knows about it. My suggestion was about showing that there is, in fact, a cache hit when it is no invalidated. I was thinking that we could trick Merlin in showing that it does use a cache by making a change that would not trigger cache invalidation because of the known limitation. But I guess we could simply print the log in the test... Anyway, you should start by writing the tests you think make sense :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a first rough review.
I've added the tests now, both the ones I had in mind and the one you had in mind, @voodoos (I think. it's the one at the end of the new test file). The test structure has ended up different than you've suggested. I don't have an opinion on the structure tbh. So if you'd like it the way you've suggested, let me know! I think it should be very simple a sec to change it. And to come back to whether to add the new use-ppx-cache config via the FLG directive or a new directive. I've done the first, but that might be a problem: When using .merlin-files, then Merlin Does anyone know whether that's also a problem when using dune? I'd like to think that the config compatibility story is better for the Merlin-Dune communication, but I don't have any idea. @voodoos, do you know? Also, @rudi, I'm assuming that for ocaml-lsp this isn't a problem. If it is, please let me know! |
Good catch ! I actually thought that when using I see three possible ways to go:
I am personally be in favor of 3 since we control which flags Dune sends to Merlin and if a wrong compiler flag is given it will be already told to the user by the compiler itself when building the project. This way future flag update won't always require a new conflict. Also, what should happen if multiple ppxes are defined ? I guess we want to keep thing simple and have the flag apply to all of them ? |
You're right! I've just double-checked and, when coming from a .merlin-file, unknown directives are indeed also tagged as errors. Now to make sure: You are sure that when coming from dune, they aren't tagged as errors, right? (I don't know how to check that myself without modifying dune). About the 3 options you're proposing: I agree that 3. is the best option. I just hope it isn't too much work.
When saying "multiple PPXs", are you referring to multiple PPXs inside the same PPX driver (dune workflow) or are you literally referring to multiple PPX binaries (possible manual workflow / rescript workflow)? If it's the first, there's no choice. If it's the latter: I think it's nowadays unusual enough that it's ok to disable the whole cache when there's one PPX binary with PPX dependencies. PD: I've made all the changes from your first review. |
it should be yes, at least, that was the clear intent: https://github.com/ocaml/merlin/blob/master/src/kernel/mconfig_dot.ml#L238-L240 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments but the code actually looks quite good.
The test is also very clear and exhaustive 👍
Thanks for the nice review comments, @voodoos! I've addressed them and have also changed the toggling config workflow: it's now via a new directive instead of a flag (to avoid messing with old merlin - new dune compatibility). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, can you rebase and clean the history a bit before the merge ? You should also add a changelog entry. Thanks !
This adds an all-or-nothing cache for the reader phase and for the PPX phase; the important one being the latter. The cache is disabled by default. It can be enabled via the new USE_PPX_CACHE conifg directive, which is also added in this commit. The build system should add the USE_PPX_CACHE directive if and only if the project doesn't use PPX depnedencies.
Cool! I've just rebased, squashed the commits, and added a changelog entry. After rebasing, I get a small formatting diff in |
from pitag-ha/ppx-cache
CHANGES: Fri May 26 15:23:42 CEST 2023 + merlin binary - Allow monadic IO in dot protocol (ocaml/merlin#1581) - Add a `scope` option to the `occurrences` command in preparation for the upcoming `project-wide-occurrences` feature (ocaml/merlin#1596) - Construct bool-typed holes as `false` instead of `true` in the `construct` command, for consistency (ocaml/merlin#1599). - Add a hook to configure system command for spawning ppxes when Merlin is used as a library. (ocaml/merlin#1585) - Implement an all-or-nothing cache for the PPX phase (ocaml/merlin#1584) - Cleanup functors caches when backtracking, to avoid memory leaks (ocaml/merlin#1609, fixes ocaml/merlin#1529 and ocaml-lsp#1032) - Fix `construct` results ordering for sum types sand poly variants (ocaml/merlin#1603) - Fix object method completion not working (ocaml/merlin#1606, fixes ocaml/merlin#1575) - Improve context detection for package types (ocaml/merlin#1608, fixes ocaml/merlin#1607) - Fix incorrect locations for string literals (ocaml/merlin#1574) - Fixed an issue that caused `errors` to erroneously alert about missing `cmi` files (ocaml/merlin#1577) - Prevent destruct from crashing on closed variant types (ocaml/merlin#1602, fixes ocaml/merlin#1601) - Improve longident parsing (ocaml/merlin#1612, fixes ocaml/merlin#945) + editor modes - emacs: call the user's configured completion UI in `merlin-construct` (ocaml/merlin#1598) + test suite - Add missing dependency to a test using ppxlib (ocaml/merlin#1583) - Add tests for the new PPX phase cache (ocaml/merlin#1584) - Add and update tests for `construct` ordering (ocaml/merlin#1603)
CHANGES: Fri May 26 15:23:42 CEST 2023 + merlin binary - Allow monadic IO in dot protocol (ocaml/merlin#1581) - Add a `scope` option to the `occurrences` command in preparation for the upcoming `project-wide-occurrences` feature (ocaml/merlin#1596) - Construct bool-typed holes as `false` instead of `true` in the `construct` command, for consistency (ocaml/merlin#1599). - Add a hook to configure system command for spawning ppxes when Merlin is used as a library. (ocaml/merlin#1585) - Implement an all-or-nothing cache for the PPX phase (ocaml/merlin#1584) - Cleanup functors caches when backtracking, to avoid memory leaks (ocaml/merlin#1609, fixes ocaml/merlin#1529 and ocaml-lsp#1032) - Fix `construct` results ordering for sum types sand poly variants (ocaml/merlin#1603) - Fix object method completion not working (ocaml/merlin#1606, fixes ocaml/merlin#1575) - Improve context detection for package types (ocaml/merlin#1608, fixes ocaml/merlin#1607) - Fix incorrect locations for string literals (ocaml/merlin#1574) - Fixed an issue that caused `errors` to erroneously alert about missing `cmi` files (ocaml/merlin#1577) - Prevent destruct from crashing on closed variant types (ocaml/merlin#1602, fixes ocaml/merlin#1601) - Improve longident parsing (ocaml/merlin#1612, fixes ocaml/merlin#945) + editor modes - emacs: call the user's configured completion UI in `merlin-construct` (ocaml/merlin#1598) + test suite - Add missing dependency to a test using ppxlib (ocaml/merlin#1583) - Add tests for the new PPX phase cache (ocaml/merlin#1584) - Add and update tests for `construct` ordering (ocaml/merlin#1603)
CHANGES: Fri May 26 15:23:42 CEST 2023 + merlin binary - Allow monadic IO in dot protocol (ocaml/merlin#1581) - Add a `scope` option to the `occurrences` command in preparation for the upcoming `project-wide-occurrences` feature (ocaml/merlin#1596) - Construct bool-typed holes as `false` instead of `true` in the `construct` command, for consistency (ocaml/merlin#1599). - Add a hook to configure system command for spawning ppxes when Merlin is used as a library. (ocaml/merlin#1585) - Implement an all-or-nothing cache for the PPX phase (ocaml/merlin#1584) - Cleanup functors caches when backtracking, to avoid memory leaks (ocaml/merlin#1609, fixes ocaml/merlin#1529 and ocaml-lsp#1032) - Fix `construct` results ordering for sum types sand poly variants (ocaml/merlin#1603) - Fix object method completion not working (ocaml/merlin#1606, fixes ocaml/merlin#1575) - Improve context detection for package types (ocaml/merlin#1608, fixes ocaml/merlin#1607) - Fix incorrect locations for string literals (ocaml/merlin#1574) - Fixed an issue that caused `errors` to erroneously alert about missing `cmi` files (ocaml/merlin#1577) - Prevent destruct from crashing on closed variant types (ocaml/merlin#1602, fixes ocaml/merlin#1601) - Improve longident parsing (ocaml/merlin#1612, fixes ocaml/merlin#945) + editor modes - emacs: call the user's configured completion UI in `merlin-construct` (ocaml/merlin#1598) + test suite - Add missing dependency to a test using ppxlib (ocaml/merlin#1583) - Add tests for the new PPX phase cache (ocaml/merlin#1584) - Add and update tests for `construct` ordering (ocaml/merlin#1603) [new release] merlin, merlin-lib and dot-merlin-reader (4.9-414) CHANGES: Fri May 26 15:23:42 CEST 2023 + merlin binary - Allow monadic IO in dot protocol (ocaml/merlin#1581) - Add a `scope` option to the `occurrences` command in preparation for the upcoming `project-wide-occurrences` feature (ocaml/merlin#1596) - Construct bool-typed holes as `false` instead of `true` in the `construct` command, for consistency (ocaml/merlin#1599). - Add a hook to configure system command for spawning ppxes when Merlin is used as a library. (ocaml/merlin#1585) - Implement an all-or-nothing cache for the PPX phase (ocaml/merlin#1584) - Cleanup functors caches when backtracking, to avoid memory leaks (ocaml/merlin#1609, fixes ocaml/merlin#1529 and ocaml-lsp#1032) - Fix `construct` results ordering for sum types sand poly variants (ocaml/merlin#1603) - Fix object method completion not working (ocaml/merlin#1606, fixes ocaml/merlin#1575) - Improve context detection for package types (ocaml/merlin#1608, fixes ocaml/merlin#1607) - Fix incorrect locations for string literals (ocaml/merlin#1574) - Fixed an issue that caused `errors` to erroneously alert about missing `cmi` files (ocaml/merlin#1577) - Prevent destruct from crashing on closed variant types (ocaml/merlin#1602, fixes ocaml/merlin#1601) - Improve longident parsing (ocaml/merlin#1612, fixes ocaml/merlin#945) + editor modes - emacs: call the user's configured completion UI in `merlin-construct` (ocaml/merlin#1598) + test suite - Add missing dependency to a test using ppxlib (ocaml/merlin#1583) - Add tests for the new PPX phase cache (ocaml/merlin#1584) - Add and update tests for `construct` ordering (ocaml/merlin#1603)
CHANGES: unreleased + merlin binary - Preview support for OCaml 5.1-alpha1. Short path is temporary disabled and inline records might not behave as expected. - Allow monadic IO in dot protocol (ocaml/merlin#1581) - Add a `scope` option to the `occurrences` command in preparation for the upcoming `project-wide-occurrences` feature (ocaml/merlin#1596) - Construct bool-typed holes as `false` instead of `true` in the `construct` command, for consistency (ocaml/merlin#1599). - Add a hook to configure system command for spawning ppxes when Merlin is used as a library. (ocaml/merlin#1585) - Implement an all-or-nothing cache for the PPX phase (ocaml/merlin#1584) - Cleanup functors caches when backtracking, to avoid memory leaks (ocaml/merlin#1609, fixes ocaml/merlin#1529 and ocaml-lsp#1032) - Fix `construct` results ordering for sum types sand poly variants (ocaml/merlin#1603) - Fix object method completion not working (ocaml/merlin#1606, fixes ocaml/merlin#1575) - Improve context detection for package types (ocaml/merlin#1608, fixes ocaml/merlin#1607) - Fix incorrect locations for string literals (ocaml/merlin#1574) - Fixed an issue that caused `errors` to erroneously alert about missing `cmi` files (ocaml/merlin#1577) - Prevent destruct from crashing on closed variant types (ocaml/merlin#1602, fixes ocaml/merlin#1601) - Improve longident parsing (ocaml/merlin#1612, fixes ocaml/merlin#945) + editor modes - emacs: call the user's configured completion UI in `merlin-construct` (ocaml/merlin#1598) + test suite - Add missing dependency to a test using ppxlib (ocaml/merlin#1583) - Add tests for the new PPX phase cache (ocaml/merlin#1584) - Add and update tests for `construct` ordering (ocaml/merlin#1603)
There are three different cache mechanisms for the typer phase, but none for the parsetree-related phases. This commit adds a simple cache for the reader phase and for the PPX phase, the important one being the latter.
Some things that might call your attention are:
File_id
module with aget_res
function. The reason for that is that when in doubt whether the cache should be invalidated or not, it's better to invalidate it IMO: correctness over performance IMO. The case of an error while computingFile_id
is definitely a reason to doubt whether the cache can be used. However, that has made this cache incoherent with the file caches, which go ahead with a generic file id when a file id couldn't be computed. Let me know if you want more coherence here!