Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/tools/gopls: improvements to loading behavior #68002

Open
findleyr opened this issue Jun 14, 2024 · 5 comments
Open

x/tools/gopls: improvements to loading behavior #68002

findleyr opened this issue Jun 14, 2024 · 5 comments
Assignees
Labels
gopls Issues related to the Go language server, gopls. Tools This label describes issues relating to any tools in the x/tools repository.
Milestone

Comments

@findleyr
Copy link
Member

This issue aggregates some work I'd like to do to refine the go/packages.Load calls made by gopls. Generally speaking, these fall into the category of "using gopls' knowledge of the workspace state to refine loading patterns". This should benefit users across the board, but may in particular help for users of go/packages drivers that have different performance characteristics than the go command.

A word on terminology: we refer to the packages data returned by go/packages.Load as package metadata.

Considering only the simple case of a module workspace, metadata loading works approximately as follows (some edge cases are ignored). All loads are inclusive of -deps.

  1. On startup, load <modulePath>/...
  2. When a file change affects metadata, expand that change to all "possibly affected" packages, which includes
  • all packages containing the file
  • if the package clause changed, all packages with files in the same directory as the file
  • all packages that have import cycles (it's too hard to determine whether the change could have resolved cycles)
  1. When metadata is invalidated for a package, also invalidate its importers, recursively (the reverse transitive closure)
  2. When processing a file-oriented request such as completion, if we have no package metadata containing that file, load a file= query (but keep track of files that can't be loaded so we don't keep trying).
  3. When processing a workspace-wide request such as references, load all the packages invalidated by steps (2) and (3) with a query containing all the invalidated package paths.

This loading pattern evolved to ensure we don't miss metadata changes, and can operate on a file as soon as possible. Notably, it was only chosen based on the go command behavior, and generally works because go list is heavily optimized. Even loading a relatively large project such as Kubernetes takes only a few seconds.

However, there are some improvements to be desired:

  • It would be nice to be able to handle low latency "best effort" requests such as completion with stale metadata. Most metadata changes are the addition or deletion of an import, and most type checking can still be performed (this was previously worked on in x/tools/gopls: save old metadata before invalidating #42266, though that approach had some problems). In gopls terminology, I think the right way to do this is to have loads create a new snapshot, and let some requests operate on the earlier snapshot.
  • We don't necessarily need to load with -deps when we're just selectively reloading a file or package. We can theoretically instead load without -deps, see which imports (if any) are missing, and then load those missing imports in a second pass.
  • In many cases we don't need to invalidate the reverse transitive closure recursively (again, with the common case being the addition or deletion of an import). I believe this is only necessary for x_test packages which may pick up an intermediate test variant. If we can convince ourselves of this, then maybe we only need to invalidate x_test packages in the reverse transitive closure...
  • When the queue of packages to load become large, it's usually faster (even with the go command) to reload the entire workspace. We can have a threshold at which point we decide just to reload <modulePath>/....

We should experiment with these improvements. I believe they should be of particular help for users of a bazel go/packages drivers, as bazel queries have significantly higher overhead than the go command.

CC @adonovan @JamyDev

@findleyr findleyr added this to the gopls/v0.17.0 milestone Jun 14, 2024
@gopherbot gopherbot added Tools This label describes issues relating to any tools in the x/tools repository. gopls Issues related to the Go language server, gopls. labels Jun 14, 2024
@li-jin-gou
Copy link

I'm responsible for a system that generates a lot of rpc automated code that exists as a remote package that is slow to load and slow to compile. I'm currently working on a GoPackageDriver to automatically generate the code locally at the user's site, which would help a lot if the speed could be increased. ❤️

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/611843 mentions this issue: gopls: join concurrent package batch operations

@findleyr findleyr self-assigned this Sep 16, 2024
gopherbot pushed a commit to golang/tools that referenced this issue Sep 25, 2024
One of the unsolved problems of the gopls scalability redesign described
at https://go.dev/blog/gopls-scalability is that concurrent type
checking operations may perform redundant work. While we avoided
redundancy within the context of *one* operation, by encapsulating the
type checking batch, it is possible that there will be logically
distinct operations occurring concurrently, such as computing
diagnostics and autocompletion. These operations could not share type
information, as they were operating within distinct type checking
"realms" (recall that go/types relies on the canonicalization of
imports).

This CL addresses that problem by refactoring the type checking batch
such that it can join two distinct operations into a shared state. The
typeCheckBatch becomes a queryable entity, coordinating work across
ongoing package graph traversals.

This required surprisingly little change to the typeCheckBatch itself,
which already abstracted the notion of a package traversal. Rather, the
key missing component was a future cache implementation that was (1)
retryable and (2) transient, in the sense that computed values were
discarded after use. The bulk of this change is the implementation and
testing of such a cache.

Some elements of the refactoring remain awkward:

- packageHandles are collected and merged into a shared map, because
  they must still be computed in large batches for efficiency.
- the shared importGraph remains a clumsy optimization, requiring subtle
  logic to pre-build a shared import graph.

My intuition is that both of these problems have an elegant solution,
but this work is left for a subsequent CL.

In addition to reducing CPU across the board, due to less redundancy in
the existing diagnostic pass, this CL will facilitate the following
additions to gopls:

- Pull-based diagnostics (golang/go#53275): now that diagnostic
  operations are naturally joined, there is less need for gopls to
  manage the scheduling of the diagnostic pass.
- Improvements to loading behavior (golang/go#68002): being able to
  handle requests asynchronously allows completion requests to
  potentially be handled using stale metadata.

Updates golang/go#53275
Updates golang/go#68002

Change-Id: Ib228cdcce2c4b6f616d6ba5b0abeb40e87f449be
Reviewed-on: https://go-review.googlesource.com/c/tools/+/611843
Reviewed-by: Alan Donovan <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
@findleyr
Copy link
Member Author

I got this working, and it was quite a bit harder than expected. Furthermore, the change in loading behavior has approximately no impact on performance when used with go list. A few notes:

  • in go/packages, if NeedImports is set, will actually call go list with -deps=true. It's not clear whether this is intentional, but the current behavior is to return the full import graph, with Imports recursively set, even if NeedDeps is not set.
  • gopls currently overwrites package metadata with all newly loaded data (assuming the new data is better than the old data). This was incompatible with the new loading logic, so had to be adjusted.
  • go list has approximately the same performance when invoked with/without -deps=true. I assume this is because it needs to load dependencies anyway to resolve import IDs anyway. (though perhaps this is only needed for test files, to resolve intermediate test dependencies).

Therefore, I'm not sure whether this is worth doing. It's a lot of complexity for little gain. I think the more impactful change is to be able to run completion on stale metadata: I propose we skip straight to that.

@findleyr findleyr modified the milestones: gopls/v0.17.0, gopls/backlog Nov 11, 2024
@JamyDev
Copy link

JamyDev commented Nov 13, 2024

gopls currently overwrites package metadata with all newly loaded data (assuming the new data is better than the old data)
Is there some way of versioning the data that we don't have to replace everything? Is this currently done with some hashing mechanism? Reason I ask is that in packagesdriver mode, I'm not sure how it would differentiate a stale package in the tree from an unchanged one.

go list has approximately the same performance when invoked with/without -deps=true.
This is true in packagesdriver mode as well. There's a good chance that, unless already resolved, a package would require a costly resolve where the Imports is available anyways.

able to run completion on stale metadata
This would be amazing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gopls Issues related to the Go language server, gopls. Tools This label describes issues relating to any tools in the x/tools repository.
Projects
None yet
Development

No branches or pull requests

5 participants