Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring of remote stores after initial addition of OpenDAL #19902

Open
2 of 8 tasks
huonw opened this issue Sep 21, 2023 · 0 comments
Open
2 of 8 tasks

Refactoring of remote stores after initial addition of OpenDAL #19902

huonw opened this issue Sep 21, 2023 · 0 comments
Assignees

Comments

@huonw
Copy link
Contributor

huonw commented Sep 21, 2023

With #19827, there's a bunch of refactoring of the remote stores:

@huonw huonw self-assigned this Sep 21, 2023
huonw added a commit that referenced this issue Sep 21, 2023
…9827)

This is the final major step that unlocks #11149: 

- adding new byte store and action cache providers, implemented using
https://opendal.apache.org
- plumbing through to be able to be use them via `remote_store_address =
"..."`
- (and implementing an "example" one, that's backed by the local file
system)

This doesn't yet resolve that issue, as it doesn't implement any sort of
useful "simple" truly-remote cache, but it sets up most of the
infrastructure.

### User facing change

This only adds support for one additional store for now:
`experimental:file:///path/to/dir`, which writes the store to
`/path/to/dir`.

This is store is enough to validate that this functionality (mostly)
works, and even seems like it would be useful for:
- debugging/testing remote cache behaviour (e.g. can set up reduced
reproducers for remote caching bugs using normal file manipulation
commands)
- potentially for real, for using an NFS mount as a remote cache,
without forcing _all_ of Pants' caches to have the NFS overhead.

The user facing functionality here is configured entirely through
`remote_store_address`: that option now supports URIs that have schemes
other than `grpc` or `grpcs`. To start with, just `file`. It also
explicitly supports these schemes being experimental, communicated
through the `experimental:` scheme-prefix. This is to, hopefully, allow
landing the feature earlier and with less risk.

For example, running `pants test ::` in the follow example writes
various blobs to subdirectories of `/tmp/remote-cache-example` (e.g.
`action-cache/e3/2f/e32f034f...` and `byte-store/50/ce/50ce4a68...`),
and a second `pants --no-pantsd --no-local-cache test ::` command is
able to service it from cache, reporting ` //:slow-test succeeded in
5.01s (cached remotely)`

```toml
# pants.toml
[GLOBAL]
pants_version = "2.17.0"

backend_packages = ["pants.backend.shell"]

remote_store_address = "experimental:file:///tmp/remote-cache-example"
remote_cache_read = true
remote_cache_write = true
```

```python
# BUILD
experimental_test_shell_command(name="slow-test", command="sleep 5", tools=["sleep"])
```

### Details of this change

There's three main bits of work:

1. add the new generic OpenDAL byte store and action cache provider (in
the two pairs of `base_opendal.rs` + tests files)
2. hook up the specific FS one (in `store/src/remote.rs` and
`remote/src/remote_cache.rs`)
3. adjust the options parsing to support them, especially the
`experimental:` scheme handling (in `global_options.py`)

None of these are huge in isolation, but they add up to a fair chunk of
code. I think each of those files can be reviewed somewhat in isolation,
in that order.

### Why OpenDAL?

It's used by sccache for similar remote-caching purposes
(https://github.com/search?q=repo%3Amozilla%2Fsccache%20opendal&type=code),
and supports a whole lot of services:

- blob stores (e.g. S3, GCS)
- key-value caches (e.g. Redis)
- CI caches (e.g. GitHub Actions)
- weirder ones (IPFS, Google Drive)

Pants doesn't/shouldn't support all of them, but definitely blob stores
and CI caches seem exciting!

### Next steps

This only adds support for caching "remotely" to the local file system
but I'll add support for GitHub Actions as follow up work (#19831), and
maybe S3 too. Hopefully as part of doing that I'll work through any
authentication (etc.) issues and so it becomes easier for others to add
the backends they need too.

I've suggested we use the URI scheme for deciding on the service, rather
than an explicit `remote_store_service = "reapi"` option, or similar. In
future, some services may use the same scheme, e.g. I imagine several
services might conventionally use `http://` or `https://`. My current
thinking is to solve this similar to pip's [VCS
requirements](https://pip.pypa.io/en/stable/topics/vcs-support/)
(`vcs+transport://...`) or `sqlalchemy` [database
URLs](https://docs.sqlalchemy.org/en/20/core/engines.html#database-urls)
(`dialect+driver://...`), e.g. `remote_store_address = "s3+http://..."`
or `remote_store_address = "http+s3://..."`, and then the provider would
strip off the `s3` part as appropriate.

There's also a whole lot of TODOs that I don't think are critical to
landing this code, and I can chip away at once it's in: by landing this
sooner, I'm hoping we can start having the user-facing parts of #11149
being tested, validated and enjoyed sooner rather than later. I've
already done a loooot of *pre*-factoring (see all the PRs referencing
#11149), so it'd be nice to switch to having the feature in and just do
some normal *re*-factoring. 😅

TODOs mostly tracked in #19902
huonw added a commit that referenced this issue Sep 27, 2023
This adds support for remote caching to the Github Actions Cache, by
setting `remote_store_address =
"experimental:github-actions-cache+https://..."`. This is implemented as
just another OpenDAL (#19827) provider.

As written, this requires the `ACTIONS_CACHE_URL` and
`ACTIONS_RUNTIME_TOKEN` values to be massaged into the appropriate pants
configuration. I've included basic docs for how to do this. This set-up
can/should be packaged into the `init-pants` action in future, but for
now, we're in experimental-land.

There's various ways in which this can be improved, but I think it's
helpful to have a "real world" example of a remote cache beyond the
`file://` one to be able to guide the refactoring, plus good to start
testing this earlier. Thus, I'd like to land this, and do them as
follow-up:

- better docs/help (always!), particularly on the `using-pants-in-ci`
docs page, which I haven't updated here
- now that I've implemented this, I don't like my
`<provider>+<scheme>://...` proposal from #19827. I think it'd be better
to have a separate `remote_store_provider =
"experimental-github-actions-cache"` (or `"reapi"`, etc.) option and
`remote_store_address` is just the raw URL (with the provider validating
it can understand the URL, of course)
- it'd be good to be able to pass the token in in a simpler way, I'm
thinking there could be a `remote_oauth_bearer_token` option literally,
which is expected (and suggested in docs) to be passed as the
`PANTS_REMOTE_OAUTH_BEARER_TOKEN` env var, in addition to
`remote_oauth_bearer_token_path`

These are tracked in #19902 (and all of the others in that issue would
be good too, but weren't as acutely surfaced by this change, I'll start
chipping away at it once this is in).

This is awkward to test. My plan, not implemented here, is to somehow
tee up a dedicated set of tests to run within
https://github.com/nektos/act somehow. For now, we'll have to go with
manual testing, e.g. this specific PR is validated by
https://github.com/huonw/pants-cache-github-actions-poc/actions/runs/6298908094/job/17119702612#step:11:983

This is the first example of something useful user-facing for #11149.
This doesn't close it, I'll wait until we have a blob store like S3 too.

The commits are individually reviewable.

(Replaces #17840.)
huonw added a commit that referenced this issue Oct 5, 2023
This does a no-functionality-change refactoring of the remote _store_
providers (backed by REAPI and OpenDAL). This splits the concept of a
remote provider out into their own internal "surface" crate
`remote_provider`, plus three crates that are (mostly)
implementation-details of that one:

- `remote_provider/remote_provider_traits` for the traits (and structs!)
that the various providers have to implement
- `remote_provider/remote_provider_reapi` for the gRPC/REAPI-backed
provider
- `remote_provider/remote_provider_opendal` for the OpenDAL-backed
provider

They're arranged like this:

```mermaid
graph BT
   remote("process_execution/remote (existing)")
   store("fs/store (existing)")
   traits("remote_provider_traits (new)")
   grpc("remote_provider_reapi (new)")
   opendal("remote_provider_opendal (new)")
   selector("remote_provider (new)")
   grpc --> traits
   opendal --> traits
   selector --> grpc
   selector --> opendal
   remote -- for remote execution --> grpc
   remote --> selector
   store --> selector
```

Theoretically the new crates other than `remote_provider` are an
implementation detail... except there's a helper in
`remote_provider_reapi` that's used for the remote _execution_ in
`process_execution/remote`, in addition to the byte store and action
cache, hence the dependency there.


This is one point of #19902,
following up on
#19827 (comment).

The commits are individually reviewable, although the overall PR view
gives a slightly more useful view of the overall file renames for some
files.
huonw added a commit that referenced this issue Oct 31, 2023
This replaces uses of `RemoteCacheProviderOptions` with
`RemoteStoreOptions` (previously `RemoteOptions`): they're very similar,
and only separate because they were previously in very different parts
of the codebase, but this has changed with #19958.

This makes a few additional changes to make the merging smoother:
- removes the "capability cell" sharing optimisation, which enabled
sharing of the `GetCapabilitiesRequest` gRPC request result between the
CAS and the remote execution providers, if both were being used with the
same address. I thought that optimising one-time (on start-up) requests
from 2 to 1 wasn't worth the complexity of plumbing it through these
more generic options, but can restore it if it is.
- renames various fields

This does meant that there's a few options that are seemingly irrelevant
to the action cache providers, e.g. "batch API size limit" is
meaningless. I think this is fine: I think it's fundamental to
supporting multiple providers that there'll be a grab-bag of options,
which some providers support and others do not.

The commits are individually reviewable.

(This is one point of #19902.)
huonw added a commit that referenced this issue Nov 15, 2023
This takes the unusual step of _removing_ documentation. In particular,
this removes the documentation for the still _very_ experimental new
remote cache backends: GitHub Actions and file system, because they're
useless and very unstable:

- there's significant user-facing refactoring work that I'm not going to
get to for 2.19 (#19902),
particularly around changing how the options are configured
- the GHA backend seems to be very... limited, and only works for small
repos (#20133)

I'm targeting 2.19, as it stabilises, and leaving the docs in `main`
(for 2.20).
huonw added a commit that referenced this issue Nov 21, 2023
…rer_token_path` (#20116)

This adds a new global option `remote_oauth_bearer_token`, that allows
setting the remote store OAuth token directly. This is easier to work
with in CI contexts (and similar), where setting an environment variable
is often quite simple, and having a direct option allows setting this
via `PANTS_REMOTE_OAUTH_BEARER_TOKEN`.

For instance, it simplifies how one can configure the token for the
GitHub Actions Cache remote cache provider.

I believe this replaces `remote_oauth_bearer_token_path` fully: I think
`remote_oauth_bearer_token_path = "/path/to/token.txt"` can become
`remote_oauth_bearer_token = "@/path/to/token.txt"`. Thus, I have
deprecated the `..._path` option, for removal in a couple of versions.

(This is part of #19902.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant