Add a GitHub Actions Cache remote cache backend #19831

Merged: 7 commits, Sep 27, 2023
2 changes: 1 addition & 1 deletion docs/markdown/Using Pants/remote-caching-execution.md
@@ -13,7 +13,7 @@ By default, Pants executes processes in a local [environment](doc:environments)

2. "Remote execution" where Pants offloads execution of processes to a remote server (and consumes cached results from that remote server)

- Pants does this by using the "Remote Execution API" to converse with the remote cache or remote execution server.
+ Pants does this by using the "Remote Execution API" to converse with the remote cache or remote execution server. Pants also [supports some additional providers](doc:remote-caching) other than the Remote Execution API that provide only remote caching, without execution.

What is Remote Execution API?
-----------------------------
@@ -7,20 +7,26 @@ createdAt: "2021-03-19T21:40:24.451Z"
What is remote caching?
=======================

- Remote caching allows Pants to store and retrieve the results of process execution to and from a remote server that complies with the [Remote Execution API](https://github.com/bazelbuild/remote-apis) standard ("REAPI"), rather than only using your machine's local Pants cache. This allows Pants to share a cache across different runs and different machines, for example, all of your CI workers sharing the same fine-grained cache.
+ Remote caching allows Pants to store and retrieve the results of process execution to and from a remote server, rather than only using your machine's local Pants cache. This allows Pants to efficiently share a cache across different runs and different machines, for example, all of your CI workers sharing the same fine-grained cache.

Setup
=====
Pants supports several remote caching providers:

- [Remote Execution API](https://github.com/bazelbuild/remote-apis) ("REAPI"), which also supports [remote execution](doc:remote-execution)
- GitHub Actions Cache
- Local file system

Remote Execution API
====================

Server
------

- Remote caching requires the availability of a REAPI-compatible cache. See the [REAPI server compatibility guide](doc:remote-caching-execution#server-compatibility) for more information.
+ See the [REAPI server compatibility guide](doc:remote-caching-execution#server-compatibility) for more information about REAPI-compatible caches.

Pants Configuration
-------------------

After you have either set up a REAPI cache server or obtained access to one, the next step is to point Pants to it so that Pants will use it to read and write process results.

For the following examples, assume that the REAPI server is running on `cache.corp.example.com` at port 8980 and that it is on an internal network. Also assume that the name of the REAPI instance is "main." At a minimum, you will need to configure `pants.toml` as follows:
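The example configuration block is collapsed in this diff view; a minimal sketch of what such a `pants.toml` might contain, based on the options named elsewhere on this page (not necessarily the exact collapsed content):

```toml
[GLOBAL]
# The REAPI store server (internal network, no TLS, hence grpc://).
remote_store_address = "grpc://cache.corp.example.com:8980"
remote_instance_name = "main"
remote_cache_read = true
remote_cache_write = true
```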

@@ -34,6 +40,64 @@ remote_instance_name = "main"

If the endpoint is using TLS, then the `remote_store_address` option would be specified with the `grpcs://` scheme, i.e. `"grpcs://cache.corp.example.com:8980"`.

GitHub Actions Cache
====================

GitHub Actions provides a built-in caching service, which Pants can use to share caches across GitHub Actions runs (but not with machines outside of GitHub Actions). It is typically used via the `actions/cache` action to cache whole directories and files, but Pants can use the same functionality for fine-grained caching.

> 🚧 GitHub Actions Cache support is still experimental
>
> Support for this cache provider is still under development, with more refinement required. Please [let us know](doc:getting-help) if you use it and encounter errors or warnings.

Workflow
--------

The values of the `ACTIONS_CACHE_URL` and `ACTIONS_RUNTIME_TOKEN` environment variables need to be provided to Pants via the `[GLOBAL].remote_store_address` and `[GLOBAL].remote_store_headers` options respectively. GitHub exposes these values only to action invocations, not to shell steps that use `run: ...`. Include a step like the following in your jobs, before executing any Pants commands, to set those options via environment variables:

```yaml
- name: Configure Pants caching to GitHub Actions Cache
  uses: actions/github-script@v6
  with:
    script: |
      core.exportVariable('PANTS_REMOTE_STORE_ADDRESS', 'experimental:github-actions-cache+' + (process.env.ACTIONS_CACHE_URL || ''));
      core.exportVariable('PANTS_REMOTE_STORE_HEADERS', `+{'authorization':'Bearer ${process.env.ACTIONS_RUNTIME_TOKEN || ''}'}`);
```

Pants Configuration
-------------------

Once that step has run, Pants will read these environment variables. You will also need to configure Pants to read and write to the cache, only while in CI, such as [via a `pants.ci.toml` configuration file](doc:using-pants-in-ci#configuring-pants-for-ci-pantscitoml-optional):

```toml
[GLOBAL]
# GitHub Actions cache URL and token are set via environment variables
remote_cache_read = true
remote_cache_write = true
```

If desired, you can also set `remote_instance_name` to a string that's included as a prefix on each cache key, which will then be displayed in the 'Actions' > 'Caches' UI.
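For example (the prefix value here is arbitrary, chosen for illustration):

```toml
[GLOBAL]
# Appears as a prefix on each cache key in the 'Actions' > 'Caches' UI.
remote_instance_name = "pants-cache"
```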

Local file system
=================

Pants can cache "remotely" to a local file system path. This is useful for keeping a shared cache on a networked mount, without having to pay the cost of storing Pants' local cache on the network mount too. It can also be used for testing and validation.

> 🚧 Local file system caching support is still experimental
>
> Support for this cache provider is still under development, with more refinement required. Please [let us know](doc:getting-help) if you use it and encounter errors or warnings.

Pants Configuration
-------------------

To read and write the cache to `/path/to/cache`, you will need to configure `pants.toml` as follows:

```toml
[GLOBAL]
remote_store_address = "experimental:file:///path/to/cache"
remote_cache_read = true
remote_cache_write = true
```

Reference
=========

15 changes: 15 additions & 0 deletions src/python/pants/option/global_options.py
@@ -280,6 +280,21 @@ def renderer(_: object) -> str:
"""
),
),
_RemoteAddressScheme(
schemes=("github-actions-cache+http", "github-actions-cache+https"),
supports_execution=False,
experimental=True,
description=softwrap(
f"""
Use the GitHub Actions Cache for fine-grained caching. This requires extracting
`ACTIONS_CACHE_URL` (passing it in `[GLOBAL].remote_store_address`) and
`ACTIONS_RUNTIME_TOKEN` (storing it in a file and passing
`[GLOBAL].remote_oauth_bearer_token_path` or setting `[GLOBAL].remote_store_headers` to
include `authorization: Bearer {{token...}}`). See
{doc_url('remote-caching#github-actions-cache')} for more details.
"""
),
),
)


35 changes: 8 additions & 27 deletions src/rust/engine/Cargo.lock

(Generated file; diff not rendered.)

9 changes: 8 additions & 1 deletion src/rust/engine/Cargo.toml
@@ -254,7 +254,14 @@ notify = { git = "https://github.com/pantsbuild/notify", rev = "276af0f3c5f300bf
num_cpus = "1"
num_enum = "0.5"
once_cell = "1.18"
- opendal = { version = "0.39.0", default-features = false }
+ # TODO: this is waiting for several changes to be released (likely in 0.41):
+ # https://github.com/apache/incubator-opendal/pull/3163
+ # https://github.com/apache/incubator-opendal/pull/3177
+ opendal = { git = "https://github.com/apache/incubator-opendal", rev = "97bcef60eb0b515bd2442ab5b671080766fa35eb", default-features = false, features = [
+   "services-memory",
+   "services-fs",
+   "services-ghac",
+ ] }
os_pipe = "1.1"
parking_lot = "0.12"
peg = "0.8"
5 changes: 1 addition & 4 deletions src/rust/engine/fs/store/Cargo.toml
@@ -41,10 +41,7 @@ tower-service = { workspace = true }
tryfuture = { path = "../../tryfuture" }
uuid = { workspace = true, features = ["v4"] }
workunit_store = { path = "../../workunit_store" }
- opendal = { workspace = true, default-features = false, features = [
-   "services-memory",
-   "services-fs",
- ] }
+ opendal = { workspace = true }

[dev-dependencies]
criterion = { workspace = true }
8 changes: 8 additions & 0 deletions src/rust/engine/fs/store/src/remote.rs
@@ -81,6 +81,14 @@ async fn choose_provider(options: RemoteOptions) -> Result<Arc<dyn ByteStoreProv
"byte-store".to_owned(),
options,
)?))
} else if let Some(url) = address.strip_prefix("github-actions-cache+") {
// TODO: this is relying on python validating that it was set as
// `github-actions-cache+https://...`
Ok(Arc::new(base_opendal::Provider::github_actions_cache(
url,
"byte-store".to_owned(),
options,
)?))
} else {
Err(format!(
"Cannot initialise remote byte store provider with address {address}, as the scheme is not supported",
70 changes: 59 additions & 11 deletions src/rust/engine/fs/store/src/remote/base_opendal.rs
@@ -9,13 +9,16 @@ use async_trait::async_trait;
use bytes::Bytes;
use futures::future;
use hashing::{async_verified_copy, Digest, Fingerprint, EMPTY_DIGEST};
use http::header::AUTHORIZATION;
use opendal::layers::{ConcurrentLimitLayer, RetryLayer, TimeoutLayer};
use opendal::{Builder, Operator};
use tokio::fs::File;
use workunit_store::ObservationMetric;

use super::{ByteStoreProvider, LoadDestination, RemoteOptions};

const GITHUB_ACTIONS_CACHE_VERSION: &str = "pants-1";

#[derive(Debug, Clone, Copy)]
pub enum LoadMode {
Validate,
@@ -71,6 +74,44 @@ impl Provider {
Provider::new(builder, scope, options)
}

pub fn github_actions_cache(
url: &str,
scope: String,
options: RemoteOptions,
) -> Result<Provider, String> {
let mut builder = opendal::services::Ghac::default();

builder.version(GITHUB_ACTIONS_CACHE_VERSION);
builder.endpoint(url);

// extract the token from the `authorization: Bearer ...` header because OpenDAL's Ghac service
// reasons about it separately (although it does just stick it in its own
// `authorization: Bearer ...` header internally).
let header_help_blurb = "Using the GitHub Actions Cache as a remote cache requires a token set in an `authorization: Bearer ...` header, set via [GLOBAL].remote_store_headers or [GLOBAL].remote_oauth_bearer_token_path";
let Some(auth_header_value) = options.headers.get(AUTHORIZATION.as_str()) else {
let existing_headers = options.headers.keys().collect::<Vec<_>>();
return Err(format!(
"Expected to find '{}' header, but only found: {:?}. {}",
AUTHORIZATION, existing_headers, header_help_blurb,
));
};

let Some(token) = auth_header_value.strip_prefix("Bearer ") else {
return Err(format!(
"Expected '{}' header to start with `Bearer `, found value starting with {:?}. {}",
AUTHORIZATION,
// only show the first few characters to not accidentally leak (all of) a secret, but
// still give the user something to start debugging
&auth_header_value[..4],
header_help_blurb,
));
};

builder.runtime_token(token);

Provider::new(builder, scope, options)
}
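The header-parsing logic above can be sketched in Python; this is a simplified, hypothetical mirror of the Rust validation (`extract_bearer_token` is not a Pants API), useful for seeing the two failure modes:

```python
def extract_bearer_token(headers: dict) -> str:
    """Extract the token from an `authorization: Bearer ...` header,
    mirroring the validation in `Provider::github_actions_cache`."""
    value = headers.get("authorization")
    if value is None:
        raise ValueError(
            f"Expected an 'authorization' header, but only found: {sorted(headers)}"
        )
    if not value.startswith("Bearer "):
        # Show only the first few characters, to avoid leaking a whole secret
        # while still giving the user something to start debugging.
        raise ValueError(
            "Expected 'authorization' header to start with 'Bearer ', "
            f"found value starting with {value[:4]!r}"
        )
    return value[len("Bearer "):]
```

Note that the token is returned without the `Bearer ` prefix, since OpenDAL's Ghac service re-adds its own `authorization` header internally.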

fn path(&self, fingerprint: Fingerprint) -> String {
// We include the first two bytes as parent directories to make listings less wide.
format!(
@@ -158,11 +199,15 @@ impl ByteStoreProvider for Provider {

let path = self.path(digest.hash);

-     self
-       .operator
-       .write(&path, bytes)
-       .await
-       .map_err(|e| format!("failed to write bytes to {path}: {e}"))
+     match self.operator.write(&path, bytes).await {
+       Ok(()) => Ok(()),
+       // The item already exists, i.e. these bytes have already been stored. For example,
+       // concurrent executions that are caching the same bytes. This makes the assumption that
+       // whichever execution won the race to create the item successfully finishes the write, and
+       // so no wait + retry (or similar) here.
+       Err(e) if e.kind() == opendal::ErrorKind::AlreadyExists => Ok(()),
+       Err(e) => Err(format!("failed to write bytes to {path}: {e}")),
+     }
}
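The race-tolerant write semantics can be sketched in Python; this is a hypothetical toy model (not the OpenDAL API) showing why "already exists" is safe to treat as success in a content-addressed store:

```python
class AlreadyExists(Exception):
    """Raised when an entry with the same key already exists."""

def create_entry(store: dict, path: str, data: bytes) -> None:
    # Toy stand-in for a backend that rejects duplicate keys.
    if path in store:
        raise AlreadyExists(path)
    store[path] = data

def store_bytes(store: dict, path: str, data: bytes) -> None:
    """Treat AlreadyExists as success: entries are keyed by content digest,
    so a concurrent writer must have been storing identical bytes."""
    try:
        create_entry(store, path, data)
    except AlreadyExists:
        pass  # another execution won the race; nothing more to do
```

As the Rust comment notes, this assumes the execution that won the race finishes its write successfully, so no wait-and-retry is attempted.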

async fn store_file(&self, digest: Digest, mut file: File) -> Result<(), String> {
@@ -174,12 +219,15 @@

let path = self.path(digest.hash);

-     let mut writer = self
-       .operator
-       .writer_with(&path)
-       .content_length(digest.size_bytes as u64)
-       .await
-       .map_err(|e| format!("failed to start write to {path}: {e}"))?;
+     let mut writer = match self.operator.writer(&path).await {
+       Ok(writer) => writer,
+       // The item already exists, i.e. these bytes have already been stored. For example,
+       // concurrent executions that are caching the same bytes. This makes the assumption that
+       // whichever execution won the race to create the item successfully finishes the write, and
+       // so no wait + retry (or similar) here.
+       Err(e) if e.kind() == opendal::ErrorKind::AlreadyExists => return Ok(()),
+       Err(e) => return Err(format!("failed to start write to {path}: {e} {}", e.kind())),
+     };

> Review comment (Member): Was this a workaround that is no longer necessary post apache/opendal#3163, or did this end up in the wrong commit?
>
> Reply (Author): Coincidence, unrelated to the fixes I made: the `content_length` method was removed in 0.40.0 (apache/opendal#3044), and the upgrade to a recent HEAD here goes past that.

// TODO: it would be good to pass through options.chunk_size_bytes here
match tokio::io::copy(&mut file, &mut writer).await {
8 changes: 8 additions & 0 deletions src/rust/engine/process_execution/remote/src/remote_cache.rs
@@ -105,6 +105,14 @@ async fn choose_provider(
"action-cache".to_owned(),
remote_options,
)?))
} else if let Some(url) = address.strip_prefix("github-actions-cache+") {
// TODO: this is relying on python validating that it was set as
// `github-actions-cache+https://...`
Ok(Arc::new(base_opendal::Provider::github_actions_cache(
url,
"action-cache".to_owned(),
remote_options,
)?))
} else {
Err(format!(
"Cannot initialise remote action cache provider with address {address}, as the scheme is not supported",