[feature] Support Remote Execution API for caching #1520
Comments
I've been working on making moonbase self-hostable, but while doing so, I've had thoughts of just reworking it into a generic remote caching server. I keep going back and forth on which approach would be better. Either way, it's a lot for me to maintain at the moment. |
No rush at all! Very important to have but also not the top priority right now. One suggestion to cut some corners that don't need to be developed: You can leverage bazel-remote right away and avoid building the same abstraction again. Leverage it so you don't need to build something that almost became an industry standard. This will also put a big plus on the monorepo.tools website for moonrepo. So the REAPI is a gRPC Protobuf implementation where the bazel-remote responds with the cache parts in a streaming way, which saves lots of back and forth. One implementation in rust is done by Pants in this file: https://github.com/pantsbuild/pants/blob/main/src/rust/engine/process_execution/src/cache.rs Hope you can find a way! It would be very beneficial to all the community. |
Yeah agreed, I've also thought about piggy backing off of bazel's APIs. Might as well. |
Maybe the remote cache can be implemented on the client side instead of having to use a server? |
By using bazel-remote we do this. But moon must support this as the source for finding the cache hits and hydrating the state. |
I've briefly looked into this, and I will be moving to bazel's APIs, since they also offer action caching which I'll need in the future. Just need to find the time to integrate it. If anyone else wants to tackle it, let me know. |
Can you elaborate? Doesn't bazel-remote require a server? |
Yes. We run a bazel-remote container backed by azure blob storage. We connect to bazel-remote using mTLS connection. We use this today with Pants. If moon could support the same, we don't need to have many different places for managing this cache. |
@rhuanbarreto I still don't understand your point. My suggestion was to have the client make direct API calls to the blob storage (S3 etc.) instead of communicating with a server which has to be deployed and maintained. In addition a server would require another authentication and authorization mechanism for the clients, which you would get out of the box with IAM permissions for a client based solution. So I still don't see the advantage of having a server based solution which adds extra complexity and overhead. |
Good news, a new rust crate recently popped up that does a lot of the heavy lifting for the bazel remote APIs. https://github.com/amkartashov/bazel-remote-apis-rust Will give this a shot for the next release. |
OMG! Great news! If you need an alpha tester, you know where to find me. One small request: Make sure moon can support mTLS connections. htpasswd is too unsafe. |
The issue linked from nx above is what brought me here. I co-own a small startup that mostly runs in GCP. Having used nx for a while in our monorepo, I started to really like the plugins that provided caching via whatever backend you wanted, in my case GCS buckets. The plugin basically uses the GCS API, and stores/fetches directly to/from the configured bucket. This works both from within GCP, and from dev machines given the proper credentials. My big beef with the changes nx is making is that they introduce paid plugins that do the exact same thing while blocking the open source ones. I don't necessarily mind using nx cloud or similar (moonbase), but I'd rather use infra we already pay for (and/or have a payment relationship with), rather than buying yet another service. (The new paid plugins don't support GCS either at this point.) Setting up an additional VM for proxying sounds very unnecessary, unless it can provide some additional functionality. Anyway, I hope this can come to a useful resolution, as I am now considering the options that are not nx, and moonrepo looks very interesting. |
An update on this: I've got a basic implementation working that communicates with https://github.com/buchgr/bazel-remote. PR here: #1651 Uploading to CAS was relatively easy. However, downloading from CAS is currently blocked. The issue is that I don't know how to reference the cached item in CAS and download the correct blob. The bazel APIs require a digest (hash + size) but we only have the hash. We can't calculate the size without archiving the build before running the task, which is far too much overhead. The bazel asset API actually solves this, as you can associate metadata with an uploaded blob via tagging, but I don't think this will land in the next release, not until I can figure out how to calculate these digests. |
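To illustrate why the size matters, here is a minimal Python sketch of a CAS digest and the ByteStream read path that embeds it. It assumes SHA-256 as the digest function (servers can be configured otherwise) and the `{instance_name}/blobs/{hash}/{size}` resource-name layout from the Remote Execution API; the helper names `digest_of` and `read_resource_name` and the sample bytes are hypothetical, not taken from moon or bazel-remote.

```python
import hashlib
from typing import NamedTuple


class Digest(NamedTuple):
    """Mirrors the two fields of a REAPI Digest: hex hash and size in bytes."""
    hash: str
    size_bytes: int


def digest_of(blob: bytes) -> Digest:
    # Assumption: SHA-256; the hash is the lowercase hex string of the content.
    return Digest(hashlib.sha256(blob).hexdigest(), len(blob))


def read_resource_name(digest: Digest, instance_name: str = "") -> str:
    # ByteStream read path: "{instance_name}/blobs/{hash}/{size}".
    # With an empty instance name the leading segment is dropped.
    parts = [instance_name] if instance_name else []
    parts += ["blobs", digest.hash, str(digest.size_bytes)]
    return "/".join(parts)


# Hypothetical stand-in for a packed output archive.
archive = b"example tarball bytes"
d = digest_of(archive)
print(read_resource_name(d))
```

Because the size is part of the path, a client that only knows the hash cannot build the download request, which is the blocker described above.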
I've been thinking about this even more, and I'm still quite confused. I took a look at pants, which uses these bazel APIs, and it looks like they scan the outputs on the file system, read the bytes and size of each file, collect digests for all of these files, and then upload them all to the remote cache as individual files. In moon, we pack all the outputs into a single tarball archive, store that at But even if we follow the pants/bazel way of doing things, it still doesn't make much sense. For example:
|
Ok, ok, I think I finally figured it all out, thanks to this article: https://bitrise.io/blog/post/bazel-remote-caching-api I need to use the |
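For reference, the per-file approach attributed to pants earlier in the thread (scan the outputs, read each file, collect a digest per file) could look roughly like the sketch below. This is an illustration under assumptions, not code from pants or moon: the function name `file_digests` is hypothetical, and SHA-256 is assumed as the digest function.

```python
import hashlib
from pathlib import Path


def file_digests(output_dir: Path) -> dict[str, tuple[str, int]]:
    """Map each output file's relative path to (sha256 hex, size in bytes)."""
    digests = {}
    for path in sorted(output_dir.rglob("*")):
        if path.is_file():
            data = path.read_bytes()
            rel = path.relative_to(output_dir).as_posix()
            # Hash and size come from the same read, so no archive is needed.
            digests[rel] = (hashlib.sha256(data).hexdigest(), len(data))
    return digests
```

Each entry carries both halves of a digest, so every file can be uploaded to and later fetched from CAS individually. Note that this only works after the task has produced its outputs.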
Is your feature request related to a problem? Please describe.
Although moonbase has a caching service, for regulatory reasons we cannot store cached artifacts outside our own domain.
Many other monorepo tools like bazel, pants and rush enable the use of your own storage backend for caching artifacts.
On the other hand, caching the .moon/cache folder in GitHub Actions doesn't help much either, since GitHub's size limits are too low.
Describe the solution you'd like
I would like to have a config so I can self host my own cached artifacts in Azure Blob Storage for example. If this includes running a container separately for the service like https://github.com/buchgr/bazel-remote it's fine.
Describe alternatives you've considered
For now, using moonbase is actually hard, as it creates a dependency on a service outside our domain.
So the only alternative is using GitHub / Azure DevOps pipeline caching.