Initial attempt at direct s3 remote cache #4889
Conversation
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for the commit author(s). If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
Force-pushed from 5480bb0 to 3d18a08.
Heya!
I think it's awesome that you are working on this! This will make a lot of users happy.
You are more an expert on this than I am, but I wonder whether we could reuse the HttpBlobStore for this?
The main motivation being that we would like to have as few dependencies in Bazel as possible, because these .jars will all end up in the Bazel binary and thus increase its size significantly.
Maybe we can get away with just the aws-core sdk for authentication? What do you think?
 *
 * <p>This blobstore always uses TLS.
 */
public class S3BlobStore implements SimpleBlobStore {
I was hoping that we could reuse the HttpBlobStore, by using the AWS S3 HTTP API
https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectGET.html
https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html
Do you think this would be possible?
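For illustration, a minimal sketch of what treating S3 as just another HTTP blob store could look like, in the spirit of what HttpBlobStore already does for generic HTTP caches. The bucket URL and cache key are hypothetical, and a real request would still need AWS auth headers (or a bucket policy allowing the access) to succeed:

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hedged sketch: GET/PUT of cache blobs against the S3 REST API using
// nothing but plain HTTP. The bucket name is hypothetical, and auth is
// deliberately omitted here -- that is exactly the open question.
public final class S3RestSketch {
  private static final String BUCKET_URL = "https://my-bazel-cache.s3.amazonaws.com/";

  static byte[] get(String key) throws Exception {
    HttpURLConnection conn =
        (HttpURLConnection) new URL(BUCKET_URL + key).openConnection();
    conn.setRequestMethod("GET");
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    byte[] chunk = new byte[8192];
    try (InputStream in = conn.getInputStream()) {
      for (int n; (n = in.read(chunk)) != -1; ) {
        buf.write(chunk, 0, n);
      }
    }
    return buf.toByteArray();
  }

  static void put(String key, byte[] blob) throws Exception {
    HttpURLConnection conn =
        (HttpURLConnection) new URL(BUCKET_URL + key).openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    conn.setFixedLengthStreamingMode(blob.length);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(blob);
    }
    int code = conn.getResponseCode();
    if (code / 100 != 2) {
      throw new IllegalStateException("S3 PUT failed with HTTP " + code);
    }
  }
}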
I think it might be possible to (re)use it. I will make an attempt to reuse the HTTP blob store, but it might be worth considering what the fallback looks like.
Thanks @GregBowyer. Do we know exactly how AWS authentication works? If not, I think we might need to find out and implement it ourselves. Fun. I think Bazel won't be able to ship 6MiB of libraries just so that it can authenticate to AWS.
Uggh, I can look into that, but I suspect that's not fun.
@GregBowyer Agree 😄. Let's first get something that works and then see how big the additional libraries required are.
I think I'd avoid implementing it myself if possible. There's some code out there that provides a basic implementation: https://github.com/martijnvogten/java-aws-auth-v4/blob/master/src/nl/martijnvogten/aws/AwsAuthVersion4.java But that doesn't appear to do RBAC, which requires a bunch of code to get session keys from the AWS meta-data service (and regularly renewing them without getting identified as a DoS attack on the meta-data service)... and even the guy who wrote it says, "don't use this if at all possible".
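For context, a hedged sketch of the meta-data-service dance mentioned above, using the EC2 instance metadata endpoints as AWS documents them (the IMDSv1 layout). A real implementation would parse the returned JSON and refresh the credentials shortly before the advertised Expiration rather than on every request:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Hedged sketch: fetch temporary role credentials from the EC2 instance
// metadata service. Only works on an EC2 instance with an attached role.
public final class InstanceMetadataCredentials {
  private static final String BASE =
      "http://169.254.169.254/latest/meta-data/iam/security-credentials/";

  static String fetch(String url) throws Exception {
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    conn.setConnectTimeout(1000); // link-local service; fail fast off-EC2
    StringBuilder body = new StringBuilder();
    try (BufferedReader reader =
        new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
      String line;
      while ((line = reader.readLine()) != null) {
        body.append(line);
      }
    }
    return body.toString();
  }

  public static void main(String[] args) throws Exception {
    String role = fetch(BASE).trim();      // first call returns the role name
    String credsJson = fetch(BASE + role); // second call returns JSON with
    System.out.println(credsJson);         // AccessKeyId, SecretAccessKey,
  }                                        // Token, and Expiration
}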
Out of curiosity, what JVM version does Bazel target? Is it 1.7 or 1.8?
@GregBowyer we use 1.8, possibly upgrading to 1.9 soon.
While the current Bazel binary is significantly larger, I'd be rather unhappy about adding a 6 MB dependency at this point. I have worked on reducing the Bazel binary size, and I think we can get it to ~10 MB, so a 6 MB additional dependency would be a 60% increase. Is there a way to pick just the jars that are necessary for auth and leave out the other stuff that isn't?
We can, but are we ok maintaining that code?
@GregBowyer it depends on how complex the solution is, but generally yes. I can't believe nobody else has run into and solved this problem in open source?
So "it's pretty complex" is the reason for the pause. Legally, if we forked the auth code and adapted it, is there any issue? I don't think so, since the AWS SDK is Apache-2 just like Bazel, but I don't want to put the Bazel project in a tight spot here. Clean-room implementing the auth methods would be a more substantial effort.
OK, I just pushed a WIP that replaces S3BlobStore and attempts to use AWS header creation directly. I will try to extract out the salient auth methods, either using a different jar such as the one @cbsmith suggested or pulling the code from the AWS SDK. I am going to kick the tires a few times before I write unit tests for the expanded
@GregBowyer awesome! thanks a lot!
I assume so (if we put the modified SDK into its own directory underneath third_party and add a README.txt file that indicates where it originally comes from), but I have asked our internal team who actually knows this stuff for advice. Will report back when I hear something.
@philwo I don't see why this wouldn't be ok. We have forked tons of code in our third_party folder right now. Our whole third_party folder is effectively forks.
So commit 021bdd2 fails for a range of AWS regions: newer regions don't support the v2 auth method and only offer a v4 method. The good news is that Amazon provides test vectors for the newer auth method, and it works everywhere. I reworked the auth handling to use the v4 auth method (the key-derivation core is sketched below). I will kick the tires a bit more before I work out what it takes to actually obtain the credentials.
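For reference, the v4 signing-key derivation that those test vectors cover is a short, fixed chain of HMAC-SHA256 operations over the date, region, and service. This sketch shows only that derivation, not the canonical-request hashing and "string to sign" that the full scheme also requires; the example arguments are hypothetical:

import java.nio.charset.StandardCharsets;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hedged sketch of the AWS Signature Version 4 signing-key derivation,
// as documented by AWS: kDate -> kRegion -> kService -> kSigning.
public final class SigV4KeySketch {
  static byte[] hmac(byte[] key, String data) throws Exception {
    Mac mac = Mac.getInstance("HmacSHA256");
    mac.init(new SecretKeySpec(key, "HmacSHA256"));
    return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
  }

  // e.g. signingKey(secret, "20180330", "us-east-1", "s3")
  static byte[] signingKey(String secretKey, String date, String region, String service)
      throws Exception {
    byte[] kDate = hmac(("AWS4" + secretKey).getBytes(StandardCharsets.UTF_8), date);
    byte[] kRegion = hmac(kDate, region);
    byte[] kService = hmac(kRegion, service);
    return hmac(kService, "aws4_request");
  }
}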
OK, the latest commit removes anything that isn't the core of the AWS auth chain code. This removes the following deps:
We bring in:
I have run a basic bazel build test with the cache set to an S3 bucket, and it caches the artifacts as expected (a sample invocation is sketched below). I will need to test EC2 delegated permissions before we can call it done. Left TODO:
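A hedged example of the kind of invocation used for that basic test, assuming the HTTP cache flag as it was spelled around the time of this PR (--remote_http_cache, later folded into --remote_cache) and a hypothetical bucket name:

bazel build //foo:bar \
    --remote_http_cache=https://my-bazel-cache.s3.amazonaws.com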
@GregBowyer sounds awesome. I'll be on vacation until 9th April; I'll review it then :-). Big thanks for your efforts!
I also need to find out why it's a lot less reliable and far slower using HttpBlobStore than when using the SDK.
So investigating this uncovered a performance regression, detailed in #4944. I am going to assume the workaround applied in that issue and carry on with this work to flesh out the S3 code some more.
Hi @GregBowyer, checking in again on where you are at with this PR. It's been a couple of months, and we are dusting off this PR as we upgrade to Bazel 0.28. Looks like AuthHeadersProvider still needs to be split out to unblock this PR. We sent you a PR GregBowyer#8 to update this branch to HEAD and are happy to help split out the AuthHeadersProvider. Please let us know if you have time to tackle this in the next couple of weeks. If not, we're happy to push this over the finish line. It would be great to finally get this merged.

We understand that things get busy, so we're willing to push this forward. If you grant us write access to this branch, we're happy to keep the bulk of the changes here and simply open a separate PR for the AuthHeadersProvider. Otherwise, we can open two new PRs (one with AuthHeadersProvider only and one with the rest of the changes) if you have no objections. We'll go ahead with this approach over the next couple of weeks if we don't hear back before then.

We might be moving away from HTTP caching to gRPC-based caching (remote_http_cache vs remote_cache) in the near future, but we'd still love to see these changes merged.
Removes the direct dependency of the remote and buildeventservice packages on the authentication libraries by using the new AuthHeadersProvider. The authandtls package is split into the "authentication/google" and "grpc" packages. This is a step towards merging bazelbuild#6810. This change is a no-op.
@buchgr We have tried reaching out to @GregBowyer via GitHub and email, but we have not heard back from him. We believe this PR still brings value to the community; what are our options here? As we suggested in the last comment, we have two draft PRs ready, #9138 and #9137. The former has commits co-authored with Greg, so we get a Google CLA notice stating we need Greg's consent to move forward. We're not sure how to move forward without Greg, though. What can we do if we don't hear back?
Hi there, I won't have time to help out with reviews over the next few weeks, as there are some higher-priority items for Bazel 1.0 that I will need to dedicate my time to. We have been thinking of an alternative way to add AWS and other authentication libraries to Bazel that doesn't require thousands of lines of code changes. The basic idea would be for Bazel to launch a local proxy process that can add authentication headers. We would provide a proxy implementation that ships with commonly used authentication methods, but users would also be free to add their own. How does this sound? Best,
Sorry, I have been super busy; I will take a look at your PRs. @buchgr isn't that just piling on more complexity for the sake of avoiding the complexity we already have?
Authentication libraries living outside of Bazel have major advantages:
I understand that there are trade-offs. I am happy to hear and discuss pros and cons.
@buchgr Would the GCP authentication mechanisms also move to this proxy? If so, great. If not, then I worry the proxy would rot compared to Bazel.
@buchgr So I did once try a sidecar proxy process, and it was a horrible experience: bazel or the proxy crashes and the build is broken. Is there a middle ground, maybe a nascent plugin architecture where the same can be achieved by downloading a jar and using it through well-specced interfaces?
@jjudd yes of course.
... or is this proxy process managed by the bazel server, communicating through gRPC or something akin to the persistent worker API? If that is the case, I would be up for reworking the AWS auth to work along those lines.
Awesome. Glad to hear it! If we can figure out a way to make this work seamlessly for the end user, then using the actual 3rd-party dependencies sounds far easier to maintain. Upgrading custom 3rd-party lib stuff with a custom-written version of the AWS code is going to be really ugly in the future.
Yes. The idea was for the Bazel server to own the proxy process. Essentially, I was thinking for the proxy to implement the http and grpc API as well as a health check call (a sketch follows below). I was hoping to reuse https://github.com/buchgr/bazel-remote or something like it. Thinking further, this proxy could also replace the disk cache. A design document would be a good start. Many details will need to be figured out :).
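As a rough sketch of the health-check half of that idea, assuming the proxy exposed the standard grpc.health.v1 service (available in gRPC's grpc-services artifact); the port is hypothetical:

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.health.v1.HealthCheckRequest;
import io.grpc.health.v1.HealthCheckResponse;
import io.grpc.health.v1.HealthGrpc;

// Hedged sketch: the health probe the Bazel server could run against a
// locally spawned cache/auth proxy before trusting it with requests.
public final class ProxyHealthCheck {
  public static void main(String[] args) {
    ManagedChannel channel =
        ManagedChannelBuilder.forAddress("localhost", 9092).usePlaintext().build();
    HealthCheckResponse response =
        HealthGrpc.newBlockingStub(channel)
            .check(HealthCheckRequest.newBuilder().setService("").build());
    System.out.println("proxy status: " + response.getStatus()); // SERVING / NOT_SERVING
    channel.shutdownNow();
  }
}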
As a user, I find it very confusing to see built-in support for GCP buckets alongside such a large ongoing debate and hesitancy about including support for an even more widely used and prolific object storage service like S3. I've been watching and waiting on this feature for over a year, across at least one other now-abandoned PR that stalled due to feedback cycles and lack of movement. I'm not sure how this is a different story from Bazel's GCP authentication implementation. Why is S3 (covering a broader set of potential Bazel customers) a special case? The proxy idea sounds interesting, but could it be made a separate effort? I believe the PR author has followed all of the feedback from reviewers at this point. What are the chances of merging this, with a separate project for extracting authentication into a managed proxy process?
@mavenraven yes, we forked it and maintained it ourselves. We have a branch that supported this PR until 0.28, but we switched to gRPC caching starting from 0.29. We moved away from it to avoid constant rebasing.
@borkaehw Ah yeah, that's what I figured. Thanks for the tip on gRPC caching, I'll look into that!
I can see where you are coming from, but what you are trying to imply is simply not accurate. I still think it would be awesome if Bazel supported authentication to AWS and many other providers. However, it seems to be the case that for AWS authentication we either have to ship 6 MiB of libraries or maintain our own fork of them; that's a big hurdle. The gcloud library is ~100 KiB, and the whole Bazel binary is around ~40 MiB. I originally added GCS caching because it was so simple to do. Additionally, over the course of this year, with the advent of "builds without the bytes" (BWTB), I believe that both GCS and AWS S3 caching will become less relevant, as BWTB does not work with either. Lastly, I am not trying to make excuses for the mistakes that I have made with reviewing this PR, but what you are suggesting is simply not true.
For what it's worth, our (Square's Android dev-ex team) experiments showed S3 to be a worse option than an AWS server running the open-source implementations of the caching protocol, whether using the forked support or the Asana bazels3cache proxy (which even handles slow writes). I think S3 isn't optimized the way most people would need for a really efficient build cache; other Amazon services can be. Heck, we didn't even go with the remote cache for our CI, as it turned out an NFS drive in AWS, highly available to the build machines and used via --disk_cache, was faster than all of them (though we miss out on managed clean-up, which we do out of process). We're currently playing with a few options from a few vendors for developer-accessible build caches.

The point is: while I agree that having support for major storage buckets out of the box is a super sweet nice-to-have feature, the protocol itself is quite well supported, and "having it for completeness" doesn't outweigh other considerations. There's nothing stopping anyone from using Amazon or Microsoft cloud services instead of Google cloud, and even the baked-in support for GCS doesn't make it some sort of monopolistic play. There are good and bad reasons to prefer each vendor and among their various options.
@mavenraven No, that's not fair and not quite how it shakes out. @buchgr is pretty spot on here: it's a pig to maintain either way, and really the subtleties of S3 vs GCS are such that even simple support for both is pretty invasive in the code. I have personally found running the cache server instances to be a pain that the S3 version solves; however, that pain is really related to the management of that server pool (more stuff to manage, more cost, slower, etc.).

I think bonus points really if we can get to a place where it's a

As @buchgr alluded to, having a version that manages a "sidecar" that provides caching is IMHO a clearer and better solution; it lets people get crazy on what they want caching to be and saves us from a world of managing ever larger build farms.

@cgruber that surprises me, although I suspect it's a language thing: I easily get an 80-90% speedup on S3 and a clear save over NFS or other solutions. With that said, I am building twisty C++ where I have translation units that can easily take multiple minutes to compile, and my compile farm is inside AWS (with the S3 bucket and builders configured to be in the same region). It might be a case of YMMV?
Removed @aiuto as reviewer. I don't have enough knowledge of this code to comment.
This is dead in the water now, to be redone another day |
Are there any other attempts to build this for the modern Bazel? |
My hope is that it's now possible to implement this by using |
This is an initial attempt to support #4844
It directly supports S3 as a simple remote cache. After spending some time looking at how AWS credentials are done, it looks like it's easier to use the SDK instead of hand-rolling the auth directly.
Since the SDK provides the backbone for auth support, it makes sense to include the S3 parts of the SDK as well.
In this PR we use the S3 SDK directly to provide S3 remote cache support. This support should be on par with the Google Cloud Storage support, and has similar options in its usage.
Thoughts?
TODO
* Work out what causes the get performance regression (#4944)
* Unit tests for AwsHttpAuthAdapter, as its constructor is messy for testing
* Swap the joda-time types for the javax.time ones