Initial attempt at direct s3 remote cache #4889
Conversation
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for the commit author(s). If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
Force-pushed from 5480bb0 to 3d18a08.
Heya!
I think it's awesome that you are working on this! This will make a lot of users happy.
You are more an expert on this than I am, but I wonder whether we could reuse the HttpBlobStore for this?
The main motivation being that we would like to have as few dependencies in Bazel as possible, because these .jars will all end up in the Bazel binary and thus increase its size significantly.
Maybe we can get away with just the aws-core sdk for authentication? What do you think?
 *
 * <p>This blobstore always uses TLS.
 */
public class S3BlobStore implements SimpleBlobStore {
I was hoping that we could reuse the HttpBlobStore, by using the AWS S3 HTTP API
https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectGET.html
https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html
Do you think this would be possible?
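For illustration, a minimal sketch of what treating S3 as just another HTTP blob store could look like, in the spirit of what HttpBlobStore already does for generic HTTP caches. The bucket URL and cache key are hypothetical, and a real request would still need AWS auth headers (or a bucket policy allowing the access) to succeed:

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hedged sketch: GET/PUT of cache blobs against the S3 REST API using
// nothing but plain HTTP. The bucket name is hypothetical, and auth is
// deliberately omitted here -- that is exactly the open question.
public final class S3RestSketch {
  private static final String BUCKET_URL = "https://my-bazel-cache.s3.amazonaws.com/";

  static byte[] get(String key) throws Exception {
    HttpURLConnection conn =
        (HttpURLConnection) new URL(BUCKET_URL + key).openConnection();
    conn.setRequestMethod("GET");
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    byte[] chunk = new byte[8192];
    try (InputStream in = conn.getInputStream()) {
      for (int n; (n = in.read(chunk)) != -1; ) {
        buf.write(chunk, 0, n);
      }
    }
    return buf.toByteArray();
  }

  static void put(String key, byte[] blob) throws Exception {
    HttpURLConnection conn =
        (HttpURLConnection) new URL(BUCKET_URL + key).openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    conn.setFixedLengthStreamingMode(blob.length);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(blob);
    }
    int code = conn.getResponseCode();
    if (code / 100 != 2) {
      throw new IllegalStateException("S3 PUT failed with HTTP " + code);
    }
  }
}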
I think it might be possible to (re)use it. I will make an attempt to reuse the HTTP blob store, but it might be worth considering what the fallback looks like.
Thanks @GregBowyer. Do we know exactly how AWS authentication works? If not, I think we might need to find out and implement it ourselves. Fun. I think Bazel won't be able to ship 6MiB of libraries just so that it can authenticate to AWS.
Uggh, I can look into that, but I suspect that's not fun.
@GregBowyer Agree 😄. Let's first get something that works and then see how big the additional libraries required are.
I think I'd avoid implementing it myself if possible. There's some code out there that provides a basic implementation: https://github.com/martijnvogten/java-aws-auth-v4/blob/master/src/nl/martijnvogten/aws/AwsAuthVersion4.java But that doesn't appear to do RBAC, which requires a bunch of code to get session keys from the AWS meta-data service (and regularly renewing them without getting identified as a DoS attack on the meta-data service)... and even the guy who wrote it says, "don't use this if at all possible".
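For context, a hedged sketch of the meta-data-service dance mentioned above, using the EC2 instance metadata endpoints as AWS documents them (the IMDSv1 layout). A real implementation would parse the returned JSON and refresh the credentials shortly before the advertised Expiration rather than on every request:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Hedged sketch: fetch temporary role credentials from the EC2 instance
// metadata service. Only works on an EC2 instance with an attached role.
public final class InstanceMetadataCredentials {
  private static final String BASE =
      "http://169.254.169.254/latest/meta-data/iam/security-credentials/";

  static String fetch(String url) throws Exception {
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    conn.setConnectTimeout(1000); // link-local service; fail fast off-EC2
    StringBuilder body = new StringBuilder();
    try (BufferedReader reader =
        new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
      String line;
      while ((line = reader.readLine()) != null) {
        body.append(line);
      }
    }
    return body.toString();
  }

  public static void main(String[] args) throws Exception {
    String role = fetch(BASE).trim();      // first call returns the role name
    String credsJson = fetch(BASE + role); // second call returns JSON with
    System.out.println(credsJson);         // AccessKeyId, SecretAccessKey,
  }                                        // Token, and Expiration
}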
Out of curiosity, what JVM version does Bazel target? Is it 1.7 or 1.8?
@GregBowyer we use 1.8, possibly upgrading to 1.9 soon.
While the current Bazel binary is significantly larger, I'd be rather unhappy about adding a 6 MB dependency at this point. I have worked on reducing the Bazel binary size, and I think we can get it to ~10 MB, so a 6 MB additional dependency would be a 60% increase. Is there a way to pick just the jars that are necessary for auth and leave out the other stuff that isn't?
We can, but are we ok maintaining that code?
@GregBowyer it depends on how complex the solution is, but generally yes. I can't believe nobody else has run into and solved this problem in open source?
So "it's pretty complex" is the reason for the pause. Legally, if we forked the auth code and adapted it, is there any issue? I don't think so, since the AWS SDK is Apache-2 just like Bazel, but I don't want to put the Bazel project in a tight spot here. Clean-room implementing the auth methods would be a more substantial effort.
OK, I just pushed a WIP that replaces S3BlobStore and attempts to use AWS header creation directly. I will try to extract out the salient auth methods, either using a different jar such as the one @cbsmith suggested or pulling the code from the AWS SDK. I am going to kick the tires a few times before I write unit tests for the expanded
@GregBowyer awesome! thanks a lot!
I assume so (if we put the modified SDK into its own directory underneath third_party and add a README.txt file that indicates where it originally comes from), but I have asked our internal team who actually knows this stuff for advice. Will report back when I hear something.
@philwo I don't see why this wouldn't be ok. We have forked tons of code in our third_party folder right now. Our whole third_party folder is effectively forks.
So commit 021bdd2 fails for a range of AWS regions: newer regions don't support the v2 auth method and only offer a v4 method. The good news is that Amazon provides test vectors for the newer auth method, and it works everywhere. I reworked the auth handling to use the v4 auth method (the key-derivation core is sketched below). I will kick the tires a bit more before I work out what it takes to actually obtain the credentials.
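For reference, the v4 signing-key derivation that those test vectors cover is a short, fixed chain of HMAC-SHA256 operations over the date, region, and service. This sketch shows only that derivation, not the canonical-request hashing and "string to sign" that the full scheme also requires; the example arguments are hypothetical:

import java.nio.charset.StandardCharsets;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hedged sketch of the AWS Signature Version 4 signing-key derivation,
// as documented by AWS: kDate -> kRegion -> kService -> kSigning.
public final class SigV4KeySketch {
  static byte[] hmac(byte[] key, String data) throws Exception {
    Mac mac = Mac.getInstance("HmacSHA256");
    mac.init(new SecretKeySpec(key, "HmacSHA256"));
    return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
  }

  // e.g. signingKey(secret, "20180330", "us-east-1", "s3")
  static byte[] signingKey(String secretKey, String date, String region, String service)
      throws Exception {
    byte[] kDate = hmac(("AWS4" + secretKey).getBytes(StandardCharsets.UTF_8), date);
    byte[] kRegion = hmac(kDate, region);
    byte[] kService = hmac(kRegion, service);
    return hmac(kService, "aws4_request");
  }
}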
OK, the latest commit removes anything that isn't the core of the AWS auth chain code. This removes the following deps:
We bring in:
I have run a basic bazel build test with the cache set to an S3 bucket, and it caches the artifacts as expected (a sample invocation is sketched below). I will need to test EC2 delegated permissions before we can call it done. Left TODO:
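A hedged example of the kind of invocation used for that basic test, assuming the HTTP cache flag as it was spelled around the time of this PR (--remote_http_cache, later folded into --remote_cache) and a hypothetical bucket name:

bazel build //foo:bar \
    --remote_http_cache=https://my-bazel-cache.s3.amazonaws.com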
@GregBowyer sounds awesome. I'll be on vacation until 9th April; I'll review it then :-). Big thanks for your efforts!
I also need to find out why it's a lot less reliable and far slower using HttpBlobStore than when using the SDK.
So investigating this uncovered a performance regression, detailed in #4944. I am going to assume the workaround applied in that issue and carry on with this work to flesh out the S3 code some more.
Hi @GregBowyer, checking in again on where you are at with this PR. It's been a couple of months, and we are dusting off this PR as we upgrade to Bazel 0.28. Looks like AuthHeadersProvider still needs to be split out to unblock this PR. We sent you a PR GregBowyer#8 to update this branch to HEAD and are happy to help split out the AuthHeadersProvider. Please let us know if you have time to tackle this in the next couple of weeks. If not, we're happy to push this over the finish line. It would be great to finally get this merged.

We understand that things get busy, so we're willing to push this forward. If you grant us write access to this branch, we're happy to keep the bulk of the changes here and simply open a separate PR for the AuthHeadersProvider. Otherwise, we can open two new PRs (one with AuthHeadersProvider only and one with the rest of the changes) if you have no objections. We'll go ahead with this approach over the next couple of weeks if we don't hear back before then.

We might be moving away from HTTP caching to gRPC-based caching (remote_http_cache vs remote_cache) in the near future, but we'd still love to see these changes merged.
Removes the direct dependency of the remote and buildeventservice packages on the authentication libraries by using the new AuthHeadersProvider. The authandtls package is split into the "authentication/google" and "grpc" packages. This is a step towards merging bazelbuild#6810. This change is a no-op.
@buchgr We have tried reaching out to @GregBowyer via GitHub and email, but we have not heard back from him. We believe this PR still brings value to the community; what are our options here? As we suggested in the last comment, we have two draft PRs ready, #9138 and #9137. The former has commits co-authored with Greg, so we get a Google CLA notice stating we need Greg's consent to move forward. We're not sure how to move forward without Greg, though. What can we do if we don't hear back?
Hi there, I won't have time to help out with reviews over the next few weeks, as there are some higher-priority items for Bazel 1.0 that I will need to dedicate my time to. We have been thinking of an alternative way to add AWS and other authentication libraries to Bazel that doesn't require thousands of lines of code changes. The basic idea would be for Bazel to launch a local proxy process that can add authentication headers. We would provide a proxy implementation that ships with commonly used authentication methods, but users would also be free to add their own. How does this sound? Best,
Sorry, I have been super busy; I will take a look at your PRs. @buchgr isn't that just piling on more complexity for the sake of avoiding the complexity we already have?
Authentication libraries living outside of Bazel have major advantages:
I understand that there are trade-offs. I am happy to hear and discuss pros and cons.
@buchgr Would the GCP authentication mechanisms also move to this proxy? If so, great. If not, then I worry the proxy would rot compared to Bazel.
@buchgr So I did once try a sidecar proxy process, and it was a horrible experience: bazel or the proxy crashes and the build is broken. Is there a middle ground, maybe a nascent plugin architecture where the same can be achieved by downloading a jar and using it through well-specced interfaces?
@jjudd yes of course.
... or is this proxy process managed by the bazel server, communicating through gRPC or something akin to the persistent worker API? If that is the case, I would be up for reworking the AWS auth to work along those lines.
Awesome. Glad to hear it! If we can figure out a way to make this work seamlessly for the end user, then using the actual 3rd-party dependencies sounds far easier to maintain. Upgrading custom 3rd-party lib stuff with a custom-written version of the AWS code is going to be really ugly in the future.
Yes. The idea was for the Bazel server to own the proxy process. Essentially, I was thinking for the proxy to implement the http and grpc API as well as a health check call (a sketch follows below). I was hoping to reuse https://github.com/buchgr/bazel-remote or something like it. Thinking further, this proxy could also replace the disk cache. A design document would be a good start. Many details will need to be figured out :).
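As a rough sketch of the health-check half of that idea, assuming the proxy exposed the standard grpc.health.v1 service (available in gRPC's grpc-services artifact); the port is hypothetical:

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.health.v1.HealthCheckRequest;
import io.grpc.health.v1.HealthCheckResponse;
import io.grpc.health.v1.HealthGrpc;

// Hedged sketch: the health probe the Bazel server could run against a
// locally spawned cache/auth proxy before trusting it with requests.
public final class ProxyHealthCheck {
  public static void main(String[] args) {
    ManagedChannel channel =
        ManagedChannelBuilder.forAddress("localhost", 9092).usePlaintext().build();
    HealthCheckResponse response =
        HealthGrpc.newBlockingStub(channel)
            .check(HealthCheckRequest.newBuilder().setService("").build());
    System.out.println("proxy status: " + response.getStatus()); // SERVING / NOT_SERVING
    channel.shutdownNow();
  }
}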
As a user, I find it very confusing to see built-in support for GCP buckets alongside such a large ongoing debate and hesitancy about including support for an even more widely used and prolific object storage service like S3. I've been watching and waiting on this feature for over a year, across at least one other now-abandoned PR that stalled due to feedback cycles and lack of movement. I'm not sure how this is a different story from Bazel's GCP authentication implementation. Why is S3 (covering a broader set of potential Bazel customers) a special case? The proxy idea sounds interesting, but could it be made a separate effort? I believe the PR author has followed all of the feedback from reviewers at this point. What are the chances of merging this, with a separate project for extracting authentication into a managed proxy process?
@mavenraven yes, we forked it and maintained it ourselves. We have a branch that supported this PR until 0.28, but we switched to gRPC caching starting from 0.29. We moved away from it to avoid constant rebasing.
@borkaehw Ah yeah, that's what I figured. Thanks for the tip on gRPC caching, I'll look into that!
I can see where you are coming from, but what you are trying to imply is simply not accurate. I still think it would be awesome if Bazel supported authentication to AWS and many other providers. However, it seems to be the case that for AWS authentication we either have to ship 6 MiB of libraries or maintain our own fork of them; that's a big hurdle. The gcloud library is ~100 KiB, and the whole Bazel binary is around ~40 MiB. I originally added GCS caching because it was so simple to do. Additionally, over the course of this year, with the advent of "builds without the bytes" (BWTB), I believe that both GCS and AWS S3 caching will become less relevant, as BWTB does not work with either. Lastly, I am not trying to make excuses for the mistakes that I have made with reviewing this PR, but what you are suggesting is simply not true.
For what it's worth, our (Square's Android dev-ex team) experiments showed S3 to be a worse option than an AWS server running the open-source implementations of the caching protocol, whether using the forked support or the Asana bazels3cache proxy (which even handles slow writes). I think S3 isn't optimized the way most people would need for a really efficient build cache; other Amazon services can be. Heck, we didn't even go with the remote cache for our CI, as it turned out an NFS drive in AWS, highly available to the build machines and used via --disk_cache, was faster than all of them (though we miss out on managed clean-up, which we do out of process). We're currently playing with a few options from a few vendors for developer-accessible build caches.

The point is: while I agree that having support for major storage buckets out of the box is a super sweet nice-to-have feature, the protocol itself is quite well supported, and "having it for completeness" doesn't outweigh other considerations. There's nothing stopping anyone from using Amazon or Microsoft cloud services instead of Google cloud, and even the baked-in support for GCS doesn't make it some sort of monopolistic play. There are good and bad reasons to prefer each vendor and among their various options.
@mavenraven No, that's not fair and not quite how it shakes out. @buchgr is pretty spot on here: it's a pig to maintain either way, and really the subtleties of S3 vs GCS are such that even simple support for both is pretty invasive in the code. I have personally found running the cache server instances to be a pain that the S3 version solves; however, that pain is really related to the management of that server pool (more stuff to manage, more cost, slower, etc.).

I think bonus points really if we can get to a place where it's a

As @buchgr alluded to, having a version that manages a "sidecar" that provides caching is IMHO a clearer and better solution; it lets people get crazy on what they want caching to be and saves us from a world of managing ever larger build farms.

@cgruber that surprises me, although I suspect it's a language thing: I easily get an 80-90% speedup on S3 and a clear save over NFS or other solutions. With that said, I am building twisty C++ where I have translation units that can easily take multiple minutes to compile, and my compile farm is inside AWS (with the S3 bucket and builders configured to be in the same region). It might be a case of YMMV?
Removed @aiuto as reviewer. I don't have enough knowledge of this code to comment.
This is dead in the water now, to be redone another day |
Are there any other attempts to build this for the modern Bazel? |
My hope is that it's now possible to implement this by using |
This is an initial attempt to support #4844
It directly supports S3 as a simple remote cache. After spending some time looking at how AWS credentials are done, it looks like it's easier to use the SDK instead of hand-rolling the auth directly.
Since the SDK provides the backbone for auth support, it makes sense to include the S3 parts of the SDK as well.
In this PR we use the S3 SDK directly to provide S3 remote cache support. This support should be on par with the Google Cloud Storage support, and has similar options in its usage.
Thoughts?
TODO
* Work out what causes the get performance regression (#4944)
* Unit tests for AwsHttpAuthAdapter, as its constructor is messy for testing
* Swap the joda-time types for the javax.time ones