-
-
Notifications
You must be signed in to change notification settings - Fork 644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add support for mTLS in remote cache usage #19887
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this. I left some comments.
Also, maybe also add a test in https://github.com/pantsbuild/pants/blob/eecd07908868af3e3e0d97a91852de2785b7b16c/src/rust/engine/grpc_util/src/channel.rs to ensure mTLS mode works?
src/rust/engine/process_execution/remote/src/remote_cache/reapi.rs
Outdated
Show resolved
Hide resolved
@tdyas Thanks for the comments! Updated from feedback and addressed some points in replies. Fixed some ergonomics to accept Re: tests -- as far as I can tell, those tests don't use anything at all from the code I've changed. I went ahead and added a test locally, but it doesn't pass. I think this happens because I can't configure "mtls but don't verify the certs" without re-implementing a custom rustls verifier, which seems... excessive. You added the certs; if you recall how they were generated I can create new ones and see if that solves the issue. :-) |
I don't recall exactly. I may have copied the certs from My request here is to ensure we have an integration test proving mTLS works since we have no such test and I would like to prevent future regressions here. I did some searching tonight and found the |
Will have a look, but I'll probably a test in |
My thought is just for the integration test to be a "sanity check." So the verifier does not need to actually verify the certificates, just check that certificates were passed. My concern is we refactor the TLS setup logic in the future and somehow not even pass them to |
fyi I added the label to cherry-pick this automatically to the |
Thank you! Will try to finish up tomorrow evening. |
@tdyas PTAL again! I added three tests: Does the cert reach the client, and does Also reverted the CI seems borked; it fails on this:
but my workspace is clean and that line doesn't exist on my disk. Trying a rebase... |
3448e00
to
0e585b2
Compare
@@ -91,6 +91,9 @@ remote_execution_address = "grpcs://build.example.com:443" | |||
remote_instance_name = "main" | |||
# This is optional, Pants will auto-discover certificates otherwise. | |||
remote_ca_certs_path = "/etc/ssl/certs/ca-certificates.crt" | |||
# this allows you to setup mTLS with a client certificate and key. | |||
remote_mtls_certs_path = "/etc/ssl/certs/client-cert.pem" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming questions:
- Can there be multiple certificates in the file at that path or just one? I.e., should the option be
remote_mtls_cert_path
instead ofremote_mtls_certs_path
? - These options are really only about enabling the TLS client certificate and key. mTLS actually implies the superset of doing mutual client cert verification and server cert verification. Should we rename these options to reflect that they only pertain to the client certificate? Bazel calls its equivalent options
--tls_client_certificate
and--tls_client_key
. Maybe we s should similarly call the Pants optionsremote_tls_client_cert_path
andremote_tls_client_key
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Yes, I think it's always a chain formally.
-
Makes sense on naming, I had that during my initial setup but since the internal config calls it mtls and Pants is never a server I went for that name. But I'm happy to rename.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably worth asking for some opinions in Slack. I'll tag this comment thread there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also given the mTLS functionality was never visible to users, my view is we are free to name it as if this were a new feature. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think client
in the name makes sense, presumably in an mTLS scenario the server cert still comes via the CA chain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed all the public places that I introduced, and most validation messages. Kept the lower level structs as mtls_data
to harmonize with MtlsConfig
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kept mTLS in the docs as a search needle, but also added client authentication
so both are searchable.
Just tested everything on our prod repo after the rebase, so should be G2G.
|
@@ -369,7 +369,7 @@ async fn execute(top_match: &clap::ArgMatches) -> Result<(), ExitError> { | |||
"Failed to read mtls-client-certificate-chain-path from {cert_chain_path:?}: {err:?}" | |||
) | |||
})?; | |||
Some(MtlsConfig { key, cert_chain }) | |||
Some((cert_chain, key)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A pair of items of the same type, where order matters, makes me uneasy. Especially since we're not consistent with the order (see below). Could this be a struct with named fields, say cert_chain_pem
and key_pem
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. We should put this back as a struct
with two named fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's inconsistent feedback; I had them as two fields and was asked to fold them into one option elsewhere, now it's the other way around. I also did explicitly construct an MtlsConfig from the two pieces, and passed it to TlsConfig, but was also asked to change that. I think "config objects" are quite weird when this local, and what should be done is to construct the TlsConfig immediately after loading the data and not in many places.
I believe this is best tackled as part of point 3 here: #19902.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that other feedback did say "(or a struct if having names works better)" :)
Agreed that constructing the TlsConfig earlier makes sense, and happy to punt it to #19902
src/rust/engine/grpc_util/src/tls.rs
Outdated
} | ||
} | ||
|
||
let key = key.ok_or_else(|| "No private key found in mTLS key file".to_owned())?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably want to replace mTLS
with the name of the option for the key file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This place does not know that because there's many main.rs
that reach this point. With proper setup of error chaining (formatting, expect, etc) it should be handled by higher levels.
@@ -1602,6 +1608,35 @@ class BootstrapOptions: | |||
""" | |||
), | |||
) | |||
remote_client_certs_path = StrOption( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for not including "tls" in the name similar to how Bazel names it? I.e., remote_tls_client_certs_path
and remote_tls_client_key_path
? I guess the choice was made with remote_ca_certs_path
not including tls
in the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah; I think all fields should have tls
or none, yes.
Co-authored-by: Benjy Weinberger <[email protected]>
Co-authored-by: Benjy Weinberger <[email protected]>
@benjyw @tdyas Responded to feedback again, PTAL. I'm happy to keep iterating but I think there's some overlap between what I'm doing and what @huonw is looking at with simplifying the remote. If we want to simplify TLS setup I do think the best approach is to construct it immediately after reading files, and pass That seems better to tackle holistically, and not apply band-aids. I'm happy to tackle "construct TLSConfig once" in a separate PR if you want, but that mostly depends on what @huonw wants to achieve. |
I'm fine with the internal redesign proceeding in other PRs. I'm just concerned about the switch from @huonw: What are your thoughts? |
Yeah, more than happy to have whatever internal redesign happen after landing, to get the user-facing aspects in earlier. Feel free to adjust #19902 to include extra things too. |
I tried to automatically cherry-pick this change back to each relevant milestone, so that it is available in those older releases of Pants. ❌ 2.18.xI was unable to cherry-pick this PR to 2.18.x, likely due to merge-conflicts. Steps to Cherry-Pick locallyTo resolve:
Please note that I cannot re-run CI if a job fails. Please work with your PR approver(s) to re-run CI if necessary. When you're done manually cherry-picking, please remove the Thanks again for your contributions! |
@tgolsson The auto-cherry-pick to 2.18.x failed. Can you do that manually? The steps are listed above. |
Ack, will look tonight! |
This PR adds support for mTLS when using remote caching. The API is fairly similar to what we've already have, but I've taken the liberty of doing some type cleanups. I also did some refactoring of the tls setup code to move some earliers to parse-time rather than conversion time, and simplify a bit. I also found a bug with the `alpn` config, which isn't actually valid -- hyper just barfs when that is defined. Not sure how no-one else is hitting that, but it is what it is. Maybe a version/release issue. Comes from here: https://github.com/rustls/hyper-rustls/blob/163b3f539a497ae9c4fa65f55a8133234ef33eb3/src/connector/builder.rs#L46-L51.
…19960) This PR adds support for mTLS when using remote caching. The API is fairly similar to what we've already have, but I've taken the liberty of doing some type cleanups. I also did some refactoring of the tls setup code to move some earliers to parse-time rather than conversion time, and simplify a bit. I also found a bug with the `alpn` config, which isn't actually valid -- hyper just barfs when that is defined. Not sure how no-one else is hitting that, but it is what it is. Maybe a version/release issue. Comes from here: https://github.com/rustls/hyper-rustls/blob/163b3f539a497ae9c4fa65f55a8133234ef33eb3/src/connector/builder.rs#L46-L51.
This PR adds support for mTLS when using remote caching. The API is fairly similar to what we've already have, but I've taken the liberty of doing some type cleanups. I also did some refactoring of the tls setup code to move some earliers to parse-time rather than conversion time, and simplify a bit.
I also found a bug with the
alpn
config, which isn't actually valid -- hyper just barfs when that is defined. Not sure how no-one else is hitting that, but it is what it is. Maybe a version/release issue. Comes from here: https://github.com/rustls/hyper-rustls/blob/163b3f539a497ae9c4fa65f55a8133234ef33eb3/src/connector/builder.rs#L46-L51.I unfortunately don't have any non-mTLS cache to test against, so would love if someone else can test that I haven't broken something there.
(I need to figure out how to easily test-drive this in CI etc; right now I can only validate this works to read/write locally. Otherwise I guess I can test it for real in 2.19 :( )