Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for mTLS in remote cache usage #19887

Merged
merged 12 commits into from
Oct 2, 2023

Conversation

tgolsson
Copy link
Contributor

@tgolsson tgolsson commented Sep 20, 2023

This PR adds support for mTLS when using remote caching. The API is fairly similar to what we've already have, but I've taken the liberty of doing some type cleanups. I also did some refactoring of the tls setup code to move some earliers to parse-time rather than conversion time, and simplify a bit.

I also found a bug with the alpn config, which isn't actually valid -- hyper just barfs when that is defined. Not sure how no-one else is hitting that, but it is what it is. Maybe a version/release issue. Comes from here: https://github.com/rustls/hyper-rustls/blob/163b3f539a497ae9c4fa65f55a8133234ef33eb3/src/connector/builder.rs#L46-L51.

I unfortunately don't have any non-mTLS cache to test against, so would love if someone else can test that I haven't broken something there.

(I need to figure out how to easily test-drive this in CI etc; right now I can only validate this works to read/write locally. Otherwise I guess I can test it for real in 2.19 :( )

Copy link
Contributor

@tdyas tdyas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this. I left some comments.

Also, maybe also add a test in https://github.com/pantsbuild/pants/blob/eecd07908868af3e3e0d97a91852de2785b7b16c/src/rust/engine/grpc_util/src/channel.rs to ensure mTLS mode works?

src/rust/engine/grpc_util/src/tls.rs Outdated Show resolved Hide resolved
src/rust/engine/grpc_util/src/tls.rs Outdated Show resolved Hide resolved
src/rust/engine/process_execution/remote/src/remote.rs Outdated Show resolved Hide resolved
src/rust/engine/process_executor/src/main.rs Outdated Show resolved Hide resolved
src/rust/engine/process_executor/src/main.rs Outdated Show resolved Hide resolved
src/rust/engine/src/context.rs Outdated Show resolved Hide resolved
src/rust/engine/src/context.rs Outdated Show resolved Hide resolved
src/rust/engine/src/context.rs Outdated Show resolved Hide resolved
@tgolsson
Copy link
Contributor Author

@tdyas Thanks for the comments! Updated from feedback and addressed some points in replies. Fixed some ergonomics to accept AsRef for the data as well in the configuration constructor, so it works as well with both owned and unowned data.

Re: tests -- as far as I can tell, those tests don't use anything at all from the code I've changed. I went ahead and added a test locally, but it doesn't pass. I think this happens because I can't configure "mtls but don't verify the certs" without re-implementing a custom rustls verifier, which seems... excessive. You added the certs; if you recall how they were generated I can create new ones and see if that solves the issue. :-)

@tdyas
Copy link
Contributor

tdyas commented Sep 21, 2023

Re: tests -- as far as I can tell, those tests don't use anything at all from the code I've changed. I went ahead and added a test locally, but it doesn't pass. I think this happens because I can't configure "mtls but don't verify the certs" without re-implementing a custom rustls verifier, which seems... excessive. You added the certs; if you recall how they were generated I can create new ones and see if that solves the issue. :-)

I don't recall exactly. I may have copied the certs from tonic or rustls but they are actually technically expired and so invalid so the TLS integration test actually rejects them which is why I configured TLS in the test to not verify the certificates.

My request here is to ensure we have an integration test proving mTLS works since we have no such test and I would like to prevent future regressions here.

I did some searching tonight and found the rcgen crate which appears to have the ability to easily generate self-signed certificates entirely in Rust code. I found it in this rustls issue: rustls/rustls#400

@tgolsson
Copy link
Contributor Author

tgolsson commented Sep 21, 2023

Will have a look, but I'll probably a test in tls.rs purely. Validating that we set up the right config options. I think beyond that it's fairly out of our control, as we just pass the config to Hyper's connector builder.

@tdyas
Copy link
Contributor

tdyas commented Sep 21, 2023

Will have a look, but I'll probably a test in tls.rs purely. Validating that we set up the right config options. I think beyond that it's fairly out of our control, as we just pass the config to Hyper's connector builder.

My thought is just for the integration test to be a "sanity check." So the verifier does not need to actually verify the certificates, just check that certificates were passed. My concern is we refactor the TLS setup logic in the future and somehow not even pass them to rustls.

@tdyas tdyas added this to the 2.18.x milestone Sep 21, 2023
@tdyas
Copy link
Contributor

tdyas commented Sep 21, 2023

fyi I added the label to cherry-pick this automatically to the 2.18.x branch.

@tgolsson
Copy link
Contributor Author

Thank you! Will try to finish up tomorrow evening.

@tgolsson
Copy link
Contributor Author

tgolsson commented Sep 22, 2023

@tdyas PTAL again! I added three tests: Does the cert reach the client, and does mTLS on/off generate an expected rustls::ClientConfig. I found no good way to test for alpn without breaking down the channel/client construction into ridiculously small steps. But we do both enable_http2 and http2_only(true) so I think we're safe there.

Also reverted the grpcs check as discussed.

CI seems borked; it fails on this:

error[E0599]: no function or associated item named `new_without_mtls` found for struct `Config` in the current scope
  --> process_execution/remote/src/remote_cache.rs:89:30
   |
89 |     tls_config: tls::Config::new_without_mtls(options.root_ca_certs.clone()),
   |                              ^^^^^^^^^^^^^^^^ function or associated item not found in `Config`

For more information about this error, try `rustc --explain E0599`.

but my workspace is clean and that line doesn't exist on my disk. Trying a rebase...

@tgolsson tgolsson force-pushed the ts/remote-cache-mtls branch from 3448e00 to 0e585b2 Compare September 22, 2023 16:00
@@ -91,6 +91,9 @@ remote_execution_address = "grpcs://build.example.com:443"
remote_instance_name = "main"
# This is optional, Pants will auto-discover certificates otherwise.
remote_ca_certs_path = "/etc/ssl/certs/ca-certificates.crt"
# this allows you to setup mTLS with a client certificate and key.
remote_mtls_certs_path = "/etc/ssl/certs/client-cert.pem"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming questions:

  1. Can there be multiple certificates in the file at that path or just one? I.e., should the option be remote_mtls_cert_path instead of remote_mtls_certs_path?
  2. These options are really only about enabling the TLS client certificate and key. mTLS actually implies the superset of doing mutual client cert verification and server cert verification. Should we rename these options to reflect that they only pertain to the client certificate? Bazel calls its equivalent options --tls_client_certificate and --tls_client_key. Maybe we s should similarly call the Pants options remote_tls_client_cert_path and remote_tls_client_key?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @stuhood and @benjyw on input on the option naming

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yes, I think it's always a chain formally.

  2. Makes sense on naming, I had that during my initial setup but since the internal config calls it mtls and Pants is never a server I went for that name. But I'm happy to rename.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth asking for some opinions in Slack. I'll tag this comment thread there.

Copy link
Contributor

@tdyas tdyas Sep 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also given the mTLS functionality was never visible to users, my view is we are free to name it as if this were a new feature. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think client in the name makes sense, presumably in an mTLS scenario the server cert still comes via the CA chain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed all the public places that I introduced, and most validation messages. Kept the lower level structs as mtls_data to harmonize with MtlsConfig.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kept mTLS in the docs as a search needle, but also added client authentication so both are searchable.

@tgolsson tgolsson requested review from benjyw, tdyas and huonw September 24, 2023 10:30
@tgolsson
Copy link
Contributor Author

Just tested everything on our prod repo after the rebase, so should be G2G.

  remote_cache_requests: 9
  remote_cache_requests_cached: 9

src/python/pants/option/global_options.py Outdated Show resolved Hide resolved
src/python/pants/option/global_options.py Outdated Show resolved Hide resolved
src/rust/engine/fs/brfs/src/main.rs Show resolved Hide resolved
src/rust/engine/grpc_util/src/tls.rs Show resolved Hide resolved
src/rust/engine/grpc_util/src/tls.rs Outdated Show resolved Hide resolved
@@ -369,7 +369,7 @@ async fn execute(top_match: &clap::ArgMatches) -> Result<(), ExitError> {
"Failed to read mtls-client-certificate-chain-path from {cert_chain_path:?}: {err:?}"
)
})?;
Some(MtlsConfig { key, cert_chain })
Some((cert_chain, key))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A pair of items of the same type, where order matters, makes me uneasy. Especially since we're not consistent with the order (see below). Could this be a struct with named fields, say cert_chain_pem and key_pem?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. We should put this back as a struct with two named fields.

Copy link
Contributor Author

@tgolsson tgolsson Sep 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's inconsistent feedback; I had them as two fields and was asked to fold them into one option elsewhere, now it's the other way around. I also did explicitly construct an MtlsConfig from the two pieces, and passed it to TlsConfig, but was also asked to change that. I think "config objects" are quite weird when this local, and what should be done is to construct the TlsConfig immediately after loading the data and not in many places.

I believe this is best tackled as part of point 3 here: #19902.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that other feedback did say "(or a struct if having names works better)" :)

Agreed that constructing the TlsConfig earlier makes sense, and happy to punt it to #19902

}
}

let key = key.ok_or_else(|| "No private key found in mTLS key file".to_owned())?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably want to replace mTLS with the name of the option for the key file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This place does not know that because there's many main.rs that reach this point. With proper setup of error chaining (formatting, expect, etc) it should be handled by higher levels.

@@ -1602,6 +1608,35 @@ class BootstrapOptions:
"""
),
)
remote_client_certs_path = StrOption(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for not including "tls" in the name similar to how Bazel names it? I.e., remote_tls_client_certs_path and remote_tls_client_key_path? I guess the choice was made with remote_ca_certs_path not including tls in the name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah; I think all fields should have tls or none, yes.

@tgolsson
Copy link
Contributor Author

@benjyw @tdyas Responded to feedback again, PTAL. I'm happy to keep iterating but I think there's some overlap between what I'm doing and what @huonw is looking at with simplifying the remote. If we want to simplify TLS setup I do think the best approach is to construct it immediately after reading files, and pass Option<TlsConfig> to any interior APIs -- I'm not fundamentally doing anything other than what was already being done with root_ca_certs, and I think any arguments for or against my approach apply there as well.

That seems better to tackle holistically, and not apply band-aids. I'm happy to tackle "construct TLSConfig once" in a separate PR if you want, but that mostly depends on what @huonw wants to achieve.

@tdyas
Copy link
Contributor

tdyas commented Sep 25, 2023

That seems better to tackle holistically, and not apply band-aids. I'm happy to tackle "construct TLSConfig once" in a separate PR if you want, but that mostly depends on what @huonw wants to achieve.

I'm fine with the internal redesign proceeding in other PRs. I'm just concerned about the switch from struct MtlsConfig to a 2-tuple but if that will be refactored away than I'm less concerned.

@huonw: What are your thoughts?

@huonw
Copy link
Contributor

huonw commented Sep 25, 2023

Yeah, more than happy to have whatever internal redesign happen after landing, to get the user-facing aspects in earlier. Feel free to adjust #19902 to include extra things too.

@benjyw benjyw merged commit 7412749 into pantsbuild:main Oct 2, 2023
@WorkerPants
Copy link
Member

I tried to automatically cherry-pick this change back to each relevant milestone, so that it is available in those older releases of Pants.

❌ 2.18.x

I was unable to cherry-pick this PR to 2.18.x, likely due to merge-conflicts.

Steps to Cherry-Pick locally

To resolve:

  1. (Ensure your git working directory is clean)
  2. Run the following script to reproduce the merge-conflicts:
    git fetch https://github.com/pantsbuild/pants main \
      && git fetch https://github.com/pantsbuild/pants 2.18.x \
      && git checkout -b cherry-pick-19887-to-2.18.x FETCH_HEAD \
      && git cherry-pick 74127491f69e698bf9243c6af0a02c8ef42d9080
  3. Fix the merge conflicts and commit the changes
  4. Run build-support/cherry_pick/make_pr.sh "19887" "2.18.x"

Please note that I cannot re-run CI if a job fails. Please work with your PR approver(s) to re-run CI if necessary.


When you're done manually cherry-picking, please remove the needs-cherrypick label on this PR.

Thanks again for your contributions!

🤖 Beep Boop here's my run link

@WorkerPants WorkerPants added the auto-cherry-picking-failed Auto Cherry-Picking Failed label Oct 2, 2023
@benjyw
Copy link
Contributor

benjyw commented Oct 2, 2023

@tgolsson The auto-cherry-pick to 2.18.x failed. Can you do that manually? The steps are listed above.

@tgolsson
Copy link
Contributor Author

tgolsson commented Oct 2, 2023

Ack, will look tonight!

tgolsson added a commit to tgolsson/pants that referenced this pull request Oct 2, 2023
This PR adds support for mTLS when using remote caching. The API is
fairly similar to what we've already have, but I've taken the liberty of
doing some type cleanups. I also did some refactoring of the tls setup
code to move some earliers to parse-time rather than conversion time,
and simplify a bit.

I also found a bug with the `alpn` config, which isn't actually valid --
hyper just barfs when that is defined. Not sure how no-one else is
hitting that, but it is what it is. Maybe a version/release issue. Comes
from here:
https://github.com/rustls/hyper-rustls/blob/163b3f539a497ae9c4fa65f55a8133234ef33eb3/src/connector/builder.rs#L46-L51.
@tgolsson tgolsson added auto-cherry-picking-failed Auto Cherry-Picking Failed and removed auto-cherry-picking-failed Auto Cherry-Picking Failed needs-cherrypick labels Oct 2, 2023
benjyw pushed a commit that referenced this pull request Oct 2, 2023
…19960)

This PR adds support for mTLS when using remote caching. The API is
fairly similar to what we've already have, but I've taken the liberty of
doing some type cleanups. I also did some refactoring of the tls setup
code to move some earliers to parse-time rather than conversion time,
and simplify a bit.

I also found a bug with the `alpn` config, which isn't actually valid --
hyper just barfs when that is defined. Not sure how no-one else is
hitting that, but it is what it is. Maybe a version/release issue. Comes
from here:
https://github.com/rustls/hyper-rustls/blob/163b3f539a497ae9c4fa65f55a8133234ef33eb3/src/connector/builder.rs#L46-L51.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants