Mutual TLS with customer provided certificates #34

Closed
jamsajones opened this issue Nov 28, 2018 · 31 comments

@jamsajones

jamsajones commented Nov 28, 2018

coultn transferred this issue from aws/aws-app-mesh-examples on Mar 28, 2019
@siamaksym

+1

@bcelenza
Contributor

bcelenza commented Jul 9, 2019

Potentially implemented by #68

@isaac-mj

Hi guys,

I noticed that you've moved this issue from "Researching" to "We're Working On It". Do you have a delivery date now? Even a rough indication of the quarter would be awesome!

Thx,
Isaac.

@shubharao

We are targeting delivery by EOY 2019, but this is subject to change, and we'll keep this issue updated as we know more.

shubharao added the "Roadmap: Accepted" and "Phase: Working on it" labels on Sep 28, 2019
bigdefect self-assigned this on Nov 13, 2019
@juandiegopalomino

Hi there! Any update on the work on this request?

@bcelenza
Contributor

@juandiegopalomino While you're here, a couple of questions to help us validate our design :)

  1. What do you plan on using for certificates in mTLS?
  2. Do you have any specific requirements around the certificates used for identity, such as validity length, naming, etc.?
  3. Do you plan on using only mTLS between services, or do you have other AuthN needs?

@totally-free-checking

totally-free-checking commented Feb 11, 2020

@bcelenza This is an important feature for Open Banking customers in Australia. The pilot is in progress with the first phase of the rollout on July 1, 2020. The regs stipulate mTLS for certain endpoints. In this scenario:

  1. What do you plan on using for certificates in mTLS?
    The client and server certs are issued by the government-regulated CA.

  2. Do you have any specific requirements around the certificates used for identity, such as validity length, naming, etc.?
    This part of the spec is yet to be defined AFAIK, and during the pilot, key issuance and rotation are done manually. Once finalised, though, there is likely to be a means of programmatically rotating keys.

  3. Do you plan on using only mTLS between services, or do you have other AuthN needs?
    mTLS is required for all back-channel (server-to-server) flows. Client-facing flows will use JWT and TLS.

@kamil-rogon-dragon

I've got one use case for you: transparent authentication of services inside an App Mesh mesh to AWS MSK.

@bigdefect
Contributor

@kamil-rogon-dragon Could you elaborate? I'm unfamiliar with Kafka/MSK, just taking a cursory look over their docs; it looks like you can source your own certs via ACMPCA. Are you looking to represent MSK within the mesh?

Right now with #39 you can configure your services to retrieve a cert and a validation context for the PCA you've configured for your brokers. Assuming you can represent your cluster as a virtual service using existing primitives (looks like you get a list of IPs for the brokers?), at a high level that could be sufficient to configure Envoy correctly; though of course you're only getting CA validation on MSK's side (are you looking for stricter trust?).

@isaac-mj

isaac-mj commented May 22, 2020

Hi guys, current setup:

  • Service A: Distributed in multiple AWS regions to improve QoS.
  • Service B: Deployed only in one region.
  • Service A (e.g. Dublin) needs to connect to Service B (e.g. North Virginia).

As far as I know, to connect these two services we would need to (#97):
1. Create a new mesh in each region
2. VPC-peer both regions
3. Create a Virtual Gateway to connect both meshes

My question: would this feature support mTLS between the two meshes? How would I secure the link between them?

@bigdefect
Contributor

bigdefect commented May 26, 2020

@isaac-mj I believe you're right on your cross-region routing steps. As @dastbe alluded to there, having multi-region services natively represented is the ideal.

For encryption, the mTLS case of cross-region communication shouldn't be any different than it is with the encryption support we have today. I need to validate this (and have recorded your comment accordingly), but I think the only issue is retrieving secrets across regions - functionality shouldn't be impeded. You'll have to ensure that the root CA certificate that you're using in a validation context is retrievable/present for a given virtual node in a region; for file system-based and SDS-based certs, that's on you to provide to the proxies.

But if you're using ACM, I don't think we can currently relax the same-region requirement, so the entity and CA certs need to be in the same region as the virtual node that's referencing them. For example, the root CA cert for the ingress gateway in us-east-1 would also need to be present in a PCA in eu-west-1 so that Service A can retrieve it. Does that make sense?

@isaac-mj

isaac-mj commented Jul 8, 2020

Hello everyone,

I have a follow-up question. We are launching App Mesh together with ACM to set up TLS between our mesh nodes. However, we are wondering about the rotation frequency that you will implement for mTLS.

For example, looking at the frequency of Consul (https://www.consul.io/docs/connect/ca/aws), they rotate certificates for their nodes every 54 hours. We did a brief calculation of the costs of that type of rotation in our current platform assuming we use ACM and the costs would skyrocket.
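A back-of-the-envelope sketch of that calculation, for anyone who wants to redo it with their own numbers. The node count and the per-certificate price below are made-up placeholders, not actual ACM pricing or our real fleet size, and it assumes every rotation issues one separately billed certificate:

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def rotations_per_year(rotation_interval_hours: float) -> float:
    """How many certificates one node consumes per year at a given rotation interval."""
    return HOURS_PER_YEAR / rotation_interval_hours


def yearly_certificate_cost(nodes: int, rotation_interval_hours: float,
                            price_per_certificate: float) -> float:
    """Total yearly spend if every rotation issues one new, separately billed certificate."""
    return nodes * rotations_per_year(rotation_interval_hours) * price_per_certificate


if __name__ == "__main__":
    # Hypothetical inputs: 100 mesh nodes, 1.00 currency unit per issued certificate.
    print(round(rotations_per_year(54), 1))                   # 162.2 rotations per node per year
    print(round(yearly_certificate_cost(100, 54, 1.00), 2))   # 16222.22
```

The cost scales linearly with both the node count and the rotation frequency, which is why a short rotation interval matters so much here.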

Have you defined a rotation policy for the mTLS feature?

Thx,
Isaac.

@bigdefect
Contributor

bigdefect commented Jul 8, 2020

@isaac-mj Thanks for launching with us!

Our integration with ACM (currently) has you bringing your certs to App Mesh (as you're aware). So rotation is entirely within your control and likely will not dramatically change with mTLS. ACM's cert validity period is currently fixed, so we're looking into revocation (#172) and how we can make CRLs work and scale. I believe ACM is working on improving the validity story, though I can't speak to their designs or roadmap.

That doesn't directly answer your question, but let me know if it doesn't help.

@rizblie

rizblie commented Jul 21, 2020

I have a question re: whether/how mTLS support will enable a service to authorise an invocation from a downstream service.

For example, as part of the virtual node listener spec, will it be possible to specify a list of acceptable downstream services, or a list of downstream services to block, based on the domain names associated with their certs?

And would it be possible to specify these as part of a virtual service spec in addition to a virtual node spec?

@bigdefect
Contributor

@rizblie Thanks for your questions.

The API is still in flux, but we'll have a draft here once we're working towards a preview version of the feature. If using the Subject Alternative Names on certs is sufficient as a coarse authorization mechanism, this is certainly something we can support. We are also researching external authorizers (#140) for more control; feel free to chime in there if that's of interest. Do you have requirements on the structure/format/count of your SANs specified on an upstream node?

> And would it be possible to specify these as part of a virtual service spec in addition to a virtual node spec?

This is a great question. We've focused on providing primitives via virtual nodes, but we recognize there's a lot of configuration there, some of which may make sense to specify in a higher level construct, e.g. a virtual service or something else.

Would you prefer to specify this "SAN validation" on a virtual service? What about CAs (e.g. TLS client policies on backends)? Do you find yourself specifying the same configuration on several virtual nodes behind a virtual service?

@rizblie

rizblie commented Jul 22, 2020

Thanks @efe-selcuk .

While an external authorizer is certainly desirable for maximum flexibility, I believe there is also room for a simpler built-in mechanism based on the cert's SAN to address common, basic scenarios e.g. "I want to allow my service to be called by any downstream service in *.payments.local", or "I want to explicitly block downstream services in *.search.local". A simple approach would be to offer options to specify either an ALLOW list or a DENY list. Both options would be useful, but if forced to choose I think an ALLOW list would be more valuable - as it gives the owning team full control over who is allowed to access their service.
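To make the ALLOW/DENY idea concrete, here is a minimal sketch in Python. It is purely illustrative: `is_caller_permitted` and its parameters are hypothetical names, not an App Mesh or Envoy API, and it assumes shell-style wildcard matching against a single SAN, with DENY taking precedence over ALLOW.

```python
from fnmatch import fnmatchcase
from typing import Optional, Sequence


def _matches_any(san: str, patterns: Sequence[str]) -> bool:
    """True if the SAN matches at least one shell-style pattern (case-sensitive)."""
    return any(fnmatchcase(san, pattern) for pattern in patterns)


def is_caller_permitted(san: str,
                        allow: Optional[Sequence[str]] = None,
                        deny: Optional[Sequence[str]] = None) -> bool:
    """Hypothetical policy check: DENY wins, then ALLOW (if given) must match."""
    if deny and _matches_any(san, deny):
        return False
    if allow is not None:
        return _matches_any(san, allow)
    # No ALLOW list configured: anything not explicitly denied is permitted.
    return True


if __name__ == "__main__":
    print(is_caller_permitted("api.payments.local", allow=["*.payments.local"]))
    print(is_caller_permitted("crawler.search.local", deny=["*.search.local"]))
```

Note that `*` in `fnmatchcase` also matches dots, so `*.payments.local` would accept `a.b.payments.local` too. Whether that is desirable is exactly the kind of detail a built-in mechanism would need to pin down.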

These ALLOW/DENY lists could be specified either at the virtual service level or the virtual node level. I think it makes more sense to do this at the service level, as the owning team is likely to want to apply the same policy to all virtual nodes that the service routes to. For example, if using weighted routing to two different versions of the service (canary deployment), then it would not make sense for one version to accept requests from a downstream service, while the other was rejecting them.

RE: CAs, yes I think it also makes sense to specify config at the service level, as in most cases all virtual nodes under the same virtual service parent would require the exact same configuration.

For situations where additional flexibility is required at the virtual node level, perhaps you could have a two-tier system - where virtual nodes inherit the policy/config from their parent virtual service, but can optionally override them at the virtual node level?

@bigdefect
Contributor

bigdefect commented Jul 22, 2020

@rizblie The extra feedback is much appreciated. I can at least say we're acutely aware of the problems you're trying to solve.

At the risk of getting too into the weeds here... I'm hesitant to use SANs for anything outside of their specific domain; for example, the DNS name for a virtual node or service doesn't necessarily map 1:1 to the SAN on the cert (like a SPIFFE SVID). It wouldn't exist in terms of a generic allow/deny list when modeling your services in App Mesh, but I call it a "coarse authorization mechanism" exactly because it coincidentally ends up acting as an allow list when using TLS.

That being said, there is absolutely a need for better authorization controls in a way that maps better into the mesh (even simpler controls as you've described), and that's where the more focused authorization discussions come into play.

In terms of the rules around accepting SANs, there are several concerns, including what we have available in Envoy's API (balanced against making new contributions, of course). For example, the mechanism we have to validate those SANs in one version may differ from the one in a newer version. Wildcards are also sensitive just because of context.

As for specifying TLS configuration at a virtual service level, it's tricky both from an API confusion perspective as well as a "do the right thing with override behavior" perspective. For example, in TLS client policies on virtual node backends, we allow defaults with overrides, but we explicitly do not merge the fields.
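A rough sketch of what "defaults with overrides, but no merging" means in practice. This is one reading of the behavior described above, written in Python with a hypothetical helper name; it is not App Mesh code.

```python
from typing import Optional


def effective_client_policy(backend_defaults: Optional[dict],
                            backend_policy: Optional[dict]) -> Optional[dict]:
    """Pick the backend's own policy if it has one, else the defaults.

    The two are never merged field by field: an override replaces the
    default policy wholesale.
    """
    return backend_policy if backend_policy is not None else backend_defaults


if __name__ == "__main__":
    # Hypothetical policies, loosely shaped like the client policy structure.
    defaults = {"tls": {"enforce": True,
                        "validation": {"trust": {"file": {"certificateChain": "/path/to/chain.pem"}}}}}
    override = {"tls": {"enforce": True, "ports": [443]}}

    # No per-backend policy: the defaults apply as-is.
    print(effective_client_policy(defaults, None))
    # Per-backend policy present: it wins entirely, so "validation" is NOT inherited.
    print(effective_client_policy(defaults, override))
```

The second call shows the sharp edge: an override that only meant to narrow the ports also drops the validation context unless it repeats it.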

@rizblie

rizblie commented Jul 24, 2020

@efe-selcuk I see your point.

Thinking about it a bit more, a better mesh solution might be to employ a similar approach as for backends i.e. just like a virtual node can reference a backend using a virtual service name or ARN, a virtual service could specify a frontend service to specify which downstream services are allowed to call it.

The same dilemma arises re: whether to apply at service or node level. I would argue that service level is simpler and more useful. As soon as you introduce this at node level, you have to deal with the problem that the downstream service may be routing to a mix of virtual nodes, some of which are valid frontends for the target service, while others are not. This will lead to authZ errors if the routing does not take this into account.

@bigdefect
Contributor

@rizblie A few questions:

  • Are you using plain hostnames for your SANs? SPIFFE SVIDs? Something else? Any length requirements?
  • Would you need many per mesh endpoint (whether on the client policy for a backend, or on a listener)?
  • Do you have services that call or are called by many others such that wildcard-style allow lists (as you mentioned earlier) would be strongly preferable?

We have to balance these against any security implications of course.

bigdefect changed the title from "authN based on mTLS" to "Mutual TLS with customer provided certificates" on Sep 17, 2020
@bigdefect
Contributor

bigdefect commented Sep 17, 2020

Hello everyone. Today, you can encrypt communication between your Envoy proxies with TLS by providing a certificate from the listeners on your upstream/server virtual nodes and gateways, and specifying validation criteria (trusted Certificate Authority) on your clients/downstreams.

App Mesh will be introducing support for authentication with Mutual TLS. In broad terms, this will allow you to mutually authenticate communication between your virtual nodes and virtual gateways (and external services), by also providing a client certificate on your downstream Envoys, and specifying validation criteria on upstream Envoys. Additionally, you’ll be able to optionally specify the Subject Alternative Names which must appear on the peer certificate as part of validation.

Mechanically, this will involve new APIs for:

  1. Specifying a client certificate per backend, or one for all backends, on a virtual node
    1. For virtual gateways, it will be a single certificate for all backends, per the single "default" client policy
  2. Specifying a TLS validation context on listeners
  3. Specifying (optionally) Subject Alternative Names in TLS validation

We will also be introducing support for external Secret Discovery Services (SDS) via unix domain socket. We are investigating support for SPIRE as an SDS provider in #68 and are generally looking into the experience across our platforms (i.e., ECS, Fargate, EKS).

The initial release of this feature will support file-based and SDS-based certificates and certificate authorities. We will explore the options for supporting ACM PCA after the initial release (#258).

This feature is an extension of our existing TLS support. For more information, please see this documentation: https://docs.aws.amazon.com/app-mesh/latest/userguide/tls.html

Supporting Secret Discovery Service over Unix Domain Socket

An emerging pattern for TLS certificate binding in a service mesh is the use of Envoy's Secret Discovery Service API. This option adds the ability for the proxy to connect to a local process (e.g. a sidecar) that hosts an SDS endpoint on a Unix Domain Socket (UDS). When using a technology like SPIRE, this would be the SPIRE Agent running on your infrastructure.

API Shapes

The models we’re adding to listeners and backends are very similar to the existing TLS shapes for listener certificates and TLS client policies on backends.

Listeners

listeners:
- ...
  tls:
    mode: "STRICT"
    # (OPTIONAL) ONE OF file, acm, (*NEW*) sds
    certificate:
      # (*NEW*) Configures Envoy to retrieve the certificate from SDS
      sds:
        secretName: "spiffe://mesh.com/cart-service"
    # (*NEW*) (OPTIONAL) Validation context for TLS connections to this listener
    validation:
      # (*NEW*) (REQUIRED) Determines where to retrieve the trust bundle
      # ONE OF file, (*NEW*) sds
      trust:
        sds:
          secretName: "spiffe://mesh.com"
      # (*NEW*) (OPTIONAL) Subject Alternative Names to trust from the client certificate
      # Must be FQDN or URI formatted. Limit of 20 SANs.
      subjectAlternativeNames:
        # (*NEW*) (REQUIRED) Specifies the matcher for SANs
        match:
          # (*NEW*) (REQUIRED) Matches exactly SANs from the peer certificate
          # Wildcards disallowed.
          exact:
          - "client.mesh.com"
          - "spiffe://mesh.com/clientservice"

Backends

Note on Virtual Gateways: The tls structure here will also be present within the clientPolicy structure for Virtual Gateways.

backends:
- virtualService:
    virtualServiceName: "cart.mesh.local"
    # (OPTIONAL) Client policy for this backend
    clientPolicy:
      # (OPTIONAL) Specifies TLS behavior for this backend
      tls: 
        # (OPTIONAL) When true, enforces the use of TLS for this backend
        # Default: true
        enforce: true
        # (OPTIONAL) Scope down which upstream ports to enforce TLS on
        # Default: all ports
        ports:
        - 443
        # (*NEW*) (OPTIONAL) Specifies the certificate to present to the backend
        # ONE OF file, (*NEW*) sds
        certificate:
          sds:
            secretName: "spiffe://mesh.com/cart-service"
        # (OPTIONAL) Validation context for TLS connections to this backend
        validation:
          # (REQUIRED) Determines where to retrieve the trust bundle
          # ONE OF acm, file, (*NEW*) sds
          trust:
            sds:
              secretName: "spiffe://mesh.com"
          # (*NEW*) (OPTIONAL) Subject Alternative Names to trust from the server certificate
          # Must be FQDN or URI formatted. Limit of 20 SANs.
          subjectAlternativeNames:
            # (*NEW*) (REQUIRED) Specifies the matcher for SANs
            match:
              # (*NEW*) (REQUIRED) Matches exactly SANs from the peer certificate
              # Wildcards disallowed.
              exact:
              - "catalog.mesh.com"
              - "spiffe://mymesh/catalog-service"
# (OPTIONAL) Specify options to apply to all backends
backendDefaults:
  clientPolicy:
    # Same structure as above
    tls: ...
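To illustrate the SAN constraints noted in the comments above (exact matches only, FQDN or URI formatting, no wildcards, at most 20 entries), here is a small Python sketch. The function names are hypothetical and the format checks are deliberately simplistic; the real validation App Mesh performs may differ. It also assumes that a peer passes if any one of its SANs equals any configured entry.

```python
import re
from typing import Iterable, Sequence
from urllib.parse import urlparse

MAX_SANS = 20  # limit stated in the API comments above

# One DNS label: letters, digits, hyphens; no leading/trailing hyphen; 1-63 chars.
_DNS_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_fqdn(value: str) -> bool:
    """Simplified FQDN check; a '*' label fails, so wildcards are rejected."""
    if len(value) > 253 or "." not in value:
        return False
    return all(_DNS_LABEL.fullmatch(label) for label in value.split("."))


def is_uri(value: str) -> bool:
    """Simplified URI check: needs a scheme and an authority, and no wildcard."""
    parsed = urlparse(value)
    return bool(parsed.scheme) and bool(parsed.netloc) and "*" not in value


def validate_exact_sans(sans: Sequence[str]) -> None:
    """Raise ValueError if the configured list breaks the stated constraints."""
    if len(sans) > MAX_SANS:
        raise ValueError(f"at most {MAX_SANS} SANs are allowed, got {len(sans)}")
    for san in sans:
        if not (is_fqdn(san) or is_uri(san)):
            raise ValueError(f"SAN must be an FQDN or URI without wildcards: {san!r}")


def peer_matches(peer_sans: Iterable[str], exact: Sequence[str]) -> bool:
    """Assumed semantics: the peer passes if any of its SANs is in the exact list."""
    allowed = set(exact)
    return any(san in allowed for san in peer_sans)


if __name__ == "__main__":
    configured = ["client.mesh.com", "spiffe://mesh.com/clientservice"]
    validate_exact_sans(configured)
    print(peer_matches(["spiffe://mesh.com/clientservice"], configured))
    print(peer_matches(["other.mesh.com"], configured))
```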

Example

For brevity, most of the configuration is omitted. This simple example shows a downstream virtual node, using file-based certificates, backed by an upstream virtual node using SDS. While this illustrates mixed sources, within a single mesh you would likely use one strategy (e.g. using SPIRE to vend all TLS materials via SDS to all proxies).

Upstream (server)

listeners:
- ...
  tls:
    mode: "STRICT"
    certificate:
      sds:
        secretName: "spiffe://mesh.com/backend-service"
    validation:
      trust:
        sds:
          secretName: "spiffe://mesh.com"
      subjectAlternativeNames:
        match:
          exact:
          - "client.mesh.com"

Downstream (client)

backends:
- virtualService:
    virtualServiceName: "backend.mesh.com"
    clientPolicy:
      tls:
        certificate:
          file:
            privateKey: "/path/to/key.pem"
            certificateChain: "/path/to/chain.pem"
        validation:
          trust:
            file:
              certificateChain: "/path/to/chain.pem"
          subjectAlternativeNames:
            match:
              exact:
              - "spiffe://mesh.com/backend-service"
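For readers less familiar with what the two sides above are doing, here is a rough analogy using Python's standard `ssl` module rather than Envoy or App Mesh. The function names and the optional path parameters are hypothetical; the point is only that the server side requires and verifies a client certificate, and the client side presents one while verifying the server.

```python
import ssl
from typing import Optional


def upstream_context(cert_chain: Optional[str] = None,
                     private_key: Optional[str] = None,
                     trust_bundle: Optional[str] = None) -> ssl.SSLContext:
    """Server side: present a certificate and REQUIRE a verified client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED  # this is what makes the TLS "mutual"
    if cert_chain:
        ctx.load_cert_chain(certfile=cert_chain, keyfile=private_key)
    if trust_bundle:
        ctx.load_verify_locations(cafile=trust_bundle)
    return ctx


def downstream_context(cert_chain: Optional[str] = None,
                       private_key: Optional[str] = None,
                       trust_bundle: Optional[str] = None) -> ssl.SSLContext:
    """Client side: verify the server (the default) and present a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if cert_chain:
        ctx.load_cert_chain(certfile=cert_chain, keyfile=private_key)
    if trust_bundle:
        ctx.load_verify_locations(cafile=trust_bundle)
    return ctx


if __name__ == "__main__":
    # Paths are omitted here so the sketch runs without any certificate files.
    print(upstream_context().verify_mode)
    print(downstream_context().verify_mode)
```

The SAN matching from the example is an additional check layered on top of this chain validation.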

Summary

We hope this will enable your use-cases for mutual authentication within your meshes. We’ll update this issue once the feature is enabled in our Preview Channel. As always, we’d love to get your feedback on the feature. A few questions to get started:

  1. We are looking to limit the number of SANs per validation context (currently at 20) and limit the format. Specifically, excluding wildcards and requiring FQDN and URI formatting. Will this meet your needs? What kind of requirements do you have for length, count, and format for Subject Alternative Names?
  2. For those of you looking to take advantage of the SDS support, will this API meet your needs? What integration are you looking to use?
  3. Do you need a deep integration with ACM PCA to start using mTLS with App Mesh?

Thanks!

@rizblie

rizblie commented Oct 15, 2020

RE: 1, limit is fine, but wildcards would make life easier.
RE: 2, looks fine.
RE: 3, not necessarily, but definitely would be good to have this at the earliest opportunity.

@bigdefect
Contributor

Hello everyone. The Mutual TLS feature is now available in Preview.

You can find documentation about the feature here: https://docs.aws.amazon.com/app-mesh/latest/userguide/mutual-tls.html

And two walkthroughs:

As always, feel free to leave any feedback. Thanks!

@rberkovi

Could you advise whether this will work with applications running on EC2?

@bigdefect
Contributor

Hey @rberkovi Are you referring to running App Mesh with ECS/EKS on EC2 with file-based certificates? That is certainly supported alongside our other features.

SPIRE on ECS (EC2 or Fargate) is not supported.

@rberkovi

@efe-selcuk Our setup is applications running directly on EC2 (Tomcat, IIS), with no containers. We have an ALB in front of them with path-based routing to target groups.

@bigdefect
Contributor

@rberkovi Ah, are you on-boarding with App Mesh for the first time?

You can use App Mesh on EC2. We don't yet have the same resources we provide for containerized workloads (e.g. #161), so you would have to configure the iptables rules on your instances and the bootstrap configuration (see #264) for Envoy. Check out the iptables script at the bottom of this guide.

The rest of our features (including TLS/mTLS) are not restricted by platform.

If you have more general questions, we'd be happy to chat.

bcelenza removed their assignment on Dec 3, 2020
@isaac-mj

isaac-mj commented Jan 7, 2021

Hello folks - Do you have an idea of when this feature will come out of preview and be released into production?

@bigdefect
Contributor

@isaac-mj We can't share dates or timelines, but the feature is currently in active development.

@bigdefect
Contributor

Hey everyone. Mutual TLS is now Generally Available!

https://aws.amazon.com/about-aws/whats-new/2021/02/aws-app-mesh-supports-mutual-tls-authentication/

We're super excited to get this into your hands. The features are available in the AWS SDK and AWS Console.

CloudFormation support is currently slated for next week. We'll post an update here once it's live.

bigdefect added the "Roadmap: Shipped" label and removed the "Phase: Working on it" and "Roadmap: Accepted" labels on Feb 4, 2021
@bigdefect
Contributor

CloudFormation support is available in all regions. The new fields are not yet in the CloudFormation documentation; they should be published in about a week.

We'll get the walkthrough updated to use CloudFormation so you have something to reference. However, the field names are all in line with the API.

@bigdefect
Contributor

CloudFormation docs have been published. Resolving this issue. Please cut us a new one if you have any findings/feedback. Thanks!
