Vault not booting on AWS using awskms seal #368

Closed
marcofranssen opened this issue Aug 4, 2020 · 14 comments
Labels
bug Something isn't working

Comments

@marcofranssen

I'm deploying Vault on EKS using the following values.yaml.

server:
  image:
    tag: 1.5.0

  standalone:
    enabled: true
    config: |
      ui = true

      listener "tcp" {
        tls_disable = 1
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }

      storage "file" {
        path = "/vault/data"
      }

      service_registration "kubernetes" {}

      seal "awskms" {
        kms_key_id = "my-kms-key-id"
      }

  service:
    enabled: true

  dataStorage:
    enabled: true
    size: 10Gi
    storageClass: null
    accessMode: ReadWriteOnce

  ui:
    enabled: true
    serviceType: LoadBalancer

I'm continuously getting the following error.

$ vault status
Error checking seal status: Get "http://127.0.0.1:8200/v1/sys/seal-status": dial tcp 127.0.0.1:8200: connect: connection refused

This prevents me from initializing vault.

My KMS key does have the required policy so the EKS node can access it.

$ aws kms list-grants --key-id my-kms-key-id
{
    "Grants": [
        {
            "KeyId": "arn:aws:kms:eu-west-1:759729069112:key/my-kms-key-id",
            "GrantId": "756045c9f34bc690c9a1fcb225de614566355a9d1aa2d5f92dbb547abca3bcc2",
            "Name": "k8s-worker-nodes-vault",
            "CreationDate": "2020-08-04T10:06:37+02:00",
            "GranteePrincipal": "arn:aws:iam::759729069112:role/dctna-dev-cluster20200804080403426400000007",
            "IssuingAccount": "arn:aws:iam::759729069112:root",
            "Operations": [
                "Decrypt",
                "Encrypt",
                "DescribeKey"
            ]
        }
    ]
}

Is there a way to debug?

I have also tried the default image tag, 1.4.2, and it fails as well.
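
For anyone wanting to script the grant check shown above, here is a minimal Python sketch. It is not part of Vault or the Helm chart, and the function name missing_seal_operations and the sample ARN are made up for illustration. It assumes the JSON printed by aws kms list-grants is available as a string, and that the seal needs the same three operations listed in the grant above (Encrypt, Decrypt, DescribeKey).

```python
import json

# Operations the awskms seal is assumed to need; these match the grant shown above.
REQUIRED_OPERATIONS = {"Encrypt", "Decrypt", "DescribeKey"}


def missing_seal_operations(list_grants_json: str, grantee_arn: str) -> set:
    """Return the required operations that no grant gives to grantee_arn."""
    grants = json.loads(list_grants_json).get("Grants", [])
    granted = set()
    for grant in grants:
        # Only count grants issued to the principal we care about.
        if grant.get("GranteePrincipal") == grantee_arn:
            granted.update(grant.get("Operations", []))
    return REQUIRED_OPERATIONS - granted


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real account.
    role = "arn:aws:iam::111122223333:role/example-node-role"
    sample = json.dumps({
        "Grants": [{
            "GranteePrincipal": role,
            "Operations": ["Decrypt", "DescribeKey"],
        }]
    })
    print(sorted(missing_seal_operations(sample, role)))
```

A grant with the right operations only helps if the pod really runs as the grantee, so an empty result here does not rule out a credentials problem.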

@cbohrtarwater

I've got it working with this awskms block:

seal "awskms" {
   region      = "us-east-1"
   kms_key_id  = "QQQQQQQQ"
}

Maybe see if adding the region helps?

@echoboomer
Contributor

I actually completely bypassed the awskms block in the config and instead used the vars:

          extraEnvironmentVars:
            VAULT_SEAL_TYPE: awskms
          extraSecretEnvironmentVars:
            - envName: AWS_ACCESS_KEY_ID
              secretName: vault-aws-auth
              secretKey: VAULT_AWS_ACCESS_KEY_ID
            - envName: AWS_SECRET_ACCESS_KEY
              secretName: vault-aws-auth
              secretKey: VAULT_AWS_SECRET_ACCESS_KEY
            - envName: AWS_REGION
              secretName: vault-aws-auth
              secretKey: VAULT_AWS_REGION
            - envName: VAULT_AWSKMS_SEAL_KEY_ID
              secretName: vault-aws-auth
              secretKey: VAULT_AWSKMS_SEAL_KEY_ID

@tvoran
Member

tvoran commented Aug 6, 2020

@echoboomer As @cbohrtarwater mentioned, I've usually seen region specified in the awskms config block.

As for debugging it, we're working on adding more logging around the awskms credential code, but in the meantime check the logs of the vault container? vault should eventually log the error it encountered and which credentials it's using.

@marcofranssen
Author

marcofranssen commented Aug 7, 2020

I have tried to provide the region both in the config and through environment variables.

There are no k8s logs, nor any logs in /vault/logs within the container.

Running the following in the pod:

$ vault status
Error checking seal status: Get "http://127.0.0.1:8200/v1/sys/seal-status": dial tcp 127.0.0.1:8200: connect: connection refused

The Helm values I use are the following.

server:
  standalone:
    enabled: true

  extraEnvironmentVars:
    VAULT_SEAL_TYPE: awskms
    VAULT_AWS_REGION: $VAULT_KMS_KEY_ID
    VAULT_AWSKMS_SEAL_KEY_ID: $AWS_REGION

  dataStorage:
    enabled: true
    size: 10Gi
    storageClass: null
    accessMode: ReadWriteOnce

  service:
    enabled: true

  ui:
    enabled: true
    serviceType: LoadBalancer

I also tried this:

server:
  standalone:
    enabled: true
    config: |
      ui = true

      listener "tcp" {
        tls_disable = 1
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }

      storage "file" {
        path = "/vault/data"
      }

      service_registration "kubernetes" {}

      seal "awskms" {
        kms_key_id = "$VAULT_KMS_KEY_ID"
        region = "$AWS_REGION"
      }

  dataStorage:
    enabled: true
    size: 10Gi
    storageClass: null
    accessMode: ReadWriteOnce

  service:
    enabled: true

  ui:
    enabled: true
    serviceType: LoadBalancer

Both result in the same issue: Vault is not reachable.

@marcofranssen
Author

When explicitly setting the following environment variables, it works.

    AWS_ACCESS_KEY_ID: $AWS_ACCESS_KEY_ID
    AWS_SECRET_ACCESS_KEY: $AWS_SECRET_ACCESS_KEY
    AWS_SESSION_TOKEN: $AWS_SESSION_TOKEN

According to the documentation, it should also be possible to run without these and use the role principal.
https://www.vaultproject.io/docs/configuration/seal/awskms#kms_key_id

We are sure we managed to get it working once, but can't reproduce.

@tvoran
Member

tvoran commented Aug 14, 2020

So vault should eventually crash and log which AWS credentials it's trying to use for auto-unseal, and why they failed. Something like this:

Error parsing Seal configuration: error fetching AWS KMS wrapping key information: AccessDeniedException: User: <role-arn> is not authorized to perform: kms:DescribeKey on resource: <kms-arn>
    status code: 400, request id: f273c40f-ff81-47f4-b0d6-0898161a287d

I have noticed that sometimes it takes a few minutes for the AWS auth to fail, the vault pod then exits with an error, and then the pod will restart. So you may need to use something like kubectl logs vault-0 --previous or k9s to see the log easily.
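
Once the previous container's log is available (for example from kubectl logs vault-0 --previous), the denied principal, action, and resource can be pulled out of it. The sketch below is only an illustration in Python: the function name find_access_denied is hypothetical, and the pattern is based solely on the message format quoted above, so differently worded AWS errors would not match.

```python
import re

# Pattern for the AccessDeniedException wording quoted in this thread.
ACCESS_DENIED = re.compile(
    r"AccessDeniedException: User: (?P<principal>\S+) "
    r"is not authorized to perform: (?P<action>\S+) "
    r"on resource: (?P<resource>\S+)"
)


def find_access_denied(log_text: str) -> list:
    """Return one dict (principal, action, resource) per match in log_text."""
    return [match.groupdict() for match in ACCESS_DENIED.finditer(log_text)]


if __name__ == "__main__":
    # Hypothetical log excerpt with made-up ARNs.
    log_text = (
        "Error parsing Seal configuration: error fetching AWS KMS wrapping key "
        "information: AccessDeniedException: User: "
        "arn:aws:sts::111122223333:assumed-role/example/session "
        "is not authorized to perform: kms:DescribeKey on resource: "
        "arn:aws:kms:us-east-1:111122223333:key/example\n"
        "    status code: 400, request id: example\n"
    )
    for hit in find_access_denied(log_text):
        print(hit["principal"], hit["action"], hit["resource"])
```

The principal it reports is the identity the seal actually used, which is often the quickest way to tell whether the pod picked up the node role, a service account role, or static keys.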

@tvoran
Member

tvoran commented Dec 7, 2020

In case it helps, we added more debug logging to the awskms auto-unseal credential code in Vault 1.6: hashicorp/vault#9794

And as of Vault 1.5.5 and 1.6.0 we also decreased the timeout in the AWS client so that a failure to authenticate returns faster: https://www.vaultproject.io/docs/upgrading/upgrade-to-1.5.0#aws-instance-metadata-timeout

@cabrinha

cabrinha commented Dec 18, 2020

Does EKS auto-unseal work with IAM Roles for Service Accounts?

When my stateful set comes up, I get:

==> Vault server configuration:
           AWS KMS KeyID: XXXX
          AWS KMS Region: us-west-2
             Api Address: http://192.168.23.8:8200
                     Cgo: disabled
         Cluster Address: https://vault-0.vault-internal:8201
              Go Version: go1.15.4
              Listener 1: tcp (addr: "[::]:8200", cluster address: "[::]:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
               Log Level: info
                   Mlock: supported: true, enabled: false
           Recovery Mode: false
                 Storage: dynamodb (HA available)
                 Version: Vault v1.6.1
             Version Sha: 6d2db3f033e02e70202bef9ec896360062b88b03
==> Vault server started! Log data will stream in below:
2020-12-19T00:23:25.084Z [INFO]  proxy environment: http_proxy= https_proxy= no_proxy=
2020-12-19T00:23:26.618Z [WARN]  core: entering seal migration mode; Vault will not automatically unseal even if using an autoseal: from_barrier_type=shamir to_barrier_type=awskms

I'm using Vault 1.6.1, and it seems Vault starts in seal migration mode.

Same behavior when using the IAM role on the EC2 instance. Doesn't even get to the authentication step.

@tvoran
Member

tvoran commented Dec 19, 2020

@cabrinha Yes, Vault's auto-unseal works with EKS and IAM Roles for Service Accounts. Though it looks like this was unsealed using shamir, then started up with auto-unseal configured. I'd suggest taking a look at the migration docs: https://www.vaultproject.io/docs/concepts/seal#seal-migration
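
The from_barrier_type and to_barrier_type fields in that warning say which seal the storage was initialized with and which one the config now asks for. A small Python sketch for pulling them out of a log dump follows; the function name seal_migration is hypothetical, and it assumes the warning sits on a single unwrapped line.

```python
import re

# Matches the seal migration warning quoted in the log above.
MIGRATION = re.compile(
    r"entering seal migration mode.*?"
    r"from_barrier_type=(?P<old>\S+)\s+to_barrier_type=(?P<new>\S+)"
)


def seal_migration(log_text: str):
    """Return (old_seal, new_seal) if the migration warning is present, else None."""
    match = MIGRATION.search(log_text)
    if match is None:
        return None
    return match.group("old"), match.group("new")


if __name__ == "__main__":
    line = (
        "2020-12-19T00:23:26.618Z [WARN]  core: entering seal migration mode; "
        "Vault will not automatically unseal even if using an autoseal: "
        "from_barrier_type=shamir to_barrier_type=awskms"
    )
    print(seal_migration(line))
```

A result such as shamir to awskms suggests the storage backend already holds data from an earlier Shamir-sealed initialization, which is the situation the migration docs linked above cover.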

@cabrinha

cabrinha commented Dec 19, 2020

End goal is to not need to migrate at all.

After deleting and recreating my KMS key, DynamoDB table and statefulset, I'm now getting this error:

2020-12-19T01:00:31.763Z [INFO]  core.autoseal: seal configuration missing, but cannot check old path as core is sealed: seal_type=recovery
2020-12-19T01:00:36.390Z [INFO]  core: stored unseal keys supported, attempting fetch
2020-12-19T01:00:36.393Z [WARN]  failed to unseal core: error="stored unseal keys are supported, but none were found"
2020-12-19T01:00:36.760Z [INFO]  core: security barrier not initialized
2020-12-19T01:00:36.763Z [INFO]  core.autoseal: seal configuration missing, but cannot check old path as core is sealed: seal_type=recovery

Is it required to run kubectl exec -it vault-0 -- vault operator init on the first unseal?

@tvoran
Member

tvoran commented Dec 19, 2020

Yeah, if you just want auto-unseal, set an awskms seal config block:

seal "awskms" {
  region     = "us-west-2"
  kms_key_id = "alias/my-vault-role-key"
}

And either add the role annotation to the service account in the chart values:

server:
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: <role-arn>

or specify which existing service account to use: https://www.vaultproject.io/docs/platform/k8s/helm/configuration#serviceaccount

server:
  serviceAccount:
    create: false
    name: vault

EKS will set the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE environment variables in the pod if IRSA is set up correctly, and the awskms logic will attempt to use those credentials for accessing KMS (turn on debug logging for more info on that part of the process). Then run vault operator init.

(If you haven't seen this tutorial, you may also find it useful: https://learn.hashicorp.com/tutorials/vault/autounseal-aws-kms)
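
To see which AWS credential hints are actually present in the pod, the output of kubectl exec vault-0 -- env can be inspected. The Python sketch below is an illustration under assumptions: both function names are hypothetical, it only looks at the four environment variables mentioned in this thread, and it deliberately does not guess which source Vault will prefer when more than one is set.

```python
def parse_env_output(text: str) -> dict:
    """Turn KEY=VALUE lines (as printed by `env`) into a dict."""
    env = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key:
            env[key] = value
    return env


def detected_credential_sources(env: dict) -> list:
    """List the credential hints found in env; presence only, no precedence."""
    sources = []
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        sources.append("static keys")
    if env.get("AWS_ROLE_ARN") and env.get("AWS_WEB_IDENTITY_TOKEN_FILE"):
        sources.append("web identity (IRSA)")
    if not sources:
        sources.append("none in environment")
    return sources


if __name__ == "__main__":
    # Hypothetical `env` output from a pod with IRSA configured.
    sample = (
        "HOME=/home/vault\n"
        "AWS_ROLE_ARN=arn:aws:iam::111122223333:role/vault\n"
        "AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token\n"
    )
    print(detected_credential_sources(parse_env_output(sample)))
```

If the result is "none in environment", the seal has nothing to go on from the pod itself, and whatever role is attached to the node would be the remaining candidate. If more than one source shows up, the debug log is the place to confirm which one was used.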

@worldofgeese

worldofgeese commented Jan 16, 2021

I have IRSA working for cluster_autoscaler. I defined a new role based on the one used for cluster_autoscaler in Terraform:

module "iam_assumable_role_vault" {
  source                        = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
  version                       = "2.14.0"
  create_role                   = true
  role_name                     = "vault"
  provider_url                  = replace(module.eks.cluster_oidc_issuer_url, "https://", "")
  role_policy_arns              = [aws_iam_policy.vault.arn]
  oidc_fully_qualified_subjects = ["system:serviceaccount:vault:vault"]
}


resource "aws_iam_policy" "vault" {
  name_prefix = "vault"
  description = "EKS vault cluster ${module.eks.cluster_id}"
  policy      = data.aws_iam_policy_document.vault.json
}

data "aws_iam_policy_document" "vault" {
  statement {
    sid    = "VaultKMSUnseal"
    effect = "Allow"

    actions = [
      "kms:Encrypt",
      "kms:Decrypt",
      "kms:DescribeKey",
    ]

    resources = ["*"]
  }
}

vault-chart-values.yaml:

global:
  enabled: true

server:
  serviceAccount:
    annotations:
      # attach IAM vault role so Vault can interact with, and unlock, our vault
      eks.amazonaws.com/role-arn: arn:aws:iam::472409228388:role/vault
  extraEnvironmentVars:
    VAULT_SEAL_TYPE: awskms
  extraSecretEnvironmentVars:
    - envName: AWS_ACCESS_KEY_ID
      secretName: eks-creds
      secretKey: AWS_ACCESS_KEY_ID
    - envName: AWS_SECRET_ACCESS_KEY
      secretName: eks-creds
      secretKey: AWS_SECRET_ACCESS_KEY
    - envName: AWS_REGION
      secretName: eks-creds
      secretKey: VAULT_AWS_REGION
    - envName: VAULT_AWSKMS_SEAL_KEY_ID
      secretName: eks-creds
      secretKey: VAULT_AWSKMS_SEAL_KEY_ID

  ha:
    enabled: true

I run kubectl port-forward -n vault vault-0 8200:8200. vault status then shows:

Key                      Value
---                      -----
Recovery Seal Type       awskms
Initialized              false
Sealed                   true
Total Recovery Shares    0
Threshold                0
Unseal Progress          0/0
Unseal Nonce             n/a
Version                  1.6.1
Storage Type             consul
HA Enabled               true

So far, so good. Here's where I encounter difficulties.
vault operator init -key-shares=5 -key-threshold=3 gives me the following permissions error:

Error initializing: Error making API request.

URL: PUT http://127.0.0.1:8200/v1/sys/init
Code: 400. Errors:

* failed to store keys: failed to encrypt keys for storage: error encrypting data: AccessDeniedException: User: arn:aws:sts::472409228388:assumed-role/vault/1610984202537365570 is not authorized to perform: kms:Encrypt on resource: arn:aws:kms:eu-north-1:472409228388:key/260757fd-c5e8-49b4-972d-66fb5a366a1e
        status code: 400, request id: 1abb7c91-0f8c-4175-a745-7f103d07169a

EKS sets AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE correctly (screenshot in the original post).

This command:

kubectl run --serviceaccount=vault --rm -i --tty --attach amazonlinux --image=amazonlinux -- /bin/bash -c "yum update -y && yum install awscli -y && aws sts get-caller-identity && aws sts assume-role-with-web-identity --role-session-name test --role-arn arn:aws:iam::472409228388:role/vault --web-identity-token file://var/run/secrets/eks.amazonaws.com/serviceaccount/token"

returns:

Complete!
{
    "Account": "accountid", 
    "UserId": "AROAW37OIYRSAI6NDAM6A:botocore-session-1610986297", 
    "Arn": "arn:aws:sts::472409228388:assumed-role/vault/botocore-session-1610986297"
}
{
    "AssumedRoleUser": {
        "AssumedRoleId": "AROAW37OIYRSAI6NDAM6A:test", 
        "Arn": "arn:aws:sts::472409228388:assumed-role/vault/test"
    }, 
    "Audience": "sts.amazonaws.com", 
    "Provider": "arn:aws:iam::472409228388:oidc-provider/oidc.eks.eu-north-1.amazonaws.com/id/accountid", 
    "SubjectFromWebIdentityToken": "system:serviceaccount:vault:vault", 
    "Credentials": {
        "SecretAccessKey": "longkey", 
        "SessionToken": "reallylongsessiontoken", 
        "Expiration": "2021-01-18T17:11:40Z", 
        "AccessKeyId": "accesskeyid"
    }
}

What am I doing wrong here?

@worldofgeese

This turned out to be KMS key related. For some reason the original key just didn't want to allow permissions. I created a new key and boom, all is well.
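
The thread does not say why the original key rejected the role. One thing that can be checked in a case like this (an assumption, not a confirmed diagnosis) is whether the key policy lets IAM policies apply at all: in KMS, an IAM policy such as the one above only takes effect if the key policy delegates access to the account. The Python sketch below looks for the usual statement that allows kms:* to the account root principal. The function names are hypothetical, and the check is simplified: it ignores Condition blocks and narrower action lists. The input is assumed to be the policy document as printed by aws kms get-key-policy --key-id <key-id> --policy-name default --query Policy --output text.

```python
import json


def _as_list(value):
    """IAM policy fields may be a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def key_policy_enables_iam(policy_json: str, account_id: str) -> bool:
    """Return True if some Allow statement gives kms:* to the account root."""
    root_arn = f"arn:aws:iam::{account_id}:root"
    statements = _as_list(json.loads(policy_json).get("Statement", []))
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal", {})
        # Principal can also be the string "*"; this sketch only handles the dict form.
        aws_principals = _as_list(principal.get("AWS", [])) if isinstance(principal, dict) else []
        actions = _as_list(statement.get("Action", []))
        if root_arn in aws_principals and "kms:*" in actions:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical policy with a made-up account ID.
    policy = json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "Enable IAM User Permissions",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
            "Action": "kms:*",
            "Resource": "*",
        }],
    })
    print(key_policy_enables_iam(policy, "111122223333"))
```

A False result would be consistent with the AccessDeniedException above even when the role's IAM policy looks correct, though other causes are possible.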

@tvoran
Member

tvoran commented Jan 27, 2021

Glad you got it figured out, @worldofgeese! Closing this issue for now.
