
[BUG] Vault secrets cannot be injected into Kubernetes pods - AWS/RHEL/flannel|canal #1500

Closed
przemyslavic opened this issue Jul 29, 2020 · 2 comments



przemyslavic commented Jul 29, 2020

Describe the bug
Cannot inject Vault secrets into Kubernetes pods in the following configurations:

- AWS/RHEL/flannel

- AWS/RHEL/canal

To Reproduce
Steps to reproduce the bug:

  1. Deploy a new cluster
  2. Log in to Vault: vault login
  3. Add a secret: vault kv put secret/devwebapp/config username='test' password='test'
  4. Deploy the test application:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: devwebapp
  labels:
    app: devwebapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: devwebapp
  template:
    metadata:
      labels:
        app: devwebapp
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "devweb-app"
        vault.hashicorp.com/agent-inject-secret-credentials.txt: "secret/data/devwebapp/config"
        vault.hashicorp.com/tls-skip-verify: "true"
    spec:
      serviceAccountName: internal-app
      containers:
      - name: app
        image: busybox
        command:
        - sleep
        - "3600"
        imagePullPolicy: IfNotPresent
  5. Check the logs of the vault-agent-init container: kubectl logs devwebapp-xxx-xxx -c vault-agent-init
  6. Check if the secret exists in the application container: kubectl exec devwebapp-xxx-xxx -c app -- cat /vault/secrets/credentials.txt
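The reproduction steps above can be sketched as a single shell session. This is only a convenience summary of the listed steps; the pod name is a placeholder, and the manifest file name (devwebapp.yaml) is an assumption:

```shell
# Steps 2-3: authenticate to Vault and store a test secret in the KV engine
vault login            # prompts for a token
vault kv put secret/devwebapp/config username='test' password='test'

# Step 4: deploy the test application (manifest above saved as devwebapp.yaml)
kubectl apply -f devwebapp.yaml

# Steps 5-6: verify injection (replace devwebapp-xxx-xxx with the real pod name)
kubectl logs devwebapp-xxx-xxx -c vault-agent-init
kubectl exec devwebapp-xxx-xxx -c app -- cat /vault/secrets/credentials.txt
```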

Expected behavior
The secrets are injected properly into the pod and are accessible from within it.

Config files
Configuration included in the cluster YAML file:

---
kind: configuration/vault
title: Vault Config
name: default
provider: aws
specification:
  vault_enabled: true

OS (please complete the following information):

  • OS: [RHEL]

Cloud Environment (please complete the following information):

  • Cloud Provider [AWS]

Actual behavior:
There is only one container named 'app'.

[ec2-user@ec2-xx-xx-xx-xx ~]$ kubectl get pods -A
NAMESPACE              NAME                                                                      READY   STATUS    RESTARTS   AGE
default                devwebapp-xx-xx                                                 1/1     Running   0          5s

Neither the vault-agent-init nor the vault-agent container exists, so there is no way for secrets to be injected.
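A quick way to confirm that the injector never mutated the pod is to list its init containers and regular containers directly. This is a diagnostic sketch, not part of the original report; the pod name is a placeholder:

```shell
# A successfully injected pod should list vault-agent-init and vault-agent;
# in this failure case only "app" appears.
kubectl get pod devwebapp-xxx-xxx \
  -o jsonpath='{.spec.initContainers[*].name} {.spec.containers[*].name}{"\n"}'
```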

[ec2-user@ec2-xx-xx-xx-xx ~]$ kubectl logs devwebapp-xx-xx -c vault-agent-init
error: container vault-agent-init is not valid for pod devwebapp-xx-xx
[ec2-user@ec2-xx-xx-xx-xx ~]$ kubectl logs devwebapp-xx-xx -c vault-agent
error: container vault-agent is not valid for pod devwebapp-xx-xx

Additional context
Apiserver logs showing the issue:

I0723 13:46:14.698896       1 trace.go:116] Trace: "Call mutating webhook" configuration:vault-agent-injector-cfg,webhook:vault.hashicorp.com,resource:/v1, Resource=pods,subresource:,operation:CREATE,UID:xxx (started: 2020-07-23 13:45:44.698698084 +0000 UTC m=+5802.412268297) (total time: 30.000155842s):
Trace: [30.000155842s] [30.000155842s] END
W0723 13:46:14.698966       1 dispatcher.go:168] Failed calling webhook, failing open vault.hashicorp.com: failed calling webhook "vault.hashicorp.com": Post https://vault-agent-injector-svc.vault.svc:443/mutate?timeout=30s: context deadline exceeded
E0723 13:46:14.698984       1 dispatcher.go:169] failed calling webhook "vault.hashicorp.com": Post https://vault-agent-injector-svc.vault.svc:443/mutate?timeout=30s: context deadline exceeded
I0723 13:46:14.702704       1 trace.go:116] Trace: "Create" url:/api/v1/namespaces/default/pods,user-agent:kube-controller-manager/v1.17.7 (linux/amd64) kubernetes/b445510/system:serviceaccount:kube-system:replicaset-controller,client:10.1.2.210 (started: 2020-07-23 13:45:44.693306426 +0000 UTC m=+5802.406876613) (total time: 30.00934833s):
Trace: [30.005736493s] [30.005653171s] About to store object in database
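The "failing open" message in the apiserver log means the webhook's failure policy allows pod creation to proceed unmutated when the injector service cannot be reached within the 30s timeout, which is why the pod starts with only the app container. A hypothetical excerpt of the relevant MutatingWebhookConfiguration (field values assumed for illustration, not taken from this cluster):

```yaml
# Hypothetical excerpt; values assumed, not dumped from the affected cluster.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: vault-agent-injector-cfg
webhooks:
  - name: vault.hashicorp.com
    failurePolicy: Ignore   # "failing open": on timeout the pod is admitted without injection
    timeoutSeconds: 30
    clientConfig:
      service:
        name: vault-agent-injector-svc
        namespace: vault
        path: /mutate
```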

I also tested with TLS disabled. Exactly the same two configurations, AWS/RHEL/flannel and AWS/RHEL/canal, do not work properly.

Originally posted by @przemyslavic in #1398 (comment)


atsikham commented Sep 3, 2020

The issue can be reproduced in 0.7.0 with the listed configurations, but not with the develop branch.

develop results (HEAD is f4f2e5dc6f2926e54da38ca97cb7613cac9596af)

[root@ec2-54-162-110-36 ec2-user]# kubectl logs vault-agent-injector-7cf744b6fc-mxpq7 -n vault
2020-09-03T06:45:28.154Z [INFO]  handler: Starting handler..
Listening on ":8080"...
Updated certificate bundle received. Updating certs...
2020-09-03T06:57:45.206Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s

(screenshot)

0.7.0

(screenshot)

Verified with the following configurations:

---
kind: epiphany-cluster
title: Epiphany cluster Config
provider: aws
name: default
specification:
  name: vault-7<0|1>-<canal|flannel>
  prefix: atsikham
  admin_user:
    name: ec2-user
    key_path: /home/vscode/.ssh/id_rsa
  cloud:
    use_public_ips: true
    credentials:
      key: <replace>
      secret: <replace>
    region: us-east-1
  components:
    kubernetes_master:
      count: 1
      machine: kubernetes-master-machine-rhel
      subnets:
        - availability_zone: us-east-1a
          address_pool: 10.1.2.0/24
    kubernetes_node:
      count: 2
      machine: kubernetes-node-machine-rhel
      subnets:
        - availability_zone: us-east-1a
          address_pool: 10.1.2.0/24
    logging:
      count: 0
    monitoring:
      count: 0
    kafka:
      count: 0
    postgresql:
      count: 0
    load_balancer:
      count: 0
    rabbitmq:
      count: 0
version: <replace>
---
kind: configuration/vault
title: Vault Config
name: default
provider: aws
specification:
  vault_enabled: true
---
kind: infrastructure/virtual-machine
name: kubernetes-master-machine-rhel
provider: aws
based_on: kubernetes-master-machine
specification:
  os_full_name: RHEL-7.8_HVM_GA-20200225-x86_64-1-Hourly2-GP2
---
kind: infrastructure/virtual-machine
name: kubernetes-node-machine-rhel
provider: aws
based_on: kubernetes-node-machine
specification:
  os_full_name: RHEL-7.8_HVM_GA-20200225-x86_64-1-Hourly2-GP2
---
kind: configuration/kubernetes-master
name: default
provider: aws
specification:
  advanced:
    networking:
      plugin: <canal|flannel>

@przemyslavic przemyslavic self-assigned this Sep 4, 2020
@przemyslavic
Collaborator Author

I confirm that the issue no longer occurs on the current develop version. The changes made for version 0.7.1 fixed the problem.

@mkyc mkyc closed this as completed Sep 8, 2020