
Confirm handling of consul-clients in separate cluster to consul-server (non-federated) #3280

Closed
nabadger opened this issue Nov 29, 2023 · 4 comments
Labels
type/bug Something isn't working

Comments

nabadger commented Nov 29, 2023

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Overview of the Issue

I am upgrading the helm chart from 0.41.1 to 0.49.8 (in preparation for future 1.x updates).

Server & Client consul clusters are running:

image: 'index.docker.io/hashicorp/consul:1.13.9',
imageK8S: 'index.docker.io/hashicorp/consul-k8s-control-plane:0.49.8',
imageEnvoy: 'index.docker.io/envoyproxy/envoy:v1.23.10',

Currently I cannot get the new k8s auth method to work.

One of the key changes is the set of feature changes described here: https://github.com/hashicorp/consul-k8s/blob/main/CHANGELOG.md#0420-april-04-2022

The one I think I'm having issues with is:

Support issuing global ACL tokens via k8s auth method. [https://github.com//pull/1075]

I'm finding the client-acl-init container in the consul-client daemonset is failing to run and produces the following error:

{"@level":"error","@message":"Consul login failed","@timestamp":"2023-11-29T10:11:00.779661Z","error":"error logging in: Unexpected response code: 500 (rpc error making call: Post \"https://kubernetes.default.svc/apis/authentication.k8s.io/v1/tokenreviews\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\"))"}

From my understanding this error actually comes from the consul-server (I see the same error in its logs).

Environment details

The architecture of our setup is that we have consul-servers running on one cluster (a dedicated cluster for consul-servers and terminating gateways) and a number of separate Kubernetes clusters running consul-client. These are connected via VPC peering.

We run Kubernetes 1.24 (EKS) on both server/client clusters.

Additional Context

In our previous setup the client-acl-init init-container was configured like so:

  initContainers:
  - command:
    - /bin/sh
    - -ec
    - |
      consul-k8s-control-plane acl-init \
        -secret-name="consul-client-client-acl-token" \
        -k8s-namespace=hybrid-consul \
        -init-type="client"
    image: index.docker.io/hashicorp/consul-k8s-control-plane:0.41.1
    name: client-acl-init
    volumeMounts:
    - mountPath: /consul/aclconfig
      name: aclconfig
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-5n66f
      readOnly: true
  serviceAccount: consul-client-client
  serviceAccountName: consul-client-client

In the new setup (via the helm-chart upgrade) we have:

  initContainers:
  - command:
    - /bin/sh
    - -ec
    - |
      consul-k8s-control-plane acl-init \
        -component-name=client \
        -acl-auth-method="consul-client-k8s-component-auth-method" \
        -log-level=debug \
        -log-json=true \
        -use-https \
        -server-address="provider=k8s kubeconfig=/consul/userconfig/consul-server-kubeconfig/kubeconfig label_selector=\"app=consul,component=server\" namespace=\"hybrid-consul\"" \
        -server-port=8501 \
        -tls-server-name=server.eu-west-2-dev.hybrid \
        -consul-api-timeout=5s \
        -init-type="client"
    env:
    - name: CONSUL_HTTP_ADDR
      value: https://consul-client-server.hybrid-consul.svc:8501
    - name: CONSUL_CACERT
      value: /consul/tls/ca/tls.crt
    image: index.docker.io/hashicorp/consul-k8s-control-plane:0.49.8
    name: client-acl-init
    volumeMounts:
    - mountPath: /consul/aclconfig
      name: aclconfig
    - mountPath: /consul/login
      name: consul-data
    - mountPath: /consul/tls/ca
      name: consul-ca-cert
    - mountPath: /consul/userconfig/consul-server-kubeconfig
      name: consul-server-kubeconfig
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-tskx2
      readOnly: true
  serviceAccount: consul-client-client
  serviceAccountName: consul-client-client

I've checked that consul-client-k8s-component-auth-method exists as an auth method on the server.
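For reference, reading the auth method back from the server shows the kubernetes-host and CA certificate the server will use for the TokenReview call, which is where the verification appears to fail. A rough sketch (the pod name, port and bootstrap-token secret name are from my install and may differ):

kubectl port-forward -n hybrid-consul consul-server-1 8500

export CONSUL_HTTP_TOKEN=$(kubectl get secret -n hybrid-consul consul-bootstrap-acl-token --template='{{.data.token | base64decode }}')

consul acl auth-method read -name consul-client-k8s-component-auth-method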

Any tips on how to debug this?

@nabadger nabadger added the type/bug Something isn't working label Nov 29, 2023
nabadger commented Nov 29, 2023

I suspect this is related to our architecture.

In our consul-server clusters we also run the consul-client daemonset, and those clients are working fine.

The same consul-client daemonset on the remote (client-only) Kubernetes clusters is the one showing the issue.

nabadger commented Nov 29, 2023

I've made some progress here, but I'm not sure if I'm going in the right direction :)

I think the general approach to get this working could be (for each consul-client cluster):

  1. Create additional Kubernetes Auth Methods for every consul-client component (at least those that talk to consul-server with different service accounts)
  2. Patch the -acl-auth-method="consul-client-k8s-component-auth-method" argument to point at the new auth methods (example below)
  3. Create additional RBAC on the consul-client clusters to support tokenreviews
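For step 2, the patched flag in the client-acl-init init-container would then look something like this (just the changed line; the per-cluster method name matches the one created further down):

        -acl-auth-method="consul-client-k8s-component-auth-method_eu-dev1" \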

The example below shows how I got the client-acl-init working; it's mostly based on https://medium.com/@shy_40788/intro-to-hashicorp-consuls-kubernetes-authentication-181bc1418318

kubectl port-forward -n hybrid-consul consul-server-1 8500      

export CONSUL_HTTP_TOKEN=$(kubectl get secret -n hybrid-consul consul-bootstrap-acl-token --template='{{.data.token | base64decode }}')

consul acl auth-method create -type kubernetes -name "consul-client-k8s-component-auth-method_eu-dev1" -description "Kubernetes Auth Method" -kubernetes-host "https://$redacted.eu-west-2.eks.amazonaws.com" -kubernetes-ca-cert="$CA_CERT" -kubernetes-service-account-jwt="$JWT"

In this case the kubernetes-ca-cert and kubernetes-service-account-jwt come from the secret generated for the consul-client-client service-account (extract them from the secret and base64-decode them).
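For example, this is roughly how those two values can be pulled out beforehand (sketch only: the consul-client-client-token secret name is an assumption; use whichever kubernetes.io/service-account-token secret belongs to the consul-client-client service account, which on Kubernetes 1.24 may need to be created explicitly):

export CA_CERT=$(kubectl get secret -n hybrid-consul consul-client-client-token -o jsonpath='{.data.ca\.crt}' | base64 -d)

export JWT=$(kubectl get secret -n hybrid-consul consul-client-client-token -o jsonpath='{.data.token}' | base64 -d)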

I also needed some cluster-role updates on the consul-client cluster:

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: poc-consul-review-tokens
subjects:
- kind: ServiceAccount
  name: consul-client-client
  namespace: hybrid-consul
roleRef:
  kind: ClusterRole
  name: system:auth-delegator
  apiGroup: rbac.authorization.k8s.io
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: poc-consul-service-account-getter
rules:
- apiGroups: [""]
  resources: ["serviceaccounts"]
  verbs: ["get"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: poc-consul-get-service-accounts
subjects:
- kind: ServiceAccount
  name: consul-client-client
  namespace: hybrid-consul
roleRef:
  kind: ClusterRole
  name: poc-consul-service-account-getter
  apiGroup: rbac.authorization.k8s.io

At this point the client-acl-init container works, and now makes it to the main consul container (which also works for me).
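In case it helps anyone else, a quick way to sanity-check that RBAC on the client cluster is with kubectl impersonation (nothing install-specific beyond the names above); both should return yes once the bindings are applied:

kubectl auth can-i create tokenreviews.authentication.k8s.io --as=system:serviceaccount:hybrid-consul:consul-client-client

kubectl auth can-i get serviceaccounts -n hybrid-consul --as=system:serviceaccount:hybrid-consul:consul-client-client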

nabadger commented Nov 29, 2023

So I think where I'm at now is:

  1. This isn't a bug, but is it a setup that should be better supported in the helm charts? There's perhaps some missing RBAC and some hard-coded options around which auth method to use (maybe this could be made more configurable)
  2. It does suggest that we would need to deploy consul-clients in a broken state whilst we extract the service-account secrets and populate the new auth methods (is there a better way to achieve this?)
  3. Is this a common setup at all? Not sure if I'm doing something strange here...

Also wondering how the default k8s auth methods are created - I tried to match them against the existing service-accounts (well, their secret tokens) in the consul-server cluster, but they didn't seem to align...

edit

I think I can make use of the existing server-acl-init Job and just manipulate the args to get the auth methods I need.

This may need custom service-accounts/secrets to be created and some role-binding adjustments - essentially I can just call it with custom -auth-method-host and -resource-prefix args.

@nabadger nabadger changed the title Upgrading to 0.49.8 causing failed to verify certificate for tokenreview service Confirm handling of consul-clients in separate cluster to consul-server (non-federated) Nov 30, 2023
nabadger commented Dec 5, 2023

Closing this as I got it working with custom manifests.

@nabadger nabadger closed this as completed Dec 5, 2023