
Add support for the VenafiConnection CRD so that users can start using the Workload Identity Federation authentication ("secretless") #552

Merged: 16 commits merged on Aug 22, 2024


@maelvls (Member) commented Jul 16, 2024

We have decided to make the Venafi Kubernetes Agent compatible with the VenafiConnection CRD in order to bring secretless authentication (also called Workload Identity Federation) to VCP. When the following flags are set in the Agent's deployment, the Agent connects to VCP using the authentication method configured in the given VenafiConnection:

agent --venafi-connection "venafi-components" --venafi-connection-namespace "venafi"

This PR relies on https://github.com/jetstack/venafi-connection-lib/pull/220, which contains most of the actual changes.

Refs:

Work done in this PR

  • Write unit tests.
  • Show an error when the VenafiConnection uses the tpp field.
  • Show an error when the VenafiConnection's vcp field uses the API key method. Even though the tppl-api-key: <apikey> authorization header is supported for uploading cluster data (see https://github.com/jetstack/venafi-connection-lib/pull/220#discussion_r1692041030), we still show an error, based on this Slack message from Atanas:

    API keys will be deprecated in the future as we have more secure authentication methods. We don’t need to support API key in VC.

Work to be done in follow-up PRs

  • Add venafiConnection to the Helm chart → VC-35374
  • Write an E2E smoke test with Kind + Helm → VC-35374
  • Fix logging that shows vCert for all logs → VC-35375
    
    

What will be the UX?

To use the secretless auth (which is the entire point of bringing the Venafi Connection CRD into the agent), the user will first need to install the Venafi Connection CRD Helm chart.

Then, they will go to the UI to create an "Agent" Workload Identity Federation service account by filling in the Kubernetes cluster's issuer and JWKS URI (which needs to be repeated for each Kubernetes cluster).

[Screenshots (Jul 26, 2024): creating the "Agent" Workload Identity Federation service account in the ven-tlspk Venafi Cloud UI]

Then, they will look for their "company ID" (which is called "tenant ID" in Venafi Connection) by extracting it from the "Token URL" visible in the UI. For example, the token URL in the UI may look like:

https://api.venafi.cloud/v1/oauth2/v2.0/756db001-280e-11ee-84fb-991f3177e2d0/token
                                        <---------------------------------->
                                         companyID (also called tenant ID)
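A minimal Go sketch of that extraction, assuming the token URL always has the shape /v1/oauth2/v2.0/<companyID>/token as in the example above. The helper name tenantIDFromTokenURL is hypothetical and is not part of the Agent.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// uuidRe matches a UUID such as 756db001-280e-11ee-84fb-991f3177e2d0.
var uuidRe = regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// tenantIDFromTokenURL is a hypothetical helper. It assumes the token URL
// shown in the UI has the path /v1/oauth2/v2.0/<companyID>/token.
func tenantIDFromTokenURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	// Expected segments: v1, oauth2, v2.0, <companyID>, token.
	if len(parts) != 5 || parts[4] != "token" {
		return "", fmt.Errorf("unexpected token URL path: %q", u.Path)
	}
	id := parts[3]
	if !uuidRe.MatchString(id) {
		return "", fmt.Errorf("segment %q does not look like a company ID", id)
	}
	return id, nil
}

func main() {
	id, err := tenantIDFromTokenURL("https://api.venafi.cloud/v1/oauth2/v2.0/756db001-280e-11ee-84fb-991f3177e2d0/token")
	if err != nil {
		panic(err)
	}
	fmt.Println(id) // 756db001-280e-11ee-84fb-991f3177e2d0
}
```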

With this company ID, the user will create a VenafiConnection resource:

apiVersion: jetstack.io/v1alpha1
kind: VenafiConnection
metadata:
  # This venafi connection will be shared by the three components:
  # venafi-kubernetes-agent, venafi-enhanced-issuer, and
  # approver-policy-enterprise.
  name: venafi-components
  namespace: venafi
spec:
  vcp:
    accessToken:
      - serviceAccountToken:
          name: venafi-components
          audiences: [vcp]
      - vcpOAuth:
          tenantID: 756db001-280e-11ee-84fb-991f3177e2d0

Finally, they will install the agent's own Helm chart using the following command:

helm upgrade --install venafi-kubernetes-agent oci://registry.venafi.cloud/charts/venafi-kubernetes-agent \
  --namespace "venafi" \
  --set config.venafiConnection.enable="true" \
  --set config.venafiConnection.name="venafi-components" \
  --set config.venafiConnection.namespace="venafi" \
  --set config.server="https://api.venafi.cloud/" \
  --set config.clusterName="mael's kind cluster"

Testing Manually

For this test, we will use the test tenant https://ven-tlspk.venafi.cloud (US) with the user [email protected]; the password is at the top of the "Production Accounts" document.

First, go to https://ven-tlspk.venafi.cloud/platform-settings/user-preferences?key=api-keys to get the API key. You will need it in the next step. Set the env var APIKEY:

export APIKEY=...

Now, we need a Kind cluster whose OIDC endpoint is reachable from the internet. We will use kind-tailscale for this. First, create the cluster:

kind create cluster

Then, install Tailscale.

Then, download and install kind-tailscale (a script I wrote):

curl -sSLO https://gist.githubusercontent.com/maelvls/adf680ae01612ff79658872c7dca013f/raw/kind-tailscale
install kind-tailscale ~/bin

Run kind-tailscale (please review the contents of the script first):

kind-tailscale

Now, curl the OIDC configuration:

oidc_conf=$(curl https://$(tailscale status --json | jq -r .Self.DNSName):8443/.well-known/openid-configuration -sS --fail-with-body | tee /dev/stderr)
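For reference, a Go sketch of what this curl | jq pipeline extracts: the issuer and jwks_uri fields of the OpenID Connect discovery document, which are the two values the service account needs in the next step. The function names are hypothetical, and the base URL is assumed to be the https://<tailscale DNS name>:8443 address used above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

// oidcConfig keeps the two fields of the OpenID Connect discovery document
// that the Workload Identity Federation service account needs.
type oidcConfig struct {
	Issuer  string `json:"issuer"`
	JWKSURI string `json:"jwks_uri"`
}

// parseOIDCConfig decodes a discovery document and checks both fields are set.
func parseOIDCConfig(data []byte) (oidcConfig, error) {
	var cfg oidcConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return oidcConfig{}, err
	}
	if cfg.Issuer == "" || cfg.JWKSURI == "" {
		return oidcConfig{}, fmt.Errorf("discovery document is missing issuer or jwks_uri")
	}
	return cfg, nil
}

// fetchOIDCConfig mirrors the curl command: GET the well-known endpoint and
// fail on a non-200 response.
func fetchOIDCConfig(baseURL string) (oidcConfig, error) {
	resp, err := http.Get(baseURL + "/.well-known/openid-configuration")
	if err != nil {
		return oidcConfig{}, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return oidcConfig{}, err
	}
	if resp.StatusCode != http.StatusOK {
		return oidcConfig{}, fmt.Errorf("unexpected status %s: %s", resp.Status, body)
	}
	return parseOIDCConfig(body)
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: oidcconf https://<host>:8443")
		return
	}
	cfg, err := fetchOIDCConfig(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("issuer=%s\njwks_uri=%s\n", cfg.Issuer, cfg.JWKSURI)
}
```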

Now, you are ready to create the Workload Identity Federation Service Account:

# Don't forget to set your APIKEY (see above).
curl --fail-with-body -sS https://api.venafi.cloud/v1/serviceaccounts -H "tppl-api-key: $APIKEY" --json @- <<EOF
{
  "name": "$USER temp",
  "authenticationType": "rsaKeyFederated",
  "scopes": ["kubernetes-discovery-federated"],
  "subject": "system:serviceaccount:venafi:venafi-components",
  "audience": "vcp",
  "issuerURL": $(jq .issuer <<<"$oidc_conf"),
  "jwksURI": $(jq .jwks_uri <<<"$oidc_conf"),
  "owner": $(curl --fail-with-body -sS https://api.venafi.cloud/v1/teams -H "tppl-api-key: $APIKEY" | jq '.teams[0].id')
}
EOF

Note

If you need to delete the service account, use the command:

curl --fail-with-body -sS https://api.venafi.cloud/v1/serviceaccounts -H "tppl-api-key: $APIKEY" \
  | jq -r ".[] | select(.name == \"$USER temp\").id" \
  | xargs -L1 -I@ curl --fail-with-body -sSX DELETE https://api.venafi.cloud/v1/serviceaccounts/@ -H "tppl-api-key: $APIKEY"
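The jq filter assumes that GET /v1/serviceaccounts returns a JSON array of objects with name and id fields. Under that assumption, the selection step looks like this in Go (idsByName is a hypothetical helper; the DELETE calls are not shown):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serviceAccount holds the two fields the jq filter above relies on.
type serviceAccount struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

// idsByName returns the IDs of every service account called name, i.e. the
// equivalent of: jq -r '.[] | select(.name == "<name>").id'
func idsByName(listJSON []byte, name string) ([]string, error) {
	var accounts []serviceAccount
	if err := json.Unmarshal(listJSON, &accounts); err != nil {
		return nil, err
	}
	var ids []string
	for _, sa := range accounts {
		if sa.Name == name {
			ids = append(ids, sa.ID)
		}
	}
	return ids, nil
}

func main() {
	ids, err := idsByName([]byte(`[{"id":"a1","name":"mael temp"},{"id":"b2","name":"other"}]`), "mael temp")
	if err != nil {
		panic(err)
	}
	// Each ID would then be deleted with DELETE /v1/serviceaccounts/<id>.
	fmt.Println(ids) // [a1]
}
```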

Now, clone venafi-connection-lib next to your clone of jetstack-secure and check out the PR this one depends on (jetstack/venafi-connection-lib#220):

# From the jetstack-secure folder.
git clone https://github.com/jetstack/venafi-connection-lib ../venafi-connection-lib
cd ../venafi-connection-lib
gh pr checkout 220
cd -

Now, install the VenafiConnection CRD:

helm upgrade -i venafi-connection oci://registry.venafi.cloud/charts/venafi-connection --version v0.1.0 -n venafi --create-namespace

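Then, create a Go workspace so that jetstack-secure builds against your local clone of venafi-connection-lib: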

# From the jetstack-secure folder.
go work init
go work use . ../venafi-connection-lib

Then, create the namespace, the service account, and the necessary RBAC:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: venafi
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: venafi-components
  namespace: venafi
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: create-tokens-for-venafi-components
  namespace: venafi
rules:
- apiGroups: [ "" ]
  resources: [ "serviceaccounts/token" ]
  verbs: [ "create" ]
  resourceNames: [ "venafi-components" ]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rolebinding-for-venafi-components
  namespace: venafi
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: create-tokens-for-venafi-components
subjects:
- kind: ServiceAccount
  name: venafi-connection
  namespace: venafi
EOF

Then, create the VenafiConnection:

tenant_domain=ven-tlspk
tenant_id=$(curl --fail-with-body -sS "https://api.venafi.cloud/v1/companies/$(jq -R -r '@uri' <<<"$tenant_domain")/loginconfig" | tee /dev/stderr | jq -r .companyId)
kubectl apply -f - <<EOF | tee /dev/stderr
apiVersion: jetstack.io/v1alpha1
kind: VenafiConnection
metadata:
  name: venafi-components
  namespace: venafi
spec:
  vcp:
    accessToken:
      - serviceAccountToken:
          name: venafi-components
          audiences: [vcp]
      - vcpOAuth:
          tenantID: $tenant_id
EOF
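The tenant_id lookup above, sketched in Go. Like the curl command, it calls the loginconfig endpoint without an API key and reads only companyId; url.PathEscape plays the role of jq's @uri (the two escape slightly different character sets, which makes no difference for a domain like ven-tlspk). The helper names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// loginConfigURL builds the loginconfig URL for a tenant domain such as
// "ven-tlspk".
func loginConfigURL(baseURL, tenantDomain string) string {
	return baseURL + "/v1/companies/" + url.PathEscape(tenantDomain) + "/loginconfig"
}

// parseCompanyID reads companyId, the only field the shell snippet uses.
func parseCompanyID(data []byte) (string, error) {
	var lc struct {
		CompanyID string `json:"companyId"`
	}
	if err := json.Unmarshal(data, &lc); err != nil {
		return "", err
	}
	if lc.CompanyID == "" {
		return "", fmt.Errorf("response has no companyId")
	}
	return lc.CompanyID, nil
}

// lookupTenantID combines the two steps above into one HTTP call.
func lookupTenantID(client *http.Client, baseURL, tenantDomain string) (string, error) {
	u := loginConfigURL(baseURL, tenantDomain)
	resp, err := client.Get(u)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("GET %s: unexpected status %s", u, resp.Status)
	}
	return parseCompanyID(body)
}

func main() {
	id, err := lookupTenantID(http.DefaultClient, "https://api.venafi.cloud", "ven-tlspk")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(id)
}
```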

Then, write the agent configuration to agent.yaml:

cat >agent.yaml <<EOF
cluster_id: "$USER kind cluster"
cluster_description: "$USER's kind cluster"
server: https://api.venafi.cloud
data-gatherers:
  - kind: "dummy"
    name: "dummy"
period: 1m
EOF

Finally, run:

# From the jetstack-secure folder.
go run . agent -c agent.yaml --venafi-connection venafi-components --install-namespace venafi

@maelvls changed the title from "Venconn" to "Add support for the VenafiConnection CRD" on Jul 16, 2024
cmd/agent.go (outdated, resolved)
go.mod (outdated, resolved)
Comment on lines 153 to 216
func loadRESTConfig(path string) (*rest.Config, error) {
	switch path {
	// If the kubeconfig path is not provided, use the default loading rules
	// so we read the regular KUBECONFIG variable or create a non-interactive
	// client for agents running in cluster
	case "":
		loadingrules := clientcmd.NewDefaultClientConfigLoadingRules()
		cfg, err := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
			loadingrules, &clientcmd.ConfigOverrides{}).ClientConfig()
		if err != nil {
			return nil, errors.WithStack(err)
		}
		return cfg, nil
	// Otherwise use the explicitly named kubeconfig file.
	default:
		cfg, err := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
			&clientcmd.ClientConfigLoadingRules{ExplicitPath: path},
			&clientcmd.ConfigOverrides{}).ClientConfig()
		if err != nil {
			return nil, errors.WithStack(err)
		}
		return cfg, nil
	}
}
@maelvls (Member Author):

Self-review: I copy-pasted this from somewhere else in the Agent codebase. I'll find a better way than copy-pasting.

pkg/client/client_venconn.go (outdated, resolved)
@@ -0,0 +1,176 @@
package client
@maelvls (Member Author):

This is mostly a copy-paste from the file client_oauth.go. Since both use one form or another of the OAuth flow, I think it would be good to merge both later on.

@maelvls marked this pull request as draft on July 16, 2024 18:20
@maelvls force-pushed the venconn branch 3 times, most recently from 9a35e7a to 1deb969 on July 25, 2024 18:43
@maelvls changed the title from "Add support for the VenafiConnection CRD" to "Add support for the VenafiConnection CRD so that users can start using the Workload Identity Federation authentication ("secretless")" on Jul 26, 2024
@maelvls force-pushed the venconn branch 2 times, most recently from 1ac29bc to bd26101 on July 26, 2024 15:58
@wallrj (Member) commented Jul 26, 2024

Review

What will be the UX?

To use the secretless auth (which is the entire point for bringing the Venafi Connection CRD into the agent), the user will first need to install the Venafi Connection CRD Helm chart.

This sounds like a pain, but later I suppose we might:

  • add the Venafi Connection CRD into the venafi-kubernetes-agent chart
  • add the Venafi Connection chart as a sub-chart of the venafi-kubernetes-agent chart
  • add the Venafi Connection chart as a dependency of venafi-kubernetes-agent in venctl components install, so that the user doesn't have to think about it.

Then, they will go to the UI to create an "Agent" Workload Identity Federation service account by filling in the Kubernetes cluster's issuer and JWKS URI (which needs to be repeated for each Kubernetes cluster).

Does it need to be repeated for each component too?
Do I have to set up three separate Workload Identity Federation service accounts for:

  • venafi-kubernetes-agent
  • venafi-enhanced-issuer
  • approver-policy-enterprise

Then, they will look for their "company ID" (which is called "tenant ID" in Venafi Connection) by either extracting it from the "Token URL"

How does this relate to the client-id that is mentioned in the venafi-kubernetes-agent Helm chart?

or by going to to create a VenafiConnection resource:

apiVersion: jetstack.io/v1alpha1
kind: VenafiConnection
metadata:
  name: test-federated
  namespace: venafi
spec:
  vcp:
    accessToken:
      - serviceAccountToken:
          name: test
          audiences: [test]
      - vcpOAuth:
          tenantID: 756db001-280e-11ee-84fb-991f3177e2d0

Please use more realistic values for the name, the service account name, and the audiences here.
It's not clear to me whether there will be separate VenafiConnection resources for each component,
or whether I'll have to create separate ServiceAccount and VenafiConnection resources for the agent, issuer, and approver.

Finally, they will install the agent's own Helm chart using the following command:

helm upgrade --install venafi-kubernetes-agent oci://registry.venafi.cloud/charts/venafi-kubernetes-agent \
  --namespace "venafi" \
  --set config.venafiConnection.enable="true" \
  --set config.venafiConnection.name="test-federated" \
  --set config.venafiConnection.namespace="venafi" \
  --set config.server="https://api.venafi.cloud/" \
  --set config.clusterName="mael's kind cluster"

@maelvls (Member Author) commented Jul 26, 2024

I asked whether I should let users use an API key with a VenafiConnection resource.

Atanas answered:

API keys will be deprecated in the future as we have more secure authentication methods. We don’t need to support API key in VC

I'll show an error if someone tries to use an API key through a VenafiConnection resource in the Venafi Kubernetes Agent.

@maelvls (Member Author) commented Aug 5, 2024

Discussed today during the afternoon standup:

  • Atanas reminded me that the new VenafiConnection auth must "work well" with the operator. Roughly, the operator needs to be able to complete a deployment even when no VenafiConnection has been created, in the same way that venafi-enhanced-issuer and approver-policy-enterprise can be deployed successfully without a VenafiConnection.

I've discussed this with Adam, and the plan is to do the following:

  1. In the Venafi Kubernetes Agent's Helm chart, I will add a new field venafiConnection that will contain the tenant ID so that the Helm chart can create a well-known VenafiConnection (named venafi-components) that the Agent and other components can use. The addition to the Helm values will look like this:

    venafiConnection:
      enable: true
      tenantID: 756db001-280e-11ee-84fb-991f3177e2d0

    The resulting VenafiConnection and ServiceAccount will look like this:

    apiVersion: jetstack.io/v1alpha1
    kind: VenafiConnection
    metadata:
      name: venafi-components
      namespace: venafi
    spec:
      vcp:
        accessToken:
        - serviceAccountToken:
            name: venafi-components   # Fixed.
            audiences: [vcp]          # Fixed.
        - vcpOAuth:
            tenantID: 756db001-280e-11ee-84fb-991f3177e2d0
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: venafi-components
      namespace: venafi
  2. venctl and the Operator will be able to use the Helm value venafiConnection.

@maelvls (Member Author) commented Aug 5, 2024

Does [the creation of the Workload Identity Federation service account] need to be repeated for each component too?

Yes, the minimum number of service accounts to be created in the UI or using venctl is 2 per cluster:

  • One "Custom API Integration Service Account using Workload Identity Federation" per cluster. This service account can be shared between venafi-enhanced-issuer and approver-policy-enterprise because they both rely on the scope certificate-issuance.
  • One "Agent Service Account using Workload Identity Federation" per cluster. This service account can't be shared with venafi-enhanced-issuer and approver-policy-enterprise:
    • The UI doesn't allow you to create a service account with both the kubernetes-discovery-federated and certificate-issuance scopes,
    • Same with venctl 1.12.0. Note that venctl doesn't support workload identity federation for now (not for the agent, not for custom API integrations).

Note that it might be possible to create a single SA instead of two by using VCP's serviceaccount API directly, but I haven't tried that.

So, if you have 20 clusters, you will have to either go through the UI 40 times or run venctl 40 times.


How does [the company ID/tenant ID/token URL] relate to the client-id that is mentioned in the venafi-kubernetes-agent Helm chart docs pages c-vka-helmvalues and t-install-tlspk-agent?

The client ID is only needed by the Private Key JWT authentication that already exists in the agent.

The Workload Identity Federation service accounts only require the "company ID" (or "tenant ID"). Thus, the client ID that is displayed in the UI isn't useful to the user.

On the other hand, the "company ID" (or "tenant ID") is important but isn't displayed in the UI; one has to extract it from the Token URL. What's weird is that both methods (private key JWT and Workload Identity Federation) use exactly the same mechanism under the hood, but one requires a client ID to figure out which service account it is, whereas the other uses a company ID along with the public key to figure out the service account. I guess this technical difference doesn't matter much, but the difference in experience will surprise users.

@maelvls (Member Author) commented Aug 12, 2024

Tim pointed out that my "envtest" tests could have been written using the ConnectionHandler fake. He shared the example of signer_test.go that uses this fake.

Unfortunately, I won't have time to move from envtest to the fake by the end of tomorrow. I plan to do that later on.

Would it be OK if we go ahead with these slow envtests as a "first step" so we can review/merge this PR? I'd like to finish this feature by August 19th. My remaining work days are Aug 13th and Aug 19th.

Tomorrow, I'll work on testing https://github.com/jetstack/venafi-connection-lib/pull/220.

@maelvls force-pushed the venconn branch 2 times, most recently from c5de925 to 4303312 on August 13, 2024 07:27
@maelvls marked this pull request as ready for review on August 13, 2024 18:34
@maelvls (Member Author) commented Aug 14, 2024

I wasn’t able to finish fixing the CI before leaving. I'm back on Aug 19th. I’m assigning @wallrj to this since he will be reviewing it. @inteon, can you help with fixing the CI if needed? Thanks!

@maelvls added 11 commits on August 20, 2024 18:20
Let's pin venafi-connection-lib to the latest commit on main until we
decide to release a new version of venafi-connection-lib (if we actually
do).
Note that I should probably have gone with a fake of the
ConnectionHandler instead of an envtest. We will move to the fake later
on.

I added the venaficonnection CRDs manually for now. I have a PR to
automate pulling these CRDs from the venafi-connection-lib project:
#556

For now, I added these manifests manually with the following commands:

  gh pr checkout 556
  git checkout -
  git checkout step1-makefile-modules -- deploy/charts/venafi-kubernetes-agent/templates/venafi-connection-crd{,.without-validations}.yaml
@maelvls (Member Author) commented Aug 20, 2024

CI is fixed and PR is ready to be reviewed.

@wallrj (Member) left a review:

Thanks @maelvls

I've done quite a lot of testing of the VenafiConnection feature while updating the Helm chart in #559, but I haven't tested whether the agent still works in traditional Service account mode.

I left a few comments and optional suggestions. Ping me for another review if you choose to address those here.

/lgtm
/approve
/hold in case you prefer to make changes here.

cmd/agent.go (resolved)
deploy/charts/venafi-kubernetes-agent/templates/rbac.yaml (outdated, resolved)
pkg/agent/run.go (resolved)
pkg/agent/run.go (resolved)
	return func(t *testing.T) {
		fakeVenafiCloud, certCloud := fakeVenafiCloud(t)
		fakeTPP, certTPP := fakeTPP(t)
		_, restconf, kclient := startEnvtest(t)
@wallrj (Member):

The returned envtest environment is discarded here, but I think you should call t.Cleanup(env.Stop). I see envtest processes lingering after I run these tests:

$ pidof etcd kube-apiserver
374054 373960

@maelvls (Member Author) replied Aug 22, 2024:

That's odd.

The func startEnvtest already calls t.Cleanup(envtest.Stop) to stop the kube-apiserver and etcd processes:

func startEnvtest(t testing.TB) (_ *envtest.Environment, _ *rest.Config, kclient ctrlruntime.WithWatch) {
	// ...
	t.Cleanup(func() {
		t.Log("Waiting for envtest to exit")
		err = envtest.Stop()
		require.NoError(t, err)
	})

Not sure why it isn't cleaned up. I'll have to investigate. Would it be OK if we merge this PR even if this oddity is still there?

pkg/client/client_venconn_test.go (resolved)
pkg/client/client_venconn_test.go (outdated, resolved)
pkg/client/client_venconn_test.go (resolved)
- secret:
    name: accesstoken
    fields: [accesstoken]`),
expectReadyCondMsg: "ea744d098c2c1c6044e4c4e9d3bf7c2a68ef30553db00f1714886cedf73230f1",
@wallrj (Member):

That's a confusing Ready condition message! What does it mean? Where does it come from?

@maelvls (Member Author):

Hey. I've looked at the official Venafi Connection API reference page and wasn't able to find where this is documented. I don't think it is documented anywhere.

I found an example of status in https://github.com/jetstack/venafi-connection-lib/pull/146. It looks like this:

kind: VenafiConnection
status:
  conditions:
  - type: VenafiEnhancedIssuerReady
    status: "True"
    reason: Generated a token
    message: 8db2c15ace6f4c7b59138909b6b69d6caca69e2d308e695206b5e15ddaf28e81
    tokenValidUntil: "2123-10-02T19:57:36Z"

What I think happens is that venafi-connection-lib stores some form of hash in the message field. The reason field actually holds a message (reason should be TokenGenerated or something like that), and the message field contains a non-readable string that would be better placed in a custom tokenHash field. Funnily enough, the official Venafi Connection API reference says that message is a human-readable explanation:

Message is a human readable description of the details of the last transition, complementing reason.

I don't think this quirk is a big deal, but IMO the API reference (or some other place) should clearly state that message contains an opaque string used by venafi-connection-lib, and that reason contains the actual human-readable text. For example, in case of an error, the status shows:

kind: VenafiConnection
status:
  conditions:
  - type: VenafiEnhancedIssuerReady
    status: "False"
    reason: |
      connection is not ready yet (Venafi self-test failed): error authenticating
      with Venafi: vcert error: server error: server unavailable: Get "https://api.venafi.cloud/v1/useraccounts":
      net/http: request canceled (Client.Timeout exceeded while awaiting headers)
    message: ""
    tokenValidUntil: ""

Reply from a Contributor:

Right now, the message field contains a hash of the token.
We can change this, there is no logic that depends on this behavior (only the human experience).
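If the message is the hex-encoded SHA-256 digest of the token (an assumption: the 64 hex characters are consistent with SHA-256, but the comment above only says "a hash of the token"), a condition could be matched against a known token as follows. The helper names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// tokenFingerprint assumes (unconfirmed) that the Ready condition message is
// the hex-encoded SHA-256 digest of the access token.
func tokenFingerprint(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// matchesReadyMessage reports whether msg could be the fingerprint of token
// under that assumption.
func matchesReadyMessage(token, msg string) bool {
	return tokenFingerprint(token) == msg
}

func main() {
	fp := tokenFingerprint("example-token")
	fmt.Println(len(fp), matchesReadyMessage("example-token", fp)) // 64 true
}
```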

@wallrj (Member) left a review:

I merged the Helm templates into this branch. And the tests are passing.
Please merge unless you want to address anything else here.

@maelvls (Member Author) commented Aug 22, 2024

This PR is good to go!

Three things I'd like to address in later PRs:

@maelvls merged commit 8a765ef into master on Aug 22, 2024
8 checks passed
@wallrj deleted the venconn branch on August 22, 2024 14:17
@wallrj (Member) commented Aug 22, 2024

I found an example of status https://github.com/jetstack/venafi-connection-lib/pull/146. It looks like this:

tokenValidUntil: "2123-10-02T19:57:36Z"

That's 100 years away! I noticed the same when testing this branch. Is it a bug in venafi-connection-lib?

// TODO(mael): The rest of the codebase uses the standard "log" package,
// venafi-connection-lib uses "go-logr/logr", and client-go uses "klog". We
// should standardize on one of them, probably "slog".
ctrlruntimelog.SetLogger(logr.Logger{})
@maelvls (Member Author) commented Dec 12, 2024

Today, @hawksight asked: why do we recommend using a single venafi-components VenafiConnection and service account for all components?

Peter made the point that he would most likely recommend that customers use separate service accounts, one for each component. He said a single shared one would be OK for demo purposes, but not for production.

I think I said something about a well-known name:

so that the Helm chart can create a well-known VenafiConnection (named venafi-components) that the Agent and other components can use.

Shouldn't we show the recommended setup on the website instead of venafi-components?

@wallrj @achuchev

Labels
None yet
Projects
None yet

4 participants