VC-36950: It is now possible to exclude labels and annotations #614

Merged: maelvls merged 6 commits into master from VC-36950-add-exclude-annots-labels on Nov 14, 2024

@maelvls maelvls commented Nov 8, 2024

Some annotations and labels contain sensitive information. A well-known example is the kubectl.kubernetes.io/last-applied-configuration annotation. We already exclude this annotation from Secret resources, but we have found that customers use other sensitive annotations and labels that they would like to keep from being sent to the Venafi Control Plane API.

A realistic example lies in the Kapp project. The Kapp project uses four annotations that all start with kapp.k14s.io/original* as seen in 1. These annotations are similar to kubectl.kubernetes.io/last-applied-configuration in that they may contain sensitive information. The annotations would look like this:

annotations:
  kapp.k14s.io/original: |
    {"apiVersion":"v1","kind":"Secret","spec":{"data": {"password": "cGFzc3dvcmQ=","username": "bXl1c2VybmFtZQ=="}}}
  kapp.k14s.io/original-diff: |
    - type: test
      path: /data
      value:
      password: cygpcGVyUzNjcmV0UEBhc3N3b3JkIQ==
      username: bXl1c2VybmFtZQ==

In this case, we suggest that customers use the following exclusion setting in values.yaml to exclude both kapp.k14s.io/original and kapp.k14s.io/original-diff:

config:
  excludeAnnotationKeysRegex:
    - ^kapp\.k14s\.io/original

For labels, it looks like this:

config:
  excludeLabelKeysRegex:
    - \.company\.com/

The documentation for this feature is being worked on in https://gitlab.com/venafi/vaas/ua/clouddocs/-/merge_requests/1220.

Ref: VC-36950

Why regex, why not fixed string search with wildcards?

We have chosen regexes over fixed-string search or wildcards. We know that regexes can be confusing, but we don't expect this feature to be used often, and since we don't know exactly what kind of filtering customers will want, we have gone with the most versatile option: regexes.

Manual Testing

Although I've added a couple of unit tests, I haven't had time to add an automated end-to-end test. In particular, my PR is lacking a smoke test to check that the Helm chart value excludeAnnotationKeysRegex does what it claims it does.

Here are the two manual tests that make me confident that this feature works as expected.

Test 1: Using --output-path + fake Jetstack Secure API token

I used an empty Kind cluster for this test.

First, run this:

cat <<EOF >minimal-config.yaml
period: 5m
cluster_id: "kind-mael"
server: "https://api.venafi.cloud/"
organization_id: foo
venafi-cloud:
  upload_path: "/v1/tlspk/upload/clusterdata"
exclude-annotation-keys-regex:
  - ".*password.*"
  - ".*apikey.*"
exclude-label-keys-regex:
  - ".*password.*"
  - ".*apikey.*"
data-gatherers:
  - kind: "k8s-dynamic"
    name: "k8s/namespaces"
    config:
      resource-type:
        resource: namespaces
        version: v1
  - kind: "k8s-dynamic"
    name: "k8s/secrets"
    config:
      resource-type:
        version: v1
        resource: secrets
EOF

Then:

openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout tls.key -out tls.crt -subj "/CN=mydomain.com/O=myorganization"
kubectl create secret tls my-tls-secret --cert=tls.crt --key=tls.key
kubectl annotate secret my-tls-secret some-secret-apikey=1234
kubectl label secret my-tls-secret some-secret-password=1234

kubectl create ns my-ns
kubectl annotate ns my-ns some-secret-apikey=1234
kubectl label ns my-ns some-secret-password=1234

Finally, run the agent in one-shot mode. grep should find no match for 1234, so only the agent's log lines appear:

$ go run . agent -c minimal-config.yaml \
  --api-token should-not-be-required \
  --install-namespace=venafi \
  --output-path /dev/stdout \
  --one-shot | grep -B5 1234
2024/11/08 16:46:56 Preflight agent version: development ()
2024/11/08 16:46:56 Using the Jetstack Secure API Token auth mode since --api-token was specified.
2024/11/08 16:46:56 error messages will not show in the pod's events because the POD_NAME environment variable is empty
2024/11/08 16:46:56 starting "k8s/namespaces" datagatherer
2024/11/08 16:46:56 starting "k8s/secrets" datagatherer
2024/11/08 16:46:56 successfully gathered 7 items from "k8s/namespaces" datagatherer
2024/11/08 16:46:56 successfully gathered 5 items from "k8s/secrets" datagatherer
2024/11/08 16:46:56 Data saved to local file: /dev/stdout

Before this PR, the output would look like this:

$ go run . agent -c minimal-config.yaml \
  --api-token should-not-be-required \
  --install-namespace=venafi \
  --output-path /dev/stdout \
  --one-shot | grep -B5 1234
2024/11/08 16:47:26 Preflight agent version: development ()
2024/11/08 16:47:26 Using the Jetstack Secure API Token auth mode since --api-token was specified.
2024/11/08 16:47:26 error messages will not show in the pod's events because the POD_NAME environment variable is empty
2024/11/08 16:47:26 starting "k8s/namespaces" datagatherer
2024/11/08 16:47:26 starting "k8s/secrets" datagatherer
2024/11/08 16:47:26 successfully gathered 7 items from "k8s/namespaces" datagatherer
2024/11/08 16:47:26 successfully gathered 5 items from "k8s/secrets" datagatherer
2024/11/08 16:47:26 Data saved to local file: /dev/stdout
          "resource": {
            "apiVersion": "v1",
            "kind": "Namespace",
            "metadata": {
              "annotations": {
                "some-secret-apikey": "1234"
              },
              "creationTimestamp": "2024-11-08T15:37:03Z",
              "labels": {
                "kubernetes.io/metadata.name": "my-ns",
                "some-secret-password": "1234"
--
              "tls.crt": "..."
            },
            "kind": "Secret",
            "metadata": {
              "annotations": {
                "some-secret-apikey": "1234"
              },
              "labels": {
                "some-secret-password": "1234"

Test 2: Real API + Venafi Cloud Key Pair Service Account auth

In this test, I use mitmproxy to check that the agent's HTTP request doesn't contain the excluded annotations and labels.

For this test, I've used the tenant https://ven-tlspk.venafi.cloud/. To access the API key, use the user [email protected] and the password is visible in the page Production Accounts (private to Venafi). Then go to the settings and find the API key, and set it as an env var:

APIKEY=...

Create the service account and key pair:

venctl iam service-account agent create --name "$USER temp" \
  --vcp-region US \
  --output json \
  --owning-team $(curl -sS https://api.venafi.cloud/v1/teams -H "tppl-api-key: $APIKEY" | jq '.teams[0].id') \
  --output-file /tmp/agent-credentials.json \
  --api-key $APIKEY

Now, make sure that your /etc/hosts contains the entry 127.0.0.1 me.

Then, run mitmproxy with:

curl -L https://raw.githubusercontent.com/maelvls/kubectl-incluster/main/watch-stream.py >/tmp/watch-stream.py
mitmproxy --mode regular@9090 --ssl-insecure -s /tmp/watch-stream.py --set client_certs=$(kubectl incluster --print-client-cert >/tmp/me.pem && echo /tmp/me.pem)

Run this:

cat <<EOF >minimal-config.yaml
period: 5m
cluster_id: "kind-mael"
server: "https://api.venafi.cloud/"
venafi-cloud:
  upload_path: "/v1/tlspk/upload/clusterdata"
exclude-annotation-keys-regex:
  - ".*password.*"
  - ".*apikey.*"
exclude-label-keys-regex:
  - ".*password.*"
  - ".*apikey.*"
data-gatherers:
  - kind: "k8s-dynamic"
    name: "k8s/namespaces"
    config:
      resource-type:
        resource: namespaces
        version: v1
  - kind: "k8s-dynamic"
    name: "k8s/secrets"
    config:
      resource-type:
        version: v1
        resource: secrets
EOF

Now, don't forget to create a Kind cluster. Then:

openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout tls.key -out tls.crt -subj "/CN=mydomain.com/O=myorganization"
kubectl create secret tls my-tls-secret --cert=tls.crt --key=tls.key
kubectl annotate secret my-tls-secret some-secret-apikey=1234
kubectl label secret my-tls-secret some-secret-password=1234

kubectl create ns my-ns
kubectl annotate ns my-ns some-secret-apikey=1234
kubectl label ns my-ns some-secret-password=1234

Finally, run the Agent with:

go install github.com/maelvls/kubectl-incluster@latest
export HTTPS_PROXY=http://localhost:9090 KUBECONFIG=/tmp/kube && KUBECONFIG= HTTPS_PROXY= kubectl incluster --replace-ca-cert ~/.mitmproxy/mitmproxy-ca-cert.pem --sa=venafi/venafi-kubernetes-agent | sed 's|127.0.0.1|me|' >/tmp/kube

go run . agent -c minimal-config.yaml \
  --client-id $(jq -r .client_id /tmp/agent-credentials.json) \
  --private-key-path <(jq -r .private_key /tmp/agent-credentials.json) \
  --install-namespace=venafi \
  --one-shot

Look at the mitmproxy logs and look for the entry that has the path

/v1/tlspk/upload/clusterdata/no?name=...

Open it (enter) and type / and 1234. You should not see any instance of "1234", proving that the annotations and labels have been filtered out.

When looking at the JSON blob, all of the annotations and labels containing apikey and password are gone:

| type | name | namespace | annotations | labels |
| --- | --- | --- | --- | --- |
| Namespace | local-path-storage | - | {} | {"kubernetes.io/metadata.name": "local-path-storage"} |
| Namespace | my-ns | - | {} | {"kubernetes.io/metadata.name": "my-ns"} |
| Namespace | venafi | - | {} | {"kubernetes.io/metadata.name": "venafi", "name": "venafi"} |
| Namespace | default | - | {} | {"kubernetes.io/metadata.name": "default"} |
| Namespace | kube-node-lease | - | {} | {"kubernetes.io/metadata.name": "kube-node-lease"} |
| Namespace | kube-public | - | {} | {"kubernetes.io/metadata.name": "kube-public"} |
| Namespace | kube-system | - | {} | {"kubernetes.io/metadata.name": "kube-system"} |
| Secret | agent-credentials | venafi | {} | - |
| Secret | sh.helm.release.v1.venafi-kubernetes-agent.v1 | venafi | - | {"modifiedAt": "1728905433", "name": "venafi-kubernetes-agent", "owner": "helm", |
| Secret | sh.helm.release.v1.venafi-kubernetes-agent.v2 | venafi | - | {"modifiedAt": "1728905433", "name": "venafi-kubernetes-agent", "owner": "helm", |
| Secret | sh.helm.release.v1.venafi-kubernetes-agent.v3 | venafi | - | {"modifiedAt": "1728905433", "name": "venafi-kubernetes-agent", "owner": "helm", |
| Secret | my-tls-secret | default | {} | {} |

Here is the output that I got without this PR:

| type | name | namespace | annotations | labels |
| --- | --- | --- | --- | --- |
| Namespace | default | - | - | {"kubernetes.io/metadata.name": "default"} |
| Namespace | kube-node-lease | - | - | {"kubernetes.io/metadata.name": "kube-node-lease"} |
| Namespace | kube-public | - | - | {"kubernetes.io/metadata.name": "kube-public"} |
| Namespace | kube-system | - | - | {"kubernetes.io/metadata.name": "kube-system"} |
| Namespace | local-path-storage | - | {} | {"kubernetes.io/metadata.name": "local-path-storage"} |
| Namespace | my-ns | - | {"some-secret-apikey": "1234"} | {"kubernetes.io/metadata.name": "my-ns", "some-secret-password": "1234"} |
| Namespace | venafi | - | - | {"kubernetes.io/metadata.name": "venafi", "name": "venafi"} |
| Secret | agent-credentials | venafi | {} | - |
| Secret | sh.helm.release.v1.venafi-kubernetes-agent.v1 | venafi | - | {"modifiedAt": "1728905433", "name": "venafi-kubernetes-agent", "owner": "helm", |
| Secret | sh.helm.release.v1.venafi-kubernetes-agent.v2 | venafi | - | {"modifiedAt": "1728905433", "name": "venafi-kubernetes-agent", "owner": "helm", |
| Secret | sh.helm.release.v1.venafi-kubernetes-agent.v3 | venafi | - | {"modifiedAt": "1728905433", "name": "venafi-kubernetes-agent", "owner": "helm", |
| Secret | my-tls-secret | default | {"some-secret-apikey": "1234"} | {"some-secret-password": "1234"} |

@maelvls maelvls changed the title from "annot-exclusion: it is now possible to exclude labels and annotations" to "VC-36950: It is now possible to exclude labels and annotations" Nov 8, 2024
@maelvls maelvls force-pushed the VC-36950-add-exclude-annots-labels branch from dfcfa3c to 9f8f724 Compare November 8, 2024 16:12
Comment on lines +158 to +184
dynDg, isDynamicGatherer := newDg.(*k8s.DataGathererDynamic)
if isDynamicGatherer {
	dynDg.ExcludeAnnotKeys = config.ExcludeAnnotationKeysRegex
	dynDg.ExcludeLabelKeys = config.ExcludeLabelKeysRegex
}
Member Author

This is super hacky... I tried finding a cleaner way of "injecting" the ExcludeAnnotKeys and ExcludeLabelKeys, but gave up.

@maelvls maelvls force-pushed the VC-36950-add-exclude-annots-labels branch from 9f8f724 to 4097047 Compare November 8, 2024 16:17
@maelvls maelvls requested a review from wallrj November 8, 2024 16:17
@maelvls maelvls marked this pull request as ready for review November 8, 2024 18:02
@wallrj wallrj left a comment

@maelvls I read the code but didn't test it yet.

  1. The unit tests are failing
  2. The example regex patterns don't seem very realistic. Why would a user put "secrets" in the annotations or labels?
  3. And if it is realistic, shouldn't the venafi-kubernetes-agent redact labels containing those terms by default...like a log sanitizer or like GitHub actions attempts to do to prevent accidentally leaking these things.
  4. The original Jira ticket says: "Note: sensitive data could be something like organizational structure or team ownership of a resource, which could be defined through labels." but does not give any real world examples. I imagine something like ^.*\.example\.com/.* to filter out all the labels with the well known organization prefix.

Comment on lines 246 to 249
# If you would like to exclude annotations keys that contain the word
# `secret`, use the regular expression `.*secret.*`. The leading and ending .*
# are important if you want to filter out keys that contain `secret` anywhere
# in the key string.
Member

Suggested change
- # If you would like to exclude annotations keys that contain the word
- # `secret`, use the regular expression `.*secret.*`. The leading and ending .*
- # are important if you want to filter out keys that contain `secret` anywhere
- # in the key string.
+ # If you would like to exclude annotation keys that contain the word
+ # `secret`, use the regular expression `.*secret.*`. The leading and ending .*
+ # are important if you want to filter out keys that contain `secret` anywhere
+ # in the key string.

Comment on lines 379 to 382
annotsRaw, ok, err := unstructured.NestedFieldNoCopy(resource.Object, "metadata", "annotations")
if err != nil {
	return fmt.Errorf("wasn't able to find the metadata.annotations field: %w", err)
}
Member

I don't think this should return an error, because the object might not have any annotations. I guess it depends whether the metadata.annotations field has an omitempty marker.

Member Author

You are right. I will rework this and add more unit tests to feel a little more confident.

Member Author

I've removed this error and added unit tests in 120453f. PTAL
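
The behaviour agreed on in this thread (a missing metadata.annotations field is a no-op, not an error) can be sketched as follows. This is only an illustration on plain maps: annotationsOf is a hypothetical stand-in and does not use the unstructured helpers from the PR.

```go
package main

import "fmt"

// annotationsOf is a hypothetical stand-in for the lookup discussed above.
// It returns (nil, false) when metadata or metadata.annotations is absent,
// so callers can treat "no annotations" as a no-op rather than an error.
func annotationsOf(obj map[string]interface{}) (map[string]interface{}, bool) {
	metadata, ok := obj["metadata"].(map[string]interface{})
	if !ok {
		return nil, false
	}
	annots, ok := metadata["annotations"].(map[string]interface{})
	return annots, ok
}

func main() {
	withAnnots := map[string]interface{}{
		"metadata": map[string]interface{}{
			"annotations": map[string]interface{}{"some-secret-apikey": "1234"},
		},
	}
	withoutAnnots := map[string]interface{}{
		"metadata": map[string]interface{}{"name": "my-ns"},
	}

	_, found := annotationsOf(withAnnots)
	fmt.Println(found) // true
	_, found = annotationsOf(withoutAnnots)
	fmt.Println(found) // false: nothing to filter, and nothing to report
}
```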

maelvls commented Nov 12, 2024

@wallrj FYI, the documentation for this feature is being worked on in https://gitlab.com/venafi/vaas/ua/clouddocs/-/merge_requests/1220.

@maelvls maelvls force-pushed the VC-36950-add-exclude-annots-labels branch from eecf784 to 365d7a1 Compare November 12, 2024 18:37

maelvls commented Nov 12, 2024

I've fixed the unit tests. The issue was that each test case focuses on a single data gatherer, and one data gatherer only ever deals with one resource at a time. The failing test case was set up with the Secret resource. I could have created a separate test case, but I've decided to remove the Route resource altogether since it doesn't increase my level of confidence.

The tests are now passing. You can go ahead and try the feature out. Thanks! @wallrj

maelvls commented Nov 13, 2024

  2. The example regex patterns don't seem very realistic. Why would a user put "secrets" in the annotations or labels?

You are right.

To give a realistic regex in this test case, let’s use the example of the Kapp project that uses four annotations that all start with kapp.k14s.io/original* as seen in 1. These annotations are similar to kubectl.kubernetes.io/last-applied-configuration in that they may contain sensitive information. The annotations would look like this:

annotations:
  kapp.k14s.io/original: |
    {"apiVersion":"v1","kind":"Secret","spec":{"data": {"password": "cGFzc3dvcmQ=","username": "bXl1c2VybmFtZQ=="}}}
  kapp.k14s.io/original-diff: |
    - type: test
      path: /data
      value:
      password: cygpcGVyUzNjcmV0UEBhc3N3b3JkIQ==
      username: bXl1c2VybmFtZQ==

In this case, I'd suggest using the following exclusion setting in values.yaml:

config:
  excludeAnnotationKeysRegex:
    - kapp\.k14s\.io/original.*

^ Note that the regex doesn't need to start with ^ and end with $ since the expression is implicitly matching the whole word. Adding ^ and $ will not have any effect.

  3. And if it is realistic, shouldn't the venafi-kubernetes-agent redact labels containing those terms by default...like a log sanitizer or like GitHub actions attempts to do to prevent accidentally leaking these things.

Good point, that'd be perfect. I don't know how to get there though, so until we get there, I like the idea of providing a way of letting customers exclude specific annotations.

  4. The original Jira ticket says: "Note: sensitive data could be something like organizational structure or team ownership of a resource, which could be defined through labels." but does not give any real world examples. I imagine something like ^.*\.example\.com/.* to filter out all the labels with the well known organization prefix.

I think your example is sensible, and I'll change my examples in code and documentation to match what you proposed. The values.yaml would look like this:

config:
  excludeLabelKeysRegex:
    - .*\.company\.com/.*

It seems like . is the only character that people should worry about since annotation keys and label keys must be of the form [prefix/]name, where prefix is a DNS subdomain and name is a DNS label.

It reminds me that I need to clarify a few things in the documentation:

  • Escaping: The dot (.) is the only character that needs escaping. The character / doesn't need to be escaped. Example: company\.com/team. In your values.yaml, make sure to use either a single-quoted string ('regex') or an unquoted string (regex) for the config.excludeAnnotationKeysRegex and config.excludeLabelKeysRegex fields. Avoid using double-quoted strings ("regex") for these fields, as YAML interprets escape sequences like \. differently in double quotes, which can lead to unexpected behavior.
  • Contains: If you want to match anything that contains word, you can use the regular expression word.
  • Starts With: If you want to match anything that starts with word, you need to start the regular expression with ^. For example, if you would like to exclude all annotations starting with company.com, you can use the regular expression ^company\.com.*.
  • Ends With: If you want to match anything that ends with word, use $. For example, if you would like to match all annotations that end with team, you can use team$.
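
The four rules above can be checked with a short Go program. This is a sketch that assumes the patterns are evaluated with Go's standard regexp package and unanchored matching, which is what the bullets describe; the matches helper and the sample keys are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether key would be excluded by pattern, assuming
// unanchored matching as provided by Go's regexp.MatchString.
func matches(pattern, key string) bool {
	return regexp.MustCompile(pattern).MatchString(key)
}

func main() {
	// Sample keys are illustrative only.
	keys := []string{"company.com/team", "team.company.com/owner", "companyXcom/team"}
	patterns := []string{
		`company\.com/`, // contains
		`^company\.com`, // starts with
		`team$`,         // ends with
		`company.com/`,  // unescaped dot: matches any character in that position
	}
	for _, p := range patterns {
		for _, k := range keys {
			fmt.Printf("%-15s %-24s %v\n", p, k, matches(p, k))
		}
	}
}
```

The last pattern shows why escaping the dot matters: without the backslash, companyXcom/team is excluded as well.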

@wallrj wallrj left a comment

@maelvls I tested it locally and it does strip the labels and annotations from the metadata of objects.
But it misses the labels and annotations that are added to PodTemplate in Deployment, StatefulSet, Job etc.
So I don't think it's going to satisfy users who want to prevent the sharing of sensitive labels and annotations.

Happy to approve this if you consider it to be a step in the right direction, but I think a general solution which scrubs all sensitive fields will look different to this.
I suppose it might be applied to all keys in the venafi-kubernetes-agent JSON output,
regardless of which DataGatherers have been used.

helm template deploy/charts/venafi-kubernetes-agent \
  --show-only templates/configmap.yaml \
  --set fullnameOverride=venafi-kubernetes-agent \
  --set config.excludeAnnotationKeysRegex={'.*'} \
  --set config.excludeLabelKeysRegex={'.*'} \
| yq '.data."config.yaml" 
        | @yamld 
        | .organization_id |= "example.com" 
        | .cluster_id |= "cluster-1.example.com"' \
> examples/venafi-kubernetes-agent.yaml

go run ./ agent \
  --one-shot \
  --api-token should-not-be-required \
  --install-namespace venafi \
  --output-path /dev/stdout \
  --agent-config-file examples/venafi-kubernetes-agent.yaml 2>/dev/null \
|  grep -E -C 5 -e '"(labels|annotations)": {$'
...
--
                }
              },
              "template": {
                "metadata": {
                  "creationTimestamp": null,
                  "labels": {
                    "app.kubernetes.io/instance": "venafi-enhanced-issuer",
                    "app.kubernetes.io/managed-by": "Helm",
                    "app.kubernetes.io/name": "venafi-enhanced-issuer",
                    "app.kubernetes.io/version": "v0.14.0",
                    "helm.sh/chart": "venafi-enhanced-issuer-v0.14.0",
                    "pod-template-hash": "d7496dc68"
                  },
                  "annotations": {
                    "kubectl.kubernetes.io/default-container": "venafi-enhanced-issuer"
                  }
                },
                "spec": {
                  "containers": [

@maelvls maelvls force-pushed the VC-36950-add-exclude-annots-labels branch from 689b311 to 120453f Compare November 13, 2024 12:31

maelvls commented Nov 13, 2024

But it misses the labels and annotations that are added to PodTemplate in Deployment, StatefulSet, Job etc.

I totally forgot about these. I think it's worth investigating this further in a separate PR, but I'd be OK shipping the next version of Venafi Kubernetes Agent with the feature as presented in this PR.

pkg/datagatherer/k8s/dynamic_test.go (outdated, resolved)
@wallrj wallrj left a comment

I noticed some problems with the new tests, and suggested a diff.
I didn't push the changes, you'd better sanity check it first.

pkg/datagatherer/k8s/dynamic_test.go (resolved)
@maelvls maelvls force-pushed the VC-36950-add-exclude-annots-labels branch 2 times, most recently from 42a4e34 to 0e609ff Compare November 14, 2024 08:49
@wallrj wallrj left a comment

Thanks @maelvls

Thanks for making the new unit tests so easily readable.
You've acknowledged that this only catches the labels in the object metadata,
not in the PodTemplates of Deployment, StatefulSet etc.
But we've agreed that you'll address that in another PR.
The user-facing configuration makes sense and is very well documented.
But we've also acknowledged that it is going to be quite difficult for the user to know whether their supplied exclusion regular expressions are actually working.
They don't have an easy way to see the data that is being collected by the agent before it is posted to the Venafi API.

I imagine that users may ask for a label / annotation (or even a general field) whitelist feature in future, so that they can declare exactly which fields and values are shared with Venafi... but hopefully this new feature will help us to validate that.

/approve
/lgtm

@maelvls maelvls force-pushed the VC-36950-add-exclude-annots-labels branch from 0e609ff to 8f99daa Compare November 14, 2024 12:31
@maelvls maelvls merged commit a8aaf84 into master Nov 14, 2024
2 checks passed
@maelvls maelvls deleted the VC-36950-add-exclude-annots-labels branch November 14, 2024 12:49

wallrj commented Nov 21, 2024

But we've also acknowledged that it is going to be quite difficult for the user to know whether their supplied exclusion regular expressions are actually working.
They don't have an easy way to see the data that is being collected by the agent before it is posted to the Venafi API.

Now being worked on in https://venafi.atlassian.net/browse/VC-37240

  • [TLSPK agent] Allow the users to know whether excludeAnnotationKeysRegex works as expected
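
One way users could check their patterns before relying on them is a small dry-run report that lists which keys would be excluded and by which pattern. The sketch below only illustrates the idea and is unrelated to whatever VC-37240 ends up implementing; explainExclusions is a hypothetical helper that assumes unanchored Go regexp matching.

```go
package main

import (
	"fmt"
	"regexp"
)

// explainExclusions is a hypothetical dry-run helper. For every key that
// would be excluded, it records the first pattern that matched it.
func explainExclusions(keys, patterns []string) (map[string]string, error) {
	compiled := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid pattern %q: %w", p, err)
		}
		compiled = append(compiled, re)
	}
	excluded := map[string]string{}
	for _, k := range keys {
		for _, re := range compiled {
			if re.MatchString(k) {
				excluded[k] = re.String()
				break
			}
		}
	}
	return excluded, nil
}

func main() {
	keys := []string{"kapp.k14s.io/original", "kubernetes.io/metadata.name"}
	report, err := explainExclusions(keys, []string{`^kapp\.k14s\.io/original`})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for key, pattern := range report {
		fmt.Printf("%s would be excluded by %s\n", key, pattern)
	}
}
```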
