GlobalTenantResource only creates objects. No updates possible. #687

Closed
h4wkmoon opened this issue Jan 29, 2023 · 9 comments · Fixed by #689

@h4wkmoon
Contributor

Bug description

GlobalTenantResource reconciliation fails once the replicated items have been created. Items are created but never updated.

Impacts are:

  • Replicated items are only created, they are never updated.

How to reproduce

Steps to reproduce the behavior:

Create any tenant, a namespace in the tenant, and any GlobalTenantResource.

Expected behavior

No error. Items are updated.

Logs

capsule-controller-manager-78569d95f-f9xlt manager {"level":"error","ts":"2023-01-29T13:35:36.405Z","msg":"Reconciler error","controller":"globaltenantresource","controllerGroup":"capsule.clastix.io","controllerKind":"GlobalTenantResource","globalTenantResource":{"name":"alertmanager"},"namespace":"","name":"alertmanager","reconcileID":"5b37ac80-de92-48c6-8baa-393869b62e40","error":"1 error occurred:\n\t* alertmanagers.monitoring.coreos.com "alertmanager" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update\n\n","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}

Additional context

  • Capsule version: 0.2
  • Kubernetes version: 1.25

I think the issue is here:
https://github.com/clastix/capsule/blob/9d6f766cc1af157bb548284ea8dbd37e9ae80fa4/controllers/resources/processor.go#L260

@h4wkmoon h4wkmoon added blocked-needs-validation Issue need triage and validation bug Something isn't working labels Jan 29, 2023
@h4wkmoon
Contributor Author

I tried editing a Kubernetes resource and setting its resourceVersion to "". I got the same error that Capsule gets.

metadata.resourceVersion: Invalid value: 0x0: must be specified for an update

@aslafy-z
Contributor

aslafy-z commented Jan 29, 2023

AFAIK the metadata.resourceVersion field should be copied from the initial object on update. I guess that line can be deleted without issues, can't it?
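
The suggestion above can be sketched with a toy in-memory model. This is only an illustration, not Capsule's code and not the Kubernetes API server: `ToyStore`, `naive_replicate`, and `replicate` are hypothetical names, and the store simply assumes that every update must carry the resourceVersion of the live object. Under that assumption, sending the desired object as-is fails on the second pass with the error from the logs, while copying the resourceVersion from the live object lets the update through.

```python
class Invalid(Exception):
    """Raised when an update carries no resourceVersion."""


class ToyStore:
    """Toy stand-in (assumption) for a server that requires resourceVersion on update."""

    def __init__(self):
        self._objects = {}  # name -> stored object
        self._counter = 0   # source of increasing versions

    def _next_version(self):
        self._counter += 1
        return str(self._counter)

    def get(self, name):
        obj = self._objects.get(name)
        return dict(obj) if obj is not None else None

    def create(self, obj):
        stored = dict(obj, resourceVersion=self._next_version())
        self._objects[obj["name"]] = stored
        return dict(stored)

    def update(self, obj):
        if not obj.get("resourceVersion"):
            # Same message as in the reported log line.
            raise Invalid(
                "metadata.resourceVersion: Invalid value: 0x0: "
                "must be specified for an update"
            )
        if obj["resourceVersion"] != self._objects[obj["name"]]["resourceVersion"]:
            raise RuntimeError("conflict: the object has been modified")
        stored = dict(obj, resourceVersion=self._next_version())
        self._objects[obj["name"]] = stored
        return dict(stored)


def naive_replicate(store, desired):
    """Create on first pass, then send the desired object without a resourceVersion."""
    if store.get(desired["name"]) is None:
        return store.create(desired)
    return store.update(desired)


def replicate(store, desired):
    """Create on first pass, then update with the live object's resourceVersion."""
    live = store.get(desired["name"])
    if live is None:
        return store.create(desired)
    return store.update(dict(desired, resourceVersion=live["resourceVersion"]))


store = ToyStore()
item = {"name": "alertmanager", "replicas": 0}
replicate(store, item)                     # first pass: create
replicate(store, dict(item, replicas=1))   # second pass: update succeeds
print(store.get("alertmanager")["replicas"])
```

In this model `naive_replicate` succeeds once and raises `Invalid` on every later pass, which matches the "created, never updated" symptom in the report.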

@prometherion
Member

Hey @h4wkmoon, may I ask you to share the GlobalTenantResource you're using?

Just trying to understand whether you're using existing resources or raw ones.

@prometherion prometherion self-assigned this Jan 29, 2023
@prometherion prometherion added this to the v0.2.1 milestone Jan 29, 2023
@h4wkmoon
Contributor Author

@prometherion, I tried with both raw & namespaced. Same results.

@prometherion
Member

I wasn't able to reproduce it. Sharing the steps I've tested so far, starting with the Tenant:

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"capsule.clastix.io/v1beta2","kind":"Tenant","metadata":{"annotations":{},"name":"oil"},"spec":{"owners":[{"kind":"User","name":"alice"}]}}
  creationTimestamp: "2023-01-26T16:22:57Z"
  generation: 5
  labels:
    energy: green
  name: oil
  resourceVersion: "232417"
  uid: 0d34f8c7-7b17-4626-ad1b-4aabe06422a4
spec:
  owners:
  - clusterRoles:
    - admin
    - capsule-namespace-deleter
    kind: User
    name: alice
status:
  namespaces:
  - oil-development
  - oil-production
  - oil-staging
  size: 3
  state: Active

This is the GlobalTenantResource I create:

apiVersion: capsule.clastix.io/v1beta2
kind: GlobalTenantResource
metadata:
  name: green-production
spec:
  pruningOnDelete: true
  resources:
  - rawItems:
    - apiVersion: v1
      kind: Secret
      metadata:
        name: raw-secret-1
    - apiVersion: v1
      kind: Secret
      metadata:
        name: raw-secret-2
    - apiVersion: v1
      kind: Secret
      metadata:
        name: raw-secret-3
  resyncPeriod: 60s
  tenantSelector:
    matchLabels:
      energy: green

I don't have any Secrets in the tenant's Namespaces yet:

$: kubectl get secret -A
NAMESPACE        NAME                                  TYPE                 DATA   AGE
capsule-system   capsule-proxy                         kubernetes.io/tls    2      4d18h
capsule-system   capsule-tls                           Opaque               3      17d
capsule-system   sh.helm.release.v1.capsule-proxy.v1   helm.sh/release.v1   1      4d18h
capsule-system   sh.helm.release.v1.capsule.v1         helm.sh/release.v1   1      17d

Now I apply the GlobalTenantResource and check the Secrets:

NAMESPACE         NAME                                  TYPE                 DATA   AGE
capsule-system    capsule-proxy                         kubernetes.io/tls    2      4d18h
capsule-system    capsule-tls                           Opaque               3      17d
capsule-system    sh.helm.release.v1.capsule-proxy.v1   helm.sh/release.v1   1      4d18h
capsule-system    sh.helm.release.v1.capsule.v1         helm.sh/release.v1   1      17d
oil-development   raw-secret-1                          Opaque               0      2s
oil-development   raw-secret-2                          Opaque               0      2s
oil-development   raw-secret-3                          Opaque               0      2s
oil-production    raw-secret-1                          Opaque               0      2s
oil-production    raw-secret-2                          Opaque               0      2s
oil-production    raw-secret-3                          Opaque               0      2s
oil-staging       raw-secret-1                          Opaque               0      2s
oil-staging       raw-secret-2                          Opaque               0      2s
oil-staging       raw-secret-3                          Opaque               0      2s
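
The nine Secrets listed above are what one would expect from three raw items fanned out over the three Namespaces of the matching Tenant. A minimal Python sketch of that fan-out, using the names from this thread; `matches` and `replication_targets` are hypothetical helpers for illustration and do not reflect how Capsule implements the selection:

```python
def matches(selector, labels):
    """matchLabels semantics: every selector key/value must be present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def replication_targets(tenants, selector, item_names):
    """Return (namespace, item) pairs for every Tenant the selector matches."""
    targets = []
    for tenant in tenants:
        if not matches(selector, tenant["labels"]):
            continue
        for namespace in tenant["namespaces"]:
            for name in item_names:
                targets.append((namespace, name))
    return sorted(targets)


# Values taken from the Tenant and GlobalTenantResource shown in this thread.
tenants = [
    {
        "name": "oil",
        "labels": {"energy": "green"},
        "namespaces": ["oil-development", "oil-production", "oil-staging"],
    }
]
items = ["raw-secret-1", "raw-secret-2", "raw-secret-3"]

targets = replication_targets(tenants, {"energy": "green"}, items)
print(len(targets))  # 9
```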

I'm going to edit the raw items, adding a field to those secrets.

apiVersion: capsule.clastix.io/v1beta2
kind: GlobalTenantResource
metadata:
  name: green-production
spec:
  pruningOnDelete: true
  resources:
  - rawItems:
    - data:
        admin: YWRtaW4=
      apiVersion: v1
      kind: Secret
      metadata:
        name: raw-secret-1
    - data:
        admin: YWRtaW4=
      apiVersion: v1
      kind: Secret
      metadata:
        name: raw-secret-2
    - data:
        admin: YWRtaW4=
      apiVersion: v1
      kind: Secret
      metadata:
        name: raw-secret-3
  resyncPeriod: 60s
  tenantSelector:
    matchLabels:
      energy: green

Checking whether the Secrets contain the updated value:

$: kubectl -n oil-development get secret raw-secret-1 -o jsonpath='{.data.admin}'
YWRtaW4=
$: kubectl -n oil-development get secret raw-secret-3 -o jsonpath='{.data.admin}'
YWRtaW4=
$: kubectl -n oil-development get secret raw-secret-2 -o jsonpath='{.data.admin}'
YWRtaW4=
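
Secret values are stored base64-encoded, so the output above is the encoded form of the value set in the raw items. A short Python check decodes it:

```python
import base64

encoded = "YWRtaW4="  # value returned by the jsonpath queries above
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)  # admin
```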

I also tried to grep the logs for the error you reported, but wasn't able to find anything.

$: kubectl logs -n capsule-system deployments/capsule-controller-manager | grep "must be specified for an update"

@h4wkmoon, to triage the bug and propose a fix, we first need to reproduce the issue. It would be really valuable if you could share the data you're trying to replicate, even redacted, so I can work on a patch. I already have an idea of a possible root cause.

@h4wkmoon
Contributor Author

This one will trigger the error (prerequisite: you need the prometheus-operator):

apiVersion: capsule.clastix.io/v1beta2
kind: GlobalTenantResource
metadata:
  name: green-production
spec:
  pruningOnDelete: true
  resources:
  - rawItems:
    - apiVersion: monitoring.coreos.com/v1
      kind: Alertmanager
      metadata:
        name: alertmanager
      spec:
        alertmanagerConfigNamespaceSelector: {}
        alertmanagerConfigSelector: {}
        image: quay.io/prometheus/alertmanager:v0.24.0
        listenLocal: false
        logFormat: logfmt
        logLevel: info
        paused: false
        portName: http-web
        replicas: 0
        retention: 120h
        routePrefix: /
        securityContext:
          fsGroup: 2000
          runAsGroup: 2000
          runAsNonRoot: true
          runAsUser: 1000
        storage:
          emptyDir:
            sizeLimit: 1Gi
        version: v0.24.0
  resyncPeriod: 60s
  tenantSelector:
    matchLabels:
        energy: green

I guess the difference from Secrets is that there is an operator that somehow updates the resource.

@h4wkmoon
Contributor Author

Also, I see "managedFields" in the generated Alertmanager resource, but not in the Secrets.

@prometherion
Member

Thanks, I was able to reproduce it now :)

Going to propose a hotfix for this, thanks again!

@prometherion
Member

Just tested the changes introduced with #689, and replication seems to be working.

{"level":"info","ts":"2023-01-31T15:04:21.003Z","msg":"start processing","controller":"globaltenantresource","controllerGroup":"capsule.clastix.io","controllerKind":"GlobalTenantResource","globalTenantResource":{"name":"green-production"},"namespace":"","name":"green-production","reconcileID":"9162fbd1-cc76-4b44-a06d-a782ddfbab37"}
{"level":"debug","ts":"2023-01-31T15:04:21.010Z","logger":"controller-runtime.webhook.webhooks","msg":"received request","webhook":"/cordoning","UID":"8963eed0-58c5-44e6-a5dd-6beed7681c18","kind":"monitoring.coreos.com/v1, Kind=Alertmanager","resource":{"group":"monitoring.coreos.com","version":"v1","resource":"alertmanagers"}}
{"level":"debug","ts":"2023-01-31T15:04:21.010Z","logger":"controller-runtime.webhook.webhooks","msg":"wrote response","webhook":"/cordoning","code":200,"reason":"","UID":"8963eed0-58c5-44e6-a5dd-6beed7681c18","allowed":true}
{"level":"info","ts":"2023-01-31T15:04:21.024Z","msg":"resource has been replicated","controller":"globaltenantresource","controllerGroup":"capsule.clastix.io","controllerKind":"GlobalTenantResource","globalTenantResource":{"name":"green-production"},"namespace":"","name":"green-production","reconcileID":"9162fbd1-cc76-4b44-a06d-a782ddfbab37","resource":"oil-development/alertmanager"}
{"level":"debug","ts":"2023-01-31T15:04:21.024Z","logger":"controller-runtime.webhook.webhooks","msg":"received request","webhook":"/cordoning","UID":"7edd835e-f631-4b71-9ba6-8ee4d7506788","kind":"monitoring.coreos.com/v1, Kind=Alertmanager","resource":{"group":"monitoring.coreos.com","version":"v1","resource":"alertmanagers"}}
{"level":"debug","ts":"2023-01-31T15:04:21.024Z","logger":"controller-runtime.webhook.webhooks","msg":"wrote response","webhook":"/cordoning","code":200,"reason":"","UID":"7edd835e-f631-4b71-9ba6-8ee4d7506788","allowed":true}
{"level":"debug","ts":"2023-01-31T15:04:21.025Z","logger":"controller-runtime.webhook.webhooks","msg":"received request","webhook":"/cordoning","UID":"d75a5aac-70b7-4701-9cb2-00c09c277468","kind":"monitoring.coreos.com/v1, Kind=Alertmanager","resource":{"group":"monitoring.coreos.com","version":"v1","resource":"alertmanagers"}}
{"level":"debug","ts":"2023-01-31T15:04:21.025Z","logger":"controller-runtime.webhook.webhooks","msg":"wrote response","webhook":"/cordoning","code":200,"reason":"","UID":"d75a5aac-70b7-4701-9cb2-00c09c277468","allowed":true}
{"level":"info","ts":"2023-01-31T15:04:21.027Z","msg":"resource has been replicated","controller":"globaltenantresource","controllerGroup":"capsule.clastix.io","controllerKind":"GlobalTenantResource","globalTenantResource":{"name":"green-production"},"namespace":"","name":"green-production","reconcileID":"9162fbd1-cc76-4b44-a06d-a782ddfbab37","resource":"oil-staging/alertmanager"}
{"level":"info","ts":"2023-01-31T15:04:21.027Z","msg":"resource has been replicated","controller":"globaltenantresource","controllerGroup":"capsule.clastix.io","controllerKind":"GlobalTenantResource","globalTenantResource":{"name":"green-production"},"namespace":"","name":"green-production","reconcileID":"9162fbd1-cc76-4b44-a06d-a782ddfbab37","resource":"oil-production/alertmanager"}
{"level":"info","ts":"2023-01-31T15:04:21.027Z","msg":"processing completed","controller":"globaltenantresource","controllerGroup":"capsule.clastix.io","controllerKind":"GlobalTenantResource","globalTenantResource":{"name":"green-production"},"namespace":"","name":"green-production","reconcileID":"9162fbd1-cc76-4b44-a06d-a782ddfbab37"}
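
Since the controller logs are one JSON object per line, the replicated targets can be pulled out of them with a few lines of Python. This is a minimal sketch: `replicated_resources` is a hypothetical helper, and the sample lines are trimmed copies of the entries above that keep only the `level`, `ts`, `msg`, and `resource` fields.

```python
import json


def replicated_resources(lines):
    """Return the "resource" field of every "resource has been replicated" entry."""
    resources = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict):
            continue
        if entry.get("msg") == "resource has been replicated":
            resources.append(entry["resource"])
    return resources


# Trimmed from the log output above.
sample = [
    '{"level":"info","ts":"2023-01-31T15:04:21.003Z","msg":"start processing"}',
    '{"level":"info","ts":"2023-01-31T15:04:21.024Z","msg":"resource has been replicated","resource":"oil-development/alertmanager"}',
    '{"level":"info","ts":"2023-01-31T15:04:21.027Z","msg":"resource has been replicated","resource":"oil-staging/alertmanager"}',
    '{"level":"info","ts":"2023-01-31T15:04:21.027Z","msg":"resource has been replicated","resource":"oil-production/alertmanager"}',
    '{"level":"info","ts":"2023-01-31T15:04:21.027Z","msg":"processing completed"}',
]

for resource in replicated_resources(sample):
    print(resource)
```

The same function can be fed the output of `kubectl logs` line by line; non-JSON lines are ignored rather than treated as errors.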

@prometherion prometherion removed the blocked-needs-validation Issue need triage and validation label Jan 31, 2023