No-diff Sync Fails #16070

Open
antwacky opened this issue Oct 23, 2023 · 12 comments
Labels
bug (Something isn't working) · component:argo-cd · type:bug · version:EOL (Latest confirmed affected version has reached EOL)

Comments

@antwacky

Checklist:

  • [x] I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • [x] I've included steps to reproduce the bug.
  • [x] I've pasted the output of argocd version.

Describe the bug

I restored a PVC, which added dataSource and dataSourceRef spec fields to the PVC (referring to the source VolumeSnapshot).

This rightly caused ArgoCD to report the application as out of sync, since these dataSource fields are not present in the Git manifests.

To work around this, I added the below configuration to the ArgoCD configmap:

  resource.customizations.ignoreDifferences.all: |
    jsonPointers:
      - /spec/dataSource
      - /spec/dataSourceRef
      - /spec/storageClassName

This worked, and the PVC and application now show as 'Synced'.

However, even though ArgoCD now shows no diff and reports the application as in sync, clicking sync results in a sync failure.

To Reproduce

  • Deploy an application with a PVC
  • Once synced, delete the PVC and restore from a VolumeSnapshot using the dataSource field
  • Configure ArgoCD configmap to ignore the relevant fields
  • Observe that the application is now 'Synced'
  • Attempt to sync
  • Observe that the sync fails

Expected behavior

I expect the application to be in sync, and syncing to be successful.

Version

argocd: v2.8.4+c279299
  BuildDate: 2023-09-13T19:12:09Z
  GitCommit: c27929928104dc37b937764baf65f38b78930e59
  GitTreeState: clean
  GoVersion: go1.20.6
  Compiler: gc
  Platform: linux/amd64

Logs

one or more objects failed to apply, reason: PersistentVolumeClaim "jenkins" is invalid: spec: Forbidden: spec is immutable after creation except resources.requests for bound claims
  core.PersistentVolumeClaimSpec{
    ... // 2 identical fields
    Resources:        {Requests: {s"storage": {i: {...}, s: "150Gi", Format: "BinarySI"}}},
    VolumeName:       "pvc-6cc09d41-7fd1-40b5-b79f-bfb624aa2f5d",
-   StorageClassName: &"vsphere-csi",
+   StorageClassName: nil,
    VolumeMode:       &"Filesystem",
    DataSource:       &{APIGroup: &"snapshot.storage.k8s.io", Kind: "VolumeSnapshot", Name: "jenkins-20-10-2023-15-19-48"},
    DataSourceRef:    &{APIGroup: &"snapshot.storage.k8s.io", Kind: "VolumeSnapshot", Name: "jenkins-20-10-2023-15-19-48"},
  }
@antwacky antwacky added the bug Something isn't working label Oct 23, 2023
@rumstead
Member

Have you enabled the RespectIgnoreDifferences sync option?
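(i.e., roughly this under the Application's syncPolicy; a sketch only, adjust to your app:)

  syncPolicy:
    syncOptions:
      - RespectIgnoreDifferences=true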

@antwacky
Author

I have indeed; the application has application-level ignore diffs, which are working.

This is a system-level ignore diff.

The app shows as synced, but clicking sync reports a failure, even though the app is still in sync. I've tried a hard refresh and deleting/recreating the application (non-cascade delete).

@rumstead
Member

Can you share the application spec?

@antwacky
Author

Sure, here you go:

project: default
source:
  repoURL: 'https://gitlab.com/argocd.git'
  path: ./apps/jenkins
  targetRevision: master
destination:
  server: 'https://kubernetes.default.svc'
  namespace: jenkins
syncPolicy:
  automated:
    prune: true
    selfHeal: true
  syncOptions:
    - CreateNamespace=true
    - RespectIgnoreDifferences=true
ignoreDifferences:
  - kind: Secret
    jsonPointers:
      - /data/jenkins

@rumstead
Member

#12569 looks to be the same issue.

Nasty, but I wonder if you can ignore the entire spec.

@antwacky
Author

I've tried that and it's still failing to sync...

It's slightly strange because, as I've said, ArgoCD reports that there are no differences and the app is in sync.

So theoretically, clicking sync shouldn't actually do anything.

project: default
source:
  repoURL: 'https://gitlab.com/argocd.git'
  path: ./apps/jenkins
  targetRevision: master
destination:
  server: 'https://kubernetes.default.svc'
  namespace: jenkins
syncPolicy:
  automated:
    prune: true
    selfHeal: true
  syncOptions:
    - CreateNamespace=true
    - RespectIgnoreDifferences=true
ignoreDifferences:
  - kind: Secret
    jsonPointers:
      - /data/jenkins
  - kind: PersistentVolumeClaim
    jsonPointers:
      - /spec

@antwacky
Author

Looking further at the error, it actually appears that Argo is complaining about the storageClassName field:

    ... // 2 identical fields
    Resources:        {Requests: {s"storage": {i: {...}, s: "150Gi", Format: "BinarySI"}}},
    VolumeName:       "pvc-6cc09d41-7fd1-40b5-b79f-bfb624aa2f5d",
-   StorageClassName: &"vsphere-csi",
+   StorageClassName: nil

I've tried ignoring this too, to no avail.

@antwacky
Author

I've managed to resolve this by adding the storageClassName explicitly within the helm chart values, so ArgoCD now syncs successfully.

I'm going to leave this open though, as ignoring the storageClassName did not work.

  - kind: PersistentVolumeClaim
    jsonPointers:
      - /spec/storageClassName

Neither did ignoring the entire PVC spec.

And as mentioned previously, ArgoCD reports that the individual PVC resource is synced, so syncing should not fail as there is no work to do.
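To spell out the workaround that did work: setting the storage class explicitly in the chart values, so the rendered PVC matches the live object. Roughly this (the exact key depends on the chart; names here are illustrative):

  persistence:
    storageClassName: vsphere-csi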

@jMarkP
Contributor

jMarkP commented Mar 13, 2024

We have a similar issue, but with the volumeName field instead. This causes issues any time there's a change to the manifest (e.g. a new helm chart version label), as K8s refuses to update the volumeName field.

After some digging and comparing an Application that we can sync successfully to one we can't (both using the same underlying Helm chart and very similar PVC definitions) I think I might have found the issue, and (spoilers) I think it's the same as #13965 (comment)

In our case ArgoCD is adopting resources previously created with client-side kubectl apply, and I can see in the failing Application that the argocd-controller has claimed ownership of the volumeName field:

  managedFields:
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:kubectl.kubernetes.io/last-applied-configuration': {}
            'f:pv.kubernetes.io/bind-completed': {}
            'f:pv.kubernetes.io/bound-by-controller': {}
            'f:volume.beta.kubernetes.io/storage-provisioner': {}
          'f:finalizers':
            .: {}
            'v:"kubernetes.io/pvc-protection"': {}
          'f:labels':
            .: {}
            'f:app': {}
            'f:argocd.argoproj.io/instance': {}
            'f:chartVersion': {}
            'f:instance': {}
        'f:spec':
          'f:accessModes': {}
          'f:resources':
            'f:requests':
              .: {}
              'f:storage': {}
          'f:storageClassName': {}
          'f:volumeMode': {}
          'f:volumeName': {}
      manager: argocd-controller
      operation: Apply
      time: '2024-03-12T10:06:07Z'
  ... other managers ...

Whereas in the working Application, that field is still managed by the kube-controller-manager:

  managedFields:
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:labels':
            'f:app': {}
            'f:argocd.argoproj.io/instance': {}
            'f:chartVersion': {}
            'f:instance': {}
        'f:spec':
          'f:accessModes': {}
          'f:resources':
            'f:requests':
              'f:storage': {}
          'f:storageClassName': {}
      manager: argocd-controller
      operation: Apply
      time: '2024-03-12T09:16:03Z'
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            'f:pv.kubernetes.io/bind-completed': {}
            'f:pv.kubernetes.io/bound-by-controller': {}
          'f:finalizers':
            .: {}
            'v:"kubernetes.io/pvc-protection"': {}
        'f:spec':
          'f:volumeName': {}
        'f:status':
          'f:accessModes': {}
          'f:capacity': {}
          'f:phase': {}
      manager: kube-controller-manager
      operation: Update
      time: '2021-05-10T11:16:02Z'
  ... other managers ...

So presumably when we issued a server-side apply from ArgoCD, K8s added all those fields to Argo's ownership, and now when Argo applies a patch to that resource, K8s treats a missing managed field as having its default value of an empty string.

See https://kubernetes.io/docs/reference/using-api/server-side-apply/#field-management

If you remove a field from a manifest and apply that manifest, Server-Side Apply checks if there are any other field managers that also own the field. If the field is not owned by any other field managers, it is either deleted from the live object or reset to its default value, if it has one. The same rule applies to associative list or map items.

So, because our Helm chart doesn't set that volumeName field, but Argo manages it, K8s tries to patch it with empty string, and the controller rejects it.

That tallies with the sync error I see:

error when patching \"/dev/shm/1766200245\": PersistentVolumeClaim \"pvc-name\" is invalid: spec: Forbidden: spec is immutable after creation except resources.requests for bound claims
  core.PersistentVolumeClaimSpec{
  AccessModes:      {\"ReadWriteOnce\"},
  Selector:         nil,
  Resources:        {Requests: {s\"storage\": {i: {...}, s: \"100Gi\", Format: \"BinarySI\"}}},
- VolumeName:       \"pvc-...[GUID]...\",
+ VolumeName:       \"\",
  StorageClassName: &\"cinder\",
  VolumeMode:       &\"Filesystem\",
  ... // 2 identical fields
  }" syncId=00067-pHyze task="Sync/0 resource /PersistentVolumeClaim:my-namespace/pvc-name obj->obj (,,)"

We're going to try to resolve this by patching out f:volumeName from that resource's managedFields.
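(For anyone trying the same: the K8s server-side apply docs describe clearing managedFields by overwriting them with a patch. A blunt sketch, using the placeholder names from the error above, is below; note it wipes all managedFields entries rather than just f:volumeName, so a more surgical fix would edit only the argocd-controller entry.)

  # Clears ALL managedFields entries on the PVC; ownership is re-established on the next apply
  kubectl -n my-namespace patch pvc pvc-name --type=merge \
    -p '{"metadata":{"managedFields":[{}]}}'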

Based on kubernetes/kubectl#1337 it sounds like this is expected behaviour on the K8s side, so maybe the ArgoCD docs need a warning about adopting old client-side-applied resources into server-side-applied Applications?


Edited to add: we're running ArgoCD v2.10.1, deploying to a K8s cluster on v1.24.

@jMarkP
Contributor

jMarkP commented Mar 14, 2024

Ah, some new information: this happens for us even without server-side apply getting involved. We believe this happens because, before we used ArgoCD to manage this deployment, the previous team injected the existing volumeName of the PVC into the kubectl apply:

# capture the live PVC's volumeName so it can be re-injected into the rendered manifests
volumeName=$(kubectl -n ${instance.namespace} get pvc ${instance.name} -o json 2>/dev/null | jq -r .spec.volumeName || echo '')
helm template ... --set volumeName="${volumeName}" ... | kubectl apply -f -

And that means that volumeName appears in the last-applied-configuration annotation, which explains why a subsequent sync fails.
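(If it helps anyone else diagnose the same thing, a quick check for whether volumeName is baked into the annotation, again with placeholder resource names:)

  kubectl -n my-namespace get pvc pvc-name \
    -o jsonpath='{.metadata.annotations.kubectl\.kubernetes\.io/last-applied-configuration}' \
    | jq .spec.volumeName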

@clementnuss

thanks a lot @jMarkP! Removing the last-applied-configuration annotation made ArgoCD happy 🙃

maybe this should be mentioned in the doc?
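(For anyone landing here later, removing the annotation was along these lines, with placeholder names; the trailing "-" deletes the annotation:)

  kubectl -n my-namespace annotate pvc pvc-name \
    kubectl.kubernetes.io/last-applied-configuration-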

@andrii-korotkov-verkada
Contributor

ArgoCD versions 2.10 and below have reached EOL. Can you upgrade and let us know if the issue is still present, please?

@andrii-korotkov-verkada andrii-korotkov-verkada added the version:EOL Latest confirmed affected version has reached EOL label Nov 11, 2024