
kube-prometheus-stack stuck in OutOfSync #11074

Closed · 3 tasks done
yellowhat opened this issue Oct 26, 2022 · 22 comments
Labels: bug (Something isn't working)

@yellowhat

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

Hi,
I am deploying the kube-prometheus-stack helm chart using ArgoCD:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: monitoring
  namespace: argocd
spec:
  project: default
  source:
    chart: kube-prometheus-stack
    repoURL: https://prometheus-community.github.io/helm-charts
    targetRevision: 41.6.1
  destination:
    namespace: monitoring
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - ServerSideApply=true
      - CreateNamespace=true

It creates all the resources, but the application stays in Current sync status: OutOfSync due to this resource:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: monitoring-kube-prometheus-kubelet
...

If I click on the resource in the ArgoCD web UI and open Summary and Diff, I get:

[screenshot: ServiceMonitor diff in the ArgoCD UI]

Expected behavior

Current sync status: Sync

Version

Argo CD: v2.5.0+b895da4
Build Date: 2022-10-25T14:40:01Z
Go Version: go1.18.7
Go Compiler: gc
Platform: linux/amd64
jsonnet: v0.18.0
kustomize: v4.5.7 2022-08-02T16:35:54Z
Helm: v3.10.1+g9f88ccb
kubectl: v0.24.2

Logs

From the argocd-application-controller-0 pod it shows:

time="2022-10-26T13:03:09Z" level=info msg="Adding resource result, status: 'Synced', phase: 'Running', message: 'servicemonitor.monitoring.coreos.com/monitoring-kube-prometheus-kubelet serverside-applied'" application=argocd/monitoring kind=ServiceMonitor name=monitoring-kube-prometheus-kubelet namespace=monitoring phase=Sync syncId=00106-mKqpp
time="2022-10-26T13:03:14Z" level=info msg="Initialized new operation: {&SyncOperation{Revision:41.6.1,Prune:true,DryRun:false,SyncStrategy:nil,Resources:[]SyncOperationResource{SyncOperationResource{Group:monitoring.coreos.com,Kind:ServiceMonitor,Name:monitoring-kube-prometheus-kubelet,Namespace:,},},Source:nil,Manifests:[],SyncOptions:[ServerSideApply=true CreateNamespace=true],} { true} [] {5 nil}}" application=argocd/monitoring
time="2022-10-26T13:03:14Z" level=info msg="Tasks (dry-run)" application=argocd/monitoring syncId=00107-rGhzV tasks="[Sync/0 resource monitoring.coreos.com/ServiceMonitor:monitoring/monitoring-kube-prometheus-kubelet obj->obj (,,)]"
time="2022-10-26T13:03:14Z" level=info msg="Applying resource ServiceMonitor/monitoring-kube-prometheus-kubelet in cluster: https://10.100.0.1:443, namespace: monitoring"
time="2022-10-26T13:03:14Z" level=info msg="Applying resource ServiceMonitor/monitoring-kube-prometheus-kubelet in cluster: https://10.100.0.1:443, namespace: monitoring"
time="2022-10-26T13:03:14Z" level=info msg="Adding resource result, status: 'Synced', phase: 'Running', message: 'servicemonitor.monitoring.coreos.com/monitoring-kube-prometheus-kubelet serverside-applied'" application=argocd/monitoring kind=ServiceMonitor name=monitoring-kube-prometheus-kubelet namespace=monitoring phase=Sync syncId=00107-rGhzV
time="2022-10-26T13:03:19Z" level=info msg="Initialized new operation: {&SyncOperation{Revision:41.6.1,Prune:true,DryRun:false,SyncStrategy:nil,Resources:[]SyncOperationResource{SyncOperationResource{Group:monitoring.coreos.com,Kind:ServiceMonitor,Name:monitoring-kube-prometheus-kubelet,Namespace:,},},Source:nil,Manifests:[],SyncOptions:[ServerSideApply=true CreateNamespace=true],} { true} [] {5 nil}}" application=argocd/monitoring
time="2022-10-26T13:03:19Z" level=info msg="Tasks (dry-run)" application=argocd/monitoring syncId=00108-awxqW tasks="[Sync/0 resource monitoring.coreos.com/ServiceMonitor:monitoring/monitoring-kube-prometheus-kubelet obj->obj (,,)]"
time="2022-10-26T13:03:19Z" level=info msg="Applying resource ServiceMonitor/monitoring-kube-prometheus-kubelet in cluster: https://10.100.0.1:443, namespace: monitoring"
time="2022-10-26T13:03:19Z" level=info msg="Applying resource ServiceMonitor/monitoring-kube-prometheus-kubelet in cluster: https://10.100.0.1:443, namespace: monitoring"
time="2022-10-26T13:03:19Z" level=info msg="Adding resource result, status: 'Synced', phase: 'Running', message: 'servicemonitor.monitoring.coreos.com/monitoring-kube-prometheus-kubelet serverside-applied'" application=argocd/monitoring kind=ServiceMonitor name=monitoring-kube-prometheus-kubelet namespace=monitoring phase=Sync syncId=00108-awxqW
time="2022-10-26T13:03:24Z" level=info msg="Initialized new operation: {&SyncOperation{Revision:41.6.1,Prune:true,DryRun:false,SyncStrategy:nil,Resources:[]SyncOperationResource{SyncOperationResource{Group:monitoring.coreos.com,Kind:ServiceMonitor,Name:monitoring-kube-prometheus-kubelet,Namespace:,},},Source:nil,Manifests:[],SyncOptions:[ServerSideApply=true CreateNamespace=true],} { true} [] {5 nil}}" application=argocd/monitoring
time="2022-10-26T13:03:24Z" level=info msg="Tasks (dry-run)" application=argocd/monitoring syncId=00109-rqkbn tasks="[Sync/0 resource monitoring.coreos.com/ServiceMonitor:monitoring/monitoring-kube-prometheus-kubelet obj->obj (,,)]"
time="2022-10-26T13:03:24Z" level=info msg="Applying resource ServiceMonitor/monitoring-kube-prometheus-kubelet in cluster: https://10.100.0.1:443, namespace: monitoring"
time="2022-10-26T13:03:24Z" level=info msg="Applying resource ServiceMonitor/monitoring-kube-prometheus-kubelet in cluster: https://10.100.0.1:443, namespace: monitoring"
time="2022-10-26T13:03:24Z" level=info msg="Adding resource result, status: 'Synced', phase: 'Running', message: 'servicemonitor.monitoring.coreos.com/monitoring-kube-prometheus-kubelet serverside-applied'" application=argocd/monitoring kind=ServiceMonitor name=monitoring-kube-prometheus-kubelet namespace=monitoring phase=Sync syncId=00109-rqkbn

Thanks

@yellowhat yellowhat added the bug Something isn't working label Oct 26, 2022
@Cowboy-coder

I got the same issue, and I verified that it works without ServerSideApply=true.

Not sure if it should be tracked in a separate issue, but I also had a problem with Loki using ServerSideApply=true on 2.5 with this helm chart: https://github.com/grafana/loki/tree/main/production/helm/loki

[screenshot: diff shown in the ArgoCD UI]

@yellowhat
Author

Thank you very much.
Removing ServerSideApply=true fixed the issue.
Is this still a bug?

@Cowboy-coder

Yes, I think so. At least I want to use ServerSideApply because it fixes other problems, like being able to apply large CRDs.

@yellowhat
Author

yellowhat commented Oct 26, 2022

Strangely, I was getting errors about large CRDs (for kube-prometheus-stack) with ArgoCD 2.4.x. I was waiting for 2.5.x to include the ServerSideApply option, and now it works without it.
I am a bit confused.

@yellowhat
Author

I have just reinstalled from scratch; unfortunately, ServerSideApply is required.

one or more objects failed to apply, reason: CustomResourceDefinition.apiextensions.k8s.io "prometheuses.monitoring.coreos.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes,resource mapping not found for name: "system-kube-prometheus-sta-prometheus" namespace: "system" from "/dev/shm/3789263989": no matches for kind "Prometheus" in version "monitoring.coreos.com/v1" ensure CRDs are installed first. Retrying attempt #5 at 2:58PM.

@leoluz leoluz self-assigned this Oct 27, 2022
@leoluz
Collaborator

leoluz commented Oct 27, 2022

I am not sure, but I'd like to understand whether this is a problem with Helm charts only or whether it can be reproduced with Kustomize as well. If so, can someone provide a Kustomize-based Application YAML that I can use for tests?

@leoluz
Collaborator

leoluz commented Oct 27, 2022

> I got the same issue, and I verified that it works without ServerSideApply=true.
>
> Not sure if it should be tracked in a separate issue, but I also had a problem with Loki using ServerSideApply=true on 2.5 with this helm chart: https://github.com/grafana/loki/tree/main/production/helm/loki
>
> [screenshot: diff shown in the ArgoCD UI]

@Cowboy-coder Can you provide the full YAML of your live resource? Please include the metadata.managedFields.

@msw-kialo

@leoluz I have the same issue / background. Currently I am syncing kube-prometheus-stack with Replace=true for the CRDs that are too big, and I am testing SSA to get rid of that hack.

prometheus-operator-kubelet live manifest (after recreating it with ServerSideApply=true on 2.5.0)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  creationTimestamp: "2022-10-27T16:03:32Z"
  generation: 1
  labels:
    app: prometheus-operator-kubelet
    app.kubernetes.io/instance: prometheus-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: prometheus-operator
    app.kubernetes.io/version: 40.5.0
    chart: kube-prometheus-stack-40.5.0
    heritage: Helm
    release: prometheus-operator
  name: prometheus-operator-kubelet
  namespace: services
  resourceVersion: "97862637"
  uid: f8e4315e-ff4c-46ae-b86b-9a3c51cfd9c1
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    port: https-metrics
    relabelings:
    - action: replace
      sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    scheme: https
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecureSkipVerify: true
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    metricRelabelings:
    - action: drop
      regex: container_cpu_(cfs_throttled_seconds_total|load_average_10s|system_seconds_total|user_seconds_total)
      sourceLabels:
      - __name__
    - action: drop
      regex: container_fs_(io_current|io_time_seconds_total|io_time_weighted_seconds_total|reads_merged_total|sector_reads_total|sector_writes_total|writes_merged_total)
      sourceLabels:
      - __name__
    - action: drop
      regex: container_memory_(mapped_file|swap)
      sourceLabels:
      - __name__
    - action: drop
      regex: container_(file_descriptors|tasks_state|threads_max)
      sourceLabels:
      - __name__
    - action: drop
      regex: container_spec.*
      sourceLabels:
      - __name__
    - action: drop
      regex: .+;
      sourceLabels:
      - id
      - pod
    path: /metrics/cadvisor
    port: https-metrics
    relabelings:
    - action: replace
      sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    scheme: https
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecureSkipVerify: true
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    path: /metrics/probes
    port: https-metrics
    relabelings:
    - action: replace
      sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    scheme: https
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet
      k8s-app: kubelet
Desired resource definition (generated locally via helm template, but synced via Kustomize by ArgoCD)
---
# Source: kube-prometheus-stack/templates/exporters/kubelet/servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prometheus-operator-kubelet
  namespace: services
  labels:
    app: prometheus-operator-kubelet    
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/instance: prometheus-operator
    app.kubernetes.io/version: "40.5.0"
    app.kubernetes.io/part-of: prometheus-operator
    chart: kube-prometheus-stack-40.5.0
    release: "prometheus-operator"
    heritage: "Helm"
spec:
  endpoints:
  - port: https-metrics
    scheme: https
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecureSkipVerify: true
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
  - port: https-metrics
    scheme: https
    path: /metrics/cadvisor
    honorLabels: true
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecureSkipVerify: true
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    metricRelabelings:
    - action: drop
      regex: container_cpu_(cfs_throttled_seconds_total|load_average_10s|system_seconds_total|user_seconds_total)
      sourceLabels:
      - __name__
    - action: drop
      regex: container_fs_(io_current|io_time_seconds_total|io_time_weighted_seconds_total|reads_merged_total|sector_reads_total|sector_writes_total|writes_merged_total)
      sourceLabels:
      - __name__
    - action: drop
      regex: container_memory_(mapped_file|swap)
      sourceLabels:
      - __name__
    - action: drop
      regex: container_(file_descriptors|tasks_state|threads_max)
      sourceLabels:
      - __name__
    - action: drop
      regex: container_spec.*
      sourceLabels:
      - __name__
    - action: drop
      regex: .+;
      sourceLabels:
      - id
      - pod
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
  - port: https-metrics
    scheme: https
    path: /metrics/probes
    honorLabels: true
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecureSkipVerify: true
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet
      k8s-app: kubelet

I suspect the default values from the CRD play a role here.

@Cowboy-coder

Cowboy-coder commented Oct 28, 2022

@leoluz

Live manifest
apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"apps/v1","kind":"StatefulSet","metadata":{"annotations":{},"labels":{"app.kubernetes.io/component":"write","app.kubernetes.io/instance":"loki","app.kubernetes.io/managed-by":"Helm","app.kubernetes.io/name":"loki","app.kubernetes.io/part-of":"memberlist","app.kubernetes.io/version":"2.6.1","argocd.argoproj.io/instance":"logging","helm.sh/chart":"loki-3.2.0"},"name":"loki-write","namespace":"logging"},"spec":{"podManagementPolicy":"Parallel","replicas":3,"revisionHistoryLimit":10,"selector":{"matchLabels":{"app.kubernetes.io/component":"write","app.kubernetes.io/instance":"loki","app.kubernetes.io/name":"loki"}},"serviceName":"loki-write-headless","template":{"metadata":{"annotations":{"checksum/config":"dc4356fb9c8ae2285982e39f348eaa3087a7bd09084224adb6915903fdf04574"},"labels":{"app.kubernetes.io/component":"write","app.kubernetes.io/instance":"loki","app.kubernetes.io/name":"loki","app.kubernetes.io/part-of":"memberlist"}},"spec":{"affinity":{"podAntiAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":[{"labelSelector":{"matchLabels":{"app.kubernetes.io/component":"write","app.kubernetes.io/instance":"loki","app.kubernetes.io/name":"loki"}},"topologyKey":"kubernetes.io/hostname"}]}},"automountServiceAccountToken":true,"containers":[{"args":["-config.file=/etc/loki/config/config.yaml","-target=write"],"env":[{"name":"AWS_ACCESS_KEY_ID","valueFrom":{"secretKeyRef":{"key":"AWS_ACCESS_KEY_ID","name":"loki-s3"}}},{"name":"AWS_SECRET_ACCESS_KEY","valueFrom":{"secretKeyRef":{"key":"AWS_SECRET_ACCESS_KEY","name":"loki-s3"}}}],"image":"docker.io/grafana/loki:2.6.1","imagePullPolicy":"IfNotPresent","name":"write","ports":[{"containerPort":3100,"name":"http-metrics","protocol":"TCP"},{"containerPort":9095,"name":"grpc","protocol":"TCP"},{"containerPort":7946,"name":"http-memberlist","protocol":"TCP"}],"readinessProbe":{"httpGet":{"path":"/ready","port":"http-metrics"},"initialDelaySeconds":30,"timeoutSeconds":1},"resources":{},"securityContext":{"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]},"readOnlyRootFilesystem":true},"volumeMounts":[{"mountPath":"/etc/loki/config","name":"config"},{"mountPath":"/var/loki","name":"data"}]}],"securityContext":{"fsGroup":10001,"runAsGroup":10001,"runAsNonRoot":true,"runAsUser":10001},"serviceAccountName":"loki","terminationGracePeriodSeconds":300,"volumes":[{"configMap":{"name":"loki"},"name":"config"}]}},"updateStrategy":{"rollingUpdate":{"partition":0}},"volumeClaimTemplates":[{"metadata":{"name":"data"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"10Gi"}},"storageClassName":"openebs-hostpath"}}]}}
  creationTimestamp: '2022-10-26T09:13:13Z'
  generation: 1
  labels:
    app.kubernetes.io/component: write
    app.kubernetes.io/instance: loki
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: loki
    app.kubernetes.io/part-of: memberlist
    app.kubernetes.io/version: 2.6.1
    argocd.argoproj.io/instance: logging
    helm.sh/chart: loki-3.2.0
  managedFields:
    - apiVersion: apps/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:labels':
            'f:app.kubernetes.io/component': {}
            'f:app.kubernetes.io/instance': {}
            'f:app.kubernetes.io/managed-by': {}
            'f:app.kubernetes.io/name': {}
            'f:app.kubernetes.io/part-of': {}
            'f:app.kubernetes.io/version': {}
            'f:argocd.argoproj.io/instance': {}
            'f:helm.sh/chart': {}
        'f:spec':
          'f:podManagementPolicy': {}
          'f:replicas': {}
          'f:revisionHistoryLimit': {}
          'f:selector': {}
          'f:serviceName': {}
          'f:template':
            'f:metadata':
              'f:annotations':
                'f:checksum/config': {}
              'f:labels':
                'f:app.kubernetes.io/component': {}
                'f:app.kubernetes.io/instance': {}
                'f:app.kubernetes.io/name': {}
                'f:app.kubernetes.io/part-of': {}
            'f:spec':
              'f:affinity':
                'f:podAntiAffinity':
                  'f:requiredDuringSchedulingIgnoredDuringExecution': {}
              'f:automountServiceAccountToken': {}
              'f:containers':
                'k:{"name":"write"}':
                  .: {}
                  'f:args': {}
                  'f:env':
                    'k:{"name":"AWS_ACCESS_KEY_ID"}':
                      .: {}
                      'f:name': {}
                      'f:valueFrom':
                        'f:secretKeyRef': {}
                    'k:{"name":"AWS_SECRET_ACCESS_KEY"}':
                      .: {}
                      'f:name': {}
                      'f:valueFrom':
                        'f:secretKeyRef': {}
                  'f:image': {}
                  'f:imagePullPolicy': {}
                  'f:name': {}
                  'f:ports':
                    'k:{"containerPort":3100,"protocol":"TCP"}':
                      .: {}
                      'f:containerPort': {}
                      'f:name': {}
                      'f:protocol': {}
                    'k:{"containerPort":7946,"protocol":"TCP"}':
                      .: {}
                      'f:containerPort': {}
                      'f:name': {}
                      'f:protocol': {}
                    'k:{"containerPort":9095,"protocol":"TCP"}':
                      .: {}
                      'f:containerPort': {}
                      'f:name': {}
                      'f:protocol': {}
                  'f:readinessProbe':
                    'f:httpGet':
                      'f:path': {}
                      'f:port': {}
                    'f:initialDelaySeconds': {}
                    'f:timeoutSeconds': {}
                  'f:resources': {}
                  'f:securityContext':
                    'f:allowPrivilegeEscalation': {}
                    'f:capabilities':
                      'f:drop': {}
                    'f:readOnlyRootFilesystem': {}
                  'f:volumeMounts':
                    'k:{"mountPath":"/etc/loki/config"}':
                      .: {}
                      'f:mountPath': {}
                      'f:name': {}
                    'k:{"mountPath":"/var/loki"}':
                      .: {}
                      'f:mountPath': {}
                      'f:name': {}
              'f:securityContext':
                'f:fsGroup': {}
                'f:runAsGroup': {}
                'f:runAsNonRoot': {}
                'f:runAsUser': {}
              'f:serviceAccountName': {}
              'f:terminationGracePeriodSeconds': {}
              'f:volumes':
                'k:{"name":"config"}':
                  .: {}
                  'f:configMap':
                    'f:name': {}
                  'f:name': {}
          'f:updateStrategy':
            'f:rollingUpdate':
              'f:partition': {}
          'f:volumeClaimTemplates': {}
      manager: argocd-controller
      operation: Apply
      time: '2022-10-28T07:36:52Z'
    - apiVersion: apps/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations': {}
          'f:labels':
            .: {}
            'f:app.kubernetes.io/component': {}
            'f:app.kubernetes.io/managed-by': {}
            'f:app.kubernetes.io/name': {}
            'f:app.kubernetes.io/part-of': {}
            'f:app.kubernetes.io/version': {}
            'f:helm.sh/chart': {}
        'f:spec':
          'f:podManagementPolicy': {}
          'f:replicas': {}
          'f:revisionHistoryLimit': {}
          'f:selector': {}
          'f:serviceName': {}
          'f:template':
            'f:metadata':
              'f:annotations':
                .: {}
                'f:checksum/config': {}
              'f:labels':
                .: {}
                'f:app.kubernetes.io/component': {}
                'f:app.kubernetes.io/instance': {}
                'f:app.kubernetes.io/name': {}
                'f:app.kubernetes.io/part-of': {}
            'f:spec':
              'f:affinity':
                .: {}
                'f:podAntiAffinity':
                  .: {}
                  'f:requiredDuringSchedulingIgnoredDuringExecution': {}
              'f:automountServiceAccountToken': {}
              'f:containers':
                'k:{"name":"write"}':
                  .: {}
                  'f:args': {}
                  'f:env':
                    .: {}
                    'k:{"name":"AWS_ACCESS_KEY_ID"}':
                      .: {}
                      'f:name': {}
                      'f:valueFrom':
                        .: {}
                        'f:secretKeyRef': {}
                    'k:{"name":"AWS_SECRET_ACCESS_KEY"}':
                      .: {}
                      'f:name': {}
                      'f:valueFrom':
                        .: {}
                        'f:secretKeyRef': {}
                  'f:image': {}
                  'f:imagePullPolicy': {}
                  'f:name': {}
                  'f:ports':
                    .: {}
                    'k:{"containerPort":3100,"protocol":"TCP"}':
                      .: {}
                      'f:containerPort': {}
                      'f:name': {}
                      'f:protocol': {}
                    'k:{"containerPort":7946,"protocol":"TCP"}':
                      .: {}
                      'f:containerPort': {}
                      'f:name': {}
                      'f:protocol': {}
                    'k:{"containerPort":9095,"protocol":"TCP"}':
                      .: {}
                      'f:containerPort': {}
                      'f:name': {}
                      'f:protocol': {}
                  'f:readinessProbe':
                    .: {}
                    'f:failureThreshold': {}
                    'f:httpGet':
                      .: {}
                      'f:path': {}
                      'f:port': {}
                      'f:scheme': {}
                    'f:initialDelaySeconds': {}
                    'f:periodSeconds': {}
                    'f:successThreshold': {}
                    'f:timeoutSeconds': {}
                  'f:resources': {}
                  'f:securityContext':
                    .: {}
                    'f:allowPrivilegeEscalation': {}
                    'f:capabilities':
                      .: {}
                      'f:drop': {}
                    'f:readOnlyRootFilesystem': {}
                  'f:terminationMessagePath': {}
                  'f:terminationMessagePolicy': {}
                  'f:volumeMounts':
                    .: {}
                    'k:{"mountPath":"/etc/loki/config"}':
                      .: {}
                      'f:mountPath': {}
                      'f:name': {}
                    'k:{"mountPath":"/var/loki"}':
                      .: {}
                      'f:mountPath': {}
                      'f:name': {}
              'f:dnsPolicy': {}
              'f:restartPolicy': {}
              'f:schedulerName': {}
              'f:securityContext':
                .: {}
                'f:fsGroup': {}
                'f:runAsGroup': {}
                'f:runAsNonRoot': {}
                'f:runAsUser': {}
              'f:serviceAccount': {}
              'f:serviceAccountName': {}
              'f:terminationGracePeriodSeconds': {}
              'f:volumes':
                .: {}
                'k:{"name":"config"}':
                  .: {}
                  'f:configMap':
                    .: {}
                    'f:defaultMode': {}
                    'f:name': {}
                  'f:name': {}
          'f:updateStrategy':
            'f:rollingUpdate':
              .: {}
              'f:partition': {}
            'f:type': {}
      manager: argocd-application-controller
      operation: Update
      time: '2022-10-26T09:13:13Z'
    - apiVersion: apps/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:status':
          'f:availableReplicas': {}
          'f:collisionCount': {}
          'f:currentReplicas': {}
          'f:currentRevision': {}
          'f:observedGeneration': {}
          'f:readyReplicas': {}
          'f:replicas': {}
          'f:updateRevision': {}
          'f:updatedReplicas': {}
      manager: kube-controller-manager
      operation: Update
      subresource: status
      time: '2022-10-26T09:18:53Z'
    - apiVersion: apps/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            'f:kubectl.kubernetes.io/last-applied-configuration': {}
          'f:labels':
            'f:app.kubernetes.io/instance': {}
            'f:argocd.argoproj.io/instance': {}
      manager: argocd-controller
      operation: Update
      time: '2022-10-28T07:19:13Z'
  name: loki-write
  namespace: logging
  resourceVersion: '46346521'
  uid: 159449f2-01c3-4ee1-8b91-2e1e90c1e9eb
spec:
  podManagementPolicy: Parallel
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: write
      app.kubernetes.io/instance: loki
      app.kubernetes.io/name: loki
  serviceName: loki-write-headless
  template:
    metadata:
      annotations:
        checksum/config: dc4356fb9c8ae2285982e39f348eaa3087a7bd09084224adb6915903fdf04574
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: write
        app.kubernetes.io/instance: loki
        app.kubernetes.io/name: loki
        app.kubernetes.io/part-of: memberlist
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app.kubernetes.io/component: write
                  app.kubernetes.io/instance: loki
                  app.kubernetes.io/name: loki
              topologyKey: kubernetes.io/hostname
      automountServiceAccountToken: true
      containers:
        - args:
            - '-config.file=/etc/loki/config/config.yaml'
            - '-target=write'
          env:
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  key: AWS_ACCESS_KEY_ID
                  name: loki-s3
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  key: AWS_SECRET_ACCESS_KEY
                  name: loki-s3
          image: 'docker.io/grafana/loki:2.6.1'
          imagePullPolicy: IfNotPresent
          name: write
          ports:
            - containerPort: 3100
              name: http-metrics
              protocol: TCP
            - containerPort: 9095
              name: grpc
              protocol: TCP
            - containerPort: 7946
              name: http-memberlist
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /ready
              port: http-metrics
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          resources: {}
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            readOnlyRootFilesystem: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /etc/loki/config
              name: config
            - mountPath: /var/loki
              name: data
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 10001
        runAsGroup: 10001
        runAsNonRoot: true
        runAsUser: 10001
      serviceAccount: loki
      serviceAccountName: loki
      terminationGracePeriodSeconds: 300
      volumes:
        - configMap:
            defaultMode: 420
            name: loki
          name: config
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        creationTimestamp: null
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: openebs-hostpath
        volumeMode: Filesystem
      status:
        phase: Pending
status:
  availableReplicas: 3
  collisionCount: 0
  currentReplicas: 3
  currentRevision: loki-write-68f4b7bcfc
  observedGeneration: 1
  readyReplicas: 3
  replicas: 3
  updateRevision: loki-write-68f4b7bcfc
  updatedReplicas: 3
Desired manifest
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/component: write
    app.kubernetes.io/instance: loki
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: loki
    app.kubernetes.io/part-of: memberlist
    app.kubernetes.io/version: 2.6.1
    argocd.argoproj.io/instance: logging
    helm.sh/chart: loki-3.2.0
  name: loki-write
  namespace: logging
spec:
  podManagementPolicy: Parallel
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: write
      app.kubernetes.io/instance: loki
      app.kubernetes.io/name: loki
  serviceName: loki-write-headless
  template:
    metadata:
      annotations:
        checksum/config: dc4356fb9c8ae2285982e39f348eaa3087a7bd09084224adb6915903fdf04574
      labels:
        app.kubernetes.io/component: write
        app.kubernetes.io/instance: loki
        app.kubernetes.io/name: loki
        app.kubernetes.io/part-of: memberlist
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app.kubernetes.io/component: write
                  app.kubernetes.io/instance: loki
                  app.kubernetes.io/name: loki
              topologyKey: kubernetes.io/hostname
      automountServiceAccountToken: true
      containers:
        - args:
            - '-config.file=/etc/loki/config/config.yaml'
            - '-target=write'
          env:
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  key: AWS_ACCESS_KEY_ID
                  name: loki-s3
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  key: AWS_SECRET_ACCESS_KEY
                  name: loki-s3
          image: 'docker.io/grafana/loki:2.6.1'
          imagePullPolicy: IfNotPresent
          name: write
          ports:
            - containerPort: 3100
              name: http-metrics
              protocol: TCP
            - containerPort: 9095
              name: grpc
              protocol: TCP
            - containerPort: 7946
              name: http-memberlist
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /ready
              port: http-metrics
            initialDelaySeconds: 30
            timeoutSeconds: 1
          resources: {}
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            readOnlyRootFilesystem: true
          volumeMounts:
            - mountPath: /etc/loki/config
              name: config
            - mountPath: /var/loki
              name: data
      securityContext:
        fsGroup: 10001
        runAsGroup: 10001
        runAsNonRoot: true
        runAsUser: 10001
      serviceAccountName: loki
      terminationGracePeriodSeconds: 300
      volumes:
        - configMap:
            name: loki
          name: config
  updateStrategy:
    rollingUpdate:
      partition: 0
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: openebs-hostpath

I also saw this old issue #4126 related to what looks like the same problem.

Apart from that, I now also see another issue with a ServiceMonitor from the Loki Helm chart, probably the same issue as with the kube-prometheus-stack chart.

[screenshot: Loki ServiceMonitor diff in the ArgoCD UI]

Live manifest for the Loki ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"monitoring.coreos.com/v1","kind":"ServiceMonitor","metadata":{"annotations":{},"labels":{"app.kubernetes.io/instance":"loki","app.kubernetes.io/managed-by":"Helm","app.kubernetes.io/name":"loki","app.kubernetes.io/version":"2.6.1","argocd.argoproj.io/instance":"logging","helm.sh/chart":"loki-3.2.0"},"name":"loki","namespace":"logging"},"spec":{"endpoints":[{"path":"/metrics","port":"http-metrics","relabelings":[{"replacement":"logging/$1","sourceLabels":["job"],"targetLabel":"job"},{"replacement":"loki","targetLabel":"cluster"}],"scheme":"http"}],"selector":{"matchExpressions":[{"key":"prometheus.io/service-monitor","operator":"NotIn","values":["false"]}],"matchLabels":{"app.kubernetes.io/instance":"loki","app.kubernetes.io/name":"loki"}}}}
  creationTimestamp: '2022-10-27T21:13:07Z'
  generation: 1
  labels:
    app.kubernetes.io/instance: loki
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: loki
    app.kubernetes.io/version: 2.6.1
    argocd.argoproj.io/instance: logging
    helm.sh/chart: loki-3.2.0
  managedFields:
    - apiVersion: monitoring.coreos.com/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:labels':
            'f:app.kubernetes.io/instance': {}
            'f:app.kubernetes.io/managed-by': {}
            'f:app.kubernetes.io/name': {}
            'f:app.kubernetes.io/version': {}
            'f:argocd.argoproj.io/instance': {}
            'f:helm.sh/chart': {}
        'f:spec':
          'f:endpoints': {}
          'f:selector': {}
      manager: argocd-controller
      operation: Apply
      time: '2022-10-28T07:50:32Z'
    - apiVersion: monitoring.coreos.com/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:kubectl.kubernetes.io/last-applied-configuration': {}
          'f:labels':
            .: {}
            'f:app.kubernetes.io/instance': {}
            'f:app.kubernetes.io/managed-by': {}
            'f:app.kubernetes.io/name': {}
            'f:app.kubernetes.io/version': {}
            'f:argocd.argoproj.io/instance': {}
            'f:helm.sh/chart': {}
        'f:spec':
          .: {}
          'f:selector': {}
      manager: argocd-controller
      operation: Update
      time: '2022-10-28T07:19:13Z'
  name: loki
  namespace: logging
  resourceVersion: '46358803'
  uid: a7df54d3-fa2a-4e63-b1a2-b2a643ff06bb
spec:
  endpoints:
    - path: /metrics
      port: http-metrics
      relabelings:
        - action: replace
          replacement: logging/$1
          sourceLabels:
            - job
          targetLabel: job
        - action: replace
          replacement: loki
          targetLabel: cluster
      scheme: http
  selector:
    matchExpressions:
      - key: prometheus.io/service-monitor
        operator: NotIn
        values:
          - 'false'
    matchLabels:
      app.kubernetes.io/instance: loki
      app.kubernetes.io/name: loki
Desired manifest for the Loki ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/instance: loki
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: loki
    app.kubernetes.io/version: 2.6.1
    argocd.argoproj.io/instance: logging
    helm.sh/chart: loki-3.2.0
  name: loki
  namespace: logging
spec:
  endpoints:
    - path: /metrics
      port: http-metrics
      relabelings:
        - replacement: logging/$1
          sourceLabels:
            - job
          targetLabel: job
        - replacement: loki
          targetLabel: cluster
      scheme: http
  selector:
    matchExpressions:
      - key: prometheus.io/service-monitor
        operator: NotIn
        values:
          - 'false'
    matchLabels:
      app.kubernetes.io/instance: loki
      app.kubernetes.io/name: loki

@yonahd

yonahd commented Oct 30, 2022

As a workaround, I currently just added

metadata:
  annotations:
    argocd.argoproj.io/sync-options: ServerSideApply=true

to the dashboards that were too long and to the other resources that needed it.
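For illustration, a minimal sketch of that workaround applied to a full resource; the ConfigMap name and contents below are hypothetical, only the annotation comes from the comment above:

apiVersion: v1
kind: ConfigMap
metadata:
  name: example-grafana-dashboard          # hypothetical dashboard ConfigMap that is too large for client-side apply
  annotations:
    # per-resource sync option: sync only this object with server-side apply
    argocd.argoproj.io/sync-options: ServerSideApply=true
data:
  dashboard.json: |
    {}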

@hsharrison

I'm having the same issue with a Crossplane Composition.

[screenshot: Composition diff in the ArgoCD UI]

Like the previous examples, fields within arrays are getting added by CRD defaults.
However, I am not using server-side apply.

@leoluz
Collaborator

leoluz commented Nov 1, 2022

> I'm having the same issue with a Crossplane Composition.
> Like the previous examples, fields within arrays are getting added by CRD defaults.
> However, I am not using server-side apply.

@hsharrison This is actually expected. There is currently a limitation in Argo CD, only for CRDs, that prevents default values from being considered during diff calculation. This affects the Argo CD Application status, as it thinks it is out-of-sync when in fact it isn't. All of the Application's resources in the cluster are correctly applied; the limitation only affects the Argo CD diff logic for CRDs.

While there is no fix yet for this CRD diff limitation with default values, the suggested workaround is configuring ignoreDifferences at the application level:
https://argo-cd.readthedocs.io/en/stable/user-guide/diffing/
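For reference, a minimal sketch of what that looks like for the ServiceMonitor case in this thread; the jq path below targets the defaulted relabelings action field and would need adapting for other CRDs such as Crossplane Compositions:

spec:
  ignoreDifferences:
  - group: monitoring.coreos.com
    kind: ServiceMonitor
    jqPathExpressions:
    # ignore the `action: replace` default injected by the CRD schema
    - .spec.endpoints[]?.relabelings[]?.action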

@moritz31

moritz31 commented Nov 1, 2022

I've resolved the issue by basically using the same approach as mentioned here:
https://blog.ediri.io/kube-prometheus-stack-and-argocd-23-how-to-remove-a-workaround

I just replaced Replace with ServerSideApply.
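Presumably that means the same kind of per-resource sync-options annotation shown elsewhere in this thread, only with ServerSideApply instead of Replace; a minimal sketch (the exact blog setup is not reproduced here):

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: prometheuses.monitoring.coreos.com
  annotations:
    # apply this oversized CRD with server-side apply instead of replacing it
    argocd.argoproj.io/sync-options: ServerSideApply=true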

@brunocascio

brunocascio commented Nov 1, 2022

I installed it this way:

helmCharts:
- name: kube-prometheus-stack
  repo: https://prometheus-community.github.io/helm-charts
  version: 41.7.0
  releaseName: kube-prometheus-stack
  namespace: kube-prometheus-stack
  includeCRDs: true
  valuesFile: values.yml

patches:
  - patchAnnotationTooLong.yml

Where patchAnnotationTooLong.yml contains:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    argocd.argoproj.io/sync-options: Replace=true
  name: prometheuses.monitoring.coreos.com

This fixes the "annotation too long" error.

@leoluz
Collaborator

leoluz commented Nov 1, 2022

As mentioned above, there are a few approaches that can be used to address this issue in Argo CD:

  1. If deploying with Kustomize, patch the CRDs with the Replace=true annotation.
  2. If deploying with Helm, first wrap the chart in a Kustomize project so you can patch the CRDs as in 1.
  3. Create multiple Argo CD applications: one without the CRDs syncing normally, and another with just the CRDs syncing with Replace=true (see the sketch below).

All of the approaches above will fix the problem, but they require some amount of work.
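A minimal sketch of approach 3, assuming the chart's CRD manifests are kept at a separate path in a Git repository (the repoURL, path, and Application name below are placeholders):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: monitoring-crds               # hypothetical CRDs-only Application
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/gitops-config.git   # placeholder repo containing only the CRD manifests
    path: monitoring/crds                                    # placeholder path
    targetRevision: main
  destination:
    namespace: monitoring
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      # Replace avoids the "metadata.annotations: Too long" error on the big CRDs
      - Replace=true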

In Argo CD 2.5 you can now use ServerSideApply to avoid the error with big CRDs while syncing. However, Argo CD is unable to consider CRD default values during diff calculation, which causes it to show resources as out-of-sync when in fact they aren't. To address this with the minimal amount of work, users can leverage the ignoreDifferences configuration.

To deploy the Prometheus stack with Argo CD, you can apply this Application resource:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: monitoring
  namespace: argocd
spec:
  project: default
  source:
    chart: kube-prometheus-stack
    repoURL: https://prometheus-community.github.io/helm-charts
    targetRevision: 41.6.1
  destination:
    namespace: monitoring
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - ServerSideApply=true
      - CreateNamespace=true
  ignoreDifferences:
  - group: monitoring.coreos.com
    kind: ServiceMonitor
    jqPathExpressions:
    - .spec.endpoints[]?.relabelings[]?.action

With this approach, users don't need to create an additional project to patch CRDs; everything can be configured from within the Application resource. Note that every default value can be added to the jqPathExpressions list, like in the example above, and it will be ignored during diff calculation.

Ideally, Argo CD should be able to retrieve all schemas from the target cluster with the proper structure so that they can be used to consider CRD default values during diff calculation. I created the following issue to track this enhancement (#11139). Please vote for it if you want to see it implemented.

Closing this issue for now.

@leoluz leoluz closed this as completed Nov 1, 2022
@Cowboy-coder

But how come this works without any diff when using client-side apply? Does client-side-apply have some special handling for these issues that server-side-apply doesn't?

@leoluz
Collaborator

leoluz commented Nov 1, 2022

@Cowboy-coder The client-side-apply diff is based on patches calculated with a 3-way diff using the desired state, the live state, and the last-applied-configuration annotation. The server-side-apply diff is a brand new implementation that uses the same library Kubernetes uses when applying resources server-side, which leverages managedFields to inspect and determine field ownership.
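To make that concrete, here is a trimmed excerpt of the live StatefulSet posted earlier in this thread, showing where each diff implementation reads from:

metadata:
  annotations:
    # input for the client-side-apply 3-way diff
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"apps/v1","kind":"StatefulSet", ... }
  managedFields:
    # input for the server-side-apply diff: per-manager field ownership
    - manager: argocd-controller
      operation: Apply
      fieldsType: FieldsV1
      fieldsV1:
        'f:spec':
          'f:replicas': {}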

@leoluz
Collaborator

leoluz commented Nov 1, 2022

> I got the same issue, and I verified that it works without ServerSideApply=true.
>
> Not sure if it should be tracked in a separate issue, but I also had a problem with Loki using ServerSideApply=true on 2.5 with this helm chart: https://github.com/grafana/loki/tree/main/production/helm/loki
>
> [screenshot: diff shown in the ArgoCD UI]

@Cowboy-coder Can you confirm whether this diff is in the StatefulSet? If so, this is another edge case with StatefulSets, and it is better tracked as a separate issue.

@Cowboy-coder

@leoluz Yes, this is a StatefulSet. Do you want me to create the issue?

@leoluz
Collaborator

leoluz commented Nov 1, 2022

@Cowboy-coder Yes, please. Just copy/paste the StatefulSet details from your previous comment into the new ticket:
#11074 (comment)

@lknite

lknite commented May 2, 2023

The ignoreDifferences solution above worked for me, except I had to specify something different to match:

      ignoreDifferences:
      - group: monitoring.coreos.com
        kind: ServiceMonitor
        jqPathExpressions:
        - .metadata.annotations

bossjones added a commit to bossjones/k3d-playground that referenced this issue Jan 25, 2024
InsomniaCoder added a commit to InsomniaCoder/argo-bootstrap-app that referenced this issue Jan 31, 2024
dfry added a commit to mojaloop/iac-modules that referenced this issue Apr 4, 2024
dfry added a commit to mojaloop/iac-modules that referenced this issue Apr 5, 2024
aaronreynoza pushed a commit to mojaloop/iac-modules that referenced this issue Apr 10, 2024
@yardenws

yardenws commented May 23, 2024

The issue also occurs with relabeling via the remoteWrite.writeRelabelConfigs section. It can be solved like so:

  ignoreDifferences:
  - group: monitoring.coreos.com
    kind: Prometheus
    jqPathExpressions:
    - .spec.remoteWrite[]?.writeRelabelConfigs[]?.action

kleyow added a commit to mojaloop/iac-modules that referenced this issue Jun 12, 2024
* fix: unreasonably high delays for probes

* Adding refresh remplates for bootstrap

* Renaming GITLAB TOKEN

* Adding comments

* Change in permission

* including setcivars script

* Integrate minio with Loki

* refactor

* ensure only 1 nat gw

* fix subnet ordering

* Adding generation of custom config for pm4ml

* Adding default config tag changes

* Revert "Merge pull request #196 from mojaloop/muz/iprod-502/integrate-minio-with-loki"

This reverts commit 3f6325a, reversing
changes made to c156aa6.

* placeholders for vnext add

* missed appdeploy placeholder

* 2nd draft

* feat: enhance mysql logging

* fix stateful resource env vars, new values file

* add missing vars

* adding sts.json in list

* correction

* fix configs for stateful resources

* fix mongodb secret naming, add vnext app

* clean up missing vars

* another missing var

* fix chart repo

* fix anchors

* fix: use local storage by default

* disable ingresses

* add es and reconfigure mongo url secret

* bump release

* fix secret name

* bump version

* fix path

* fix name again

* add service confs to value

* bump version

* try root pw on mongodb no db

* fix: added custom dumper for pm4ml merge function

* blanket apply env vars

* updaate topics for kafka

* fix hostnames in nginx.conf

* try adding ttk

* fix ttk config

* turn off ingress

* add api base url

* add admin ui vs

* Adding minio provider, minio tf code for loki and loghorn data storage

* Bringing the docker volume size to env.yaml

* Fixing the typo

* provider config

* Adding output deps

* Adding stored params

* correcting longhorn typo

* Changing the attrbut name

* Changes for accessing minio loki creds

* Adding to kustmz

* passing external_secret_sync_wave

* correcting the secret name

* Adding converstion and decoding strategy

* Adding minio config in loki values

* debug

* fixing the retrval

* Correcting the minio api port

* adding policy attachemnt

* removing taint

* adding changes for longhorn backup

* adding data resource for longhorn bucket

* correcting longhorn config

* commenting out longhorn s3 backups

* adding lifecycle rule

* correcting the variable reference

* Removing longhorn old refs

* removing commented lines

* fix typo on internal/external

* change in policy

* change in policy

* adding changes in permission

* add more values pt-1

* bump to latest chart version

* revert output change to use old secret for migration

* add http for non ssl url

* Add multi-line config in promtail configuration (#206)

* add more dynamic variables for mysql

* IPROD-525: Display offending processes (cpu+memory) on performance-troubleshooting-dashboard (#204)

* Add process-exporter

* add prometheus_process_exporter_version

* turn off service monitor

* enable service monitor again

* added process-exporter-service monitor

* add a recording rule

* add tpl to rules file

* update performance-troubleshooting url

* add instance_nodename:node_memory_MemTotal_bytes

* use v16.0.0-snapshot.6 tag for dashboards

* fix dashboard-performance-troubleshooting url

* update kafka-topic-overview

* update dashboard urls

* add node-exporter relabellings

* remove recording rules

* add comment in process exporter service monitor

* upgrade performance troublesshoting dashboard to v16.1.0-snapshot.7

* rm resources folder

---------

Co-authored-by: David Fry <[email protected]>
Co-authored-by: David Fry <[email protected]>

* add more dynamic variables for mysql (#207)

* feat: standardise poc demos changes (#205)

* fix: pm4ml vs paths

* fix: indent in mojaloop tolerations

* feat: added vs for payment token adapter in pm4ml

* feat: added core connector customization logic to pm4ml

* argoproj/argo-cd#11074 (#208)

* set version tags in default cluster config (#209)

* [IPROD-563] Make loki run on monitoring nodes (#210)

* add monitoring workload label

* add node affinities for different components

* added a comment

* IPROD-563: Run Prometheus, Grafana and Tempo on monitoring nodes only (#212)

* set node affinities for tempo

* add node affinities for prometheus and related services

* move grafana to monitoring nodes as well

* enable updating version tags for prometheus  and grafana CRDs

* Polling freq and backup job freq (#213)

* disable default logs for mysql

* IPROD-525: Display offending processes (cpu+memory) on performance-troubleshooting-dashboard (#204)

* Add process-exporter

* add prometheus_process_exporter_version

* turn off service monitor

* enable service monitor again

* added process-exporter-service monitor

* add a recording rule

* add tpl to rules file

* update performance-troubleshooting url

* add instance_nodename:node_memory_MemTotal_bytes

* use v16.0.0-snapshot.6 tag for dashboards

* fix dashboard-performance-troubleshooting url

* update kafka-topic-overview

* update dashboard urls

* add node-exporter relabellings

* remove recording rules

* add comment in process exporter service monitor

* upgrade performance troublesshoting dashboard to v16.1.0-snapshot.7

* rm resources folder

---------

Co-authored-by: David Fry <[email protected]>
Co-authored-by: David Fry <[email protected]>

* feat: standardise poc demos changes (#205)

* fix: pm4ml vs paths

* fix: indent in mojaloop tolerations

* feat: added vs for payment token adapter in pm4ml

* feat: added core connector customization logic to pm4ml

* argoproj/argo-cd#11074 (#208)

* set version tags in default cluster config (#209)

* [IPROD-563] Make loki run on monitoring nodes (#210)

* add monitoring workload label

* add node affinities for different components

* added a comment

* IPROD-563: Run Prometheus, Grafana and Tempo on monitoring nodes only (#212)

* set node affinities for tempo

* add node affinities for prometheus and related services

* move grafana to monitoring nodes as well

* enable updating version tags for prometheus  and grafana CRDs

* Polling freq and backup job freq (#213)

* set min and max block duration to 30m

* fix typo

* clean up and making aws objects' name unique (#211)

* Enabled s3 read for loki-querier (#218)

* Enabled s3 read for loki-querier

* give minio credentials to compactor as well

* addon module support (#216)

* ffirst draft

* cleanup of optional tg module support add addons boilerplate

* cleanup inputs

* refactor stateful svcs

* rename common st resources

* fix vars for module calls

* fix missing ref

* fix app name and check length > 0

* fix typo

* Enable log deletion using compactor (#220)

* enable deletion using compactor

* add commit

* move comment message

* update compactor/shared_store

* parametrize loki_retention_enabled

* Feature/refactor istio gw for using 2 separate domains (#219)

* Initial commit for istio gw private and public zone

* adding the var map changes

* commiting unsaved :(

* another one

* changing internal domain

* including new files in kustom.yaml

* some cleaning

* Change in gitlab app for argocd oidc

* correcting locals

* correcting the local var

* changes for monitoring and vault

* Adding missed save

* fixing typo

* file name change

* Keycloak changes

* resolving commit - adding missing vars in var map

* resolving commit - Changes for ttk

* finance portal changes

* fixing the missing var

* adding missing var

* adding vnext

* correcting the ref

* resolving commit - changes for mcm

* resolving commit - mcm changes in vnext

* additional changes for mcm in vnext

* validating conditional stmt in for expressino

* adding merge changes

* additional changes for pm4ml

* removal of ory_stack_enabled flag

* correction

* Fix

* fix for the access of internal_interop_switch_fqdn

* control center change for callbackurl and short private subdomain

* Code to get the inputs

* correcting the input

* fix typo

* Getting the internal lb flag for argocd, vault and grafana

* adding try

* Correction in kuztomize file

* fix for vault and argocd oidc

* Correction in grafana oidc

* fix for 1.6.1 chart, add flag for backup job (#223)

* cleanup (#222)

* Fix/refactor igw (#228)

* fixing grafna oidc

* fixing non existing index

* Draft - Refactoring app-deploy.tf  (#229)

* update configs for performance

* update configs for performance

* first draft patch kustomization

* cleanup naming

* add istio log config

* rm values from default

* app-deploy refactoring

* fix: scale account lookup service

* Removing unwanted variable assignements

* Removing unwanted variable definition

* Inclding variable finanace_portal_ingress_internal_lb in vnext

* removing fin portal fqdn

* Removing fin_portal assignment in vnext

* Removing the var definition

* Removing the var definition from mojaloop

* Moving pm4ml_keycloak_realm_env_secret_map

* Removing local var definition from app deploy

* Removing duplicate pm4ml_var_map

* Fixing variable issues

* removing the first two from allowedurllist

* rm interop vars not needed anymore

* Removing the commented line

* cleanup internal/external lb vars

---------

Co-authored-by: Kalin Krustev <[email protected]>
Co-authored-by: David Fry <[email protected]>
Co-authored-by: David Fry <[email protected]>

* first draft override kustomization (#225)

* update configs for performance

* update configs for performance

* first draft patch kustomization

* cleanup naming

* add istio log config

* rm values from default

* fix: scale account lookup service

* rebase kustomization refactor for mojaloop (#233)

* Fix/refactor igw (#228)

* fixing grafna oidc

* fixing non existing index

* app-deploy refactoring

* Removing unwanted variable assignements

* Removing unwanted variable definition

* Inclding variable finanace_portal_ingress_internal_lb in vnext

* removing fin portal fqdn

* Removing fin_portal assignment in vnext

* Removing the var definition

* Removing the var definition from mojaloop

* Moving pm4ml_keycloak_realm_env_secret_map

* Removing local var definition from app deploy

* Removing duplicate pm4ml_var_map

* Fixing variable issues

* removing the first two from allowedurllist

* rm interop vars not needed anymore

* Removing the commented line

* cleanup internal/external lb vars

---------

Co-authored-by: Sijo George <[email protected]>
Co-authored-by: Sijo George <[email protected]>

---------

Co-authored-by: Kalin Krustev <[email protected]>
Co-authored-by: Sijo George <[email protected]>
Co-authored-by: Sijo George <[email protected]>

* Revert "Draft - Refactoring app-deploy.tf  (#229)"

This reverts commit 1b54a10.

* New PR Feature/refactor appdeploy (#236)

* update configs for performance

* update configs for performance

* first draft patch kustomization

* cleanup naming

* add istio log config

* rm values from default

* app-deploy refactoring

* fix: scale account lookup service

* Removing unwanted variable assignments

* Removing unwanted variable definition

* Including variable finanace_portal_ingress_internal_lb in vnext

* removing fin portal fqdn

* Removing fin_portal assignment in vnext

* Removing the var definition

* Removing the var definition from mojaloop

* Moving pm4ml_keycloak_realm_env_secret_map

* Removing local var definition from app deploy

* Removing duplicate pm4ml_var_map

* Fixing variable issues

* removing the first two from allowedurllist

* rm interop vars not needed anymore

* Removing the commented line

* cleanup internal/external lb vars

* rm bad merge

* add mojaloop-values-override.yaml

---------

Co-authored-by: Kalin Krustev <[email protected]>
Co-authored-by: Sijo George <[email protected]>
Co-authored-by: Sijo George <[email protected]>

* update versions (#237)

* Fixing typo (#238)

* Fix typo (#239)

* make tempo buckets in minio

* add tempo_data_expiry_days in terragrunt configs

* add minio_tempo_bucket variable to gitlab

* move all the resources to a single file

* fix the variable

* Increase loki and longhorn data TTL to 7 days in minio

* use 1d for longhorn data

* fix: admin portal name limit

* Fix for auth and wrong backend (#246)

* Correcting the default values (#247)

* fine tune addons module config (#240)

* reduce loki_ingester_pvc_size to 10Gi (#245)

* renamed minio_credentials_secret_name to minio_loki_credentials_secret_name (#244)

* updated references to minio_loki_credentials_secret_name

* updated value of minio_loki_credentials_secret_name

* IPROD-565: Setup tempo to use minio (#232)

* enable env variable expansion in config

* update tempo chart version

* add minio_tempo_credentials_secret_name

* update

* minio tempo credentials secret

* added tempo datasource

* replace extraArgs with args

* remove extra args

* update config

* fix bugs

* added extraEnvVarsSecret to remaining services

* switch to s3

* add tempo retention period

* use hours instead of days

* get minio_tempo_bucket from gitlab

* use minio api url

* use minio_tempo_credentials_secret_name variable

* refactor

---------

Co-authored-by: David Fry <[email protected]>

* typo on minio_loki_credentials_secret_name (#248)

* rm consul inject (#249)

* Increase resource limits for tempo (#250)

* feat: exposed ttk test cases tag and added ttk test cases labels (#252)

* Verify IAC deployment using eks (#255)

* Moving to a compatible version

* adding vpc cni specific version

* Upgrading to new version

* adding vpc cni service account role

* private zone change

* ns record

* Changes for public_int_domain

* fixing zone

* fixing zone

* temporarily setting the flag to true

* removing ns record

* try using defaults from self managed

* rm configmap

* cleanup and add ns record

* fix typo on ns

* fix output for eks module for int domain

* add zone for int to post config

* missed local var

* add prefix delegation and sgs

* just use primary

* adding try for taints and labels

* adding try for node pool ref

* Fixing null nodepool

* correcting the condition

* use latest cni

* revert

* go back to latest cni addon

---------

Co-authored-by: David Fry <[email protected]>

* increase resource limit for tempo services (#259)

* IPROD-668: Update command and args of loki memcached (#254)

* update command and args of loki memcached

* add comments

* enable metrics for memcachedChunks (#260)

* enable metrics for memcachedChunks

* added memcached exporter dashboard

* update command and args of loki memcached

* add comments

* enable service monitor for memcache exporter

* Fix/node pool map (#261)

* node pool map change

* fix post config domain and asg/sgs

* reverting irsa

* setting longhorn_backup_job_enabled: false

---------

Co-authored-by: David Fry <[email protected]>

* expose minio-loki-credentials to queryfrontend and distributor (#263)

* Upgrading netmaker version

* All mojaloop grafana dashboards use same git tag (#262)

* Correcting the instance class for mysql rds

* Bringing managed services changes

* Correcting the newline

* Changing the type of variable

* IPROD-686 : add loki-query-scheduler (#265)

* add query scheduler

* give minio access to gateway as well

* Revert "give minio access to gateway as well"

This reverts commit 3440f34.

* run two replicas of queryFrontend

* Revert "run two replicas of queryFrontend"

This reverts commit 43f9480.

* Adding bastion to k8s nm network along with cc

* Correcting the quotes

* adding changes for external ms

* Correcting the variable names

* Adding the map changes

* adding managed_db_host var in middle layers

* Passing the variable

* adding map variable for port and destination for ms

* adding map variable assignment

* correcting the syntax

* correcting the syntax

* correcting the syntax

* adding variable

* Removing the inner loop

* Passing yaml encoded value

* changing the ds to list of maps

* change in inventory map

* Adding managed kafka

* Formatting ansible tf

* IPROD-694: Enable loki metrics monitoring (#268)

* Change in ref obj

* Separating msk and rds

* adding local external_kafka_stateful_resource_instance_addresses

* Adding sg rule for kafka access

* IPROD-694: Add dashboards for monitoring loki (#269)

* feat: re-generate apps in branch pipeline (#257)

* feat: re-generate apps in branch pipeline

* remove unused property

* small fixes

* fix mocks

* set defaults

* set defaults

* including bootstrap_brokers_plaintext

* changing the expression and instance type

* Correcting the expression

* changing the out

* Changing the output

* converting list to string

* change the default protocol for msk

* Finance portal override (#270)

* allow overriding variables for finance portal

* typo

* Default value to PLAINTEXT

* Setting the bastion instance type to t2.micro

* use valid yaml in default (#272)

* fix: optimize defaults (#278)

* fix: optimize defaults

* fix: optimize defaults

* IPROD-545: Enable prometheus remote write and read  (#275)

* IPROD-545: Enable remote write on client prometheus

* fix url address

* extract configs in params

* test disabling remote write

* refactor

* add remote read configs

* added default values for central monitoring configs

* remove a comment

* chore: update versions

* update

* revert ttk version

* update services

* update ttk

* bump

* bump

* update charts

* revert quoting

* downgrade

* undo

* bump

* bump services

* downgrade quoting

---------

Co-authored-by: vijayg10 <[email protected]>
Co-authored-by: Kalin Krustev <[email protected]>
Co-authored-by: Sijo George <[email protected]>
Co-authored-by: David Fry <[email protected]>
Co-authored-by: muzammil360 <[email protected]>
Co-authored-by: David Fry <[email protected]>
Co-authored-by: Aaron Reynoza <[email protected]>
Co-authored-by: Vijay <[email protected]>
Co-authored-by: Sijo George <[email protected]>
kpoxo6op added a commit to kpoxo6op/soyspray that referenced this issue Oct 25, 2024