Unable to substitute parameters for metric #8207

Closed
motybz opened this issue Mar 22, 2022 · 9 comments · Fixed by #10489

motybz commented Mar 22, 2022

First of all, thank you for a great framework!

Summary

We are using Argo 3.2.9.
We are exporting metrics to Prometheus, and it seems that some of the variables are not resolved.
The problematic variables are {{exitCode}} and {{resourcesDuration.cpu}}.
The errors are:
unable to substitute parameters for metric 'step_result_counter': failed to resolve {{exitCode}}
unable to substitute parameters for metric 'cpu_duration_gauge': failed to resolve {{resourcesDuration.cpu}}
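The error messages suggest that substitution for the whole metric fails as soon as one `{{...}}` placeholder has no value in the scope it is evaluated against. As a rough illustration only (this is not Argo's implementation; `substitute`, `substitute_metric`, and the flat `scope` dict are hypothetical), that failure mode looks like this:

```python
import re

# Matches placeholders such as {{exitCode}} or {{resourcesDuration.cpu}}.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z0-9_.\-]+)\s*\}\}")


class UnresolvedPlaceholder(Exception):
    """Raised when a placeholder key is missing from the scope."""


def substitute(template: str, scope: dict) -> str:
    """Replace every {{key}} in template with scope[key]; fail on the first missing key."""
    def repl(match: re.Match) -> str:
        key = match.group(1)
        if key not in scope:
            raise UnresolvedPlaceholder("failed to resolve {{" + key + "}}")
        return str(scope[key])

    return PLACEHOLDER.sub(repl, template)


def substitute_metric(name: str, labels: dict, scope: dict) -> dict:
    """Resolve all label values of one metric, or report which placeholder failed."""
    try:
        return {key: substitute(value, scope) for key, value in labels.items()}
    except UnresolvedPlaceholder as err:
        raise RuntimeError(
            f"unable to substitute parameters for metric '{name}': {err}"
        ) from None
```

With a hypothetical scope such as `{"workflow.name": "wf-abc", "status": "Failed"}`, the `run` and `status` labels resolve, but adding an `exit_code` label with `{{exitCode}}` makes the whole metric fail with a message shaped like the one in the report.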

Diagnostics

Non-working metric

prometheus:
    - name: step_result_counter
      help: "Count of step execution by result status"
      labels:
        - key: name
          value: pre-processor
        - key: run
          value: "{{workflow.name}}"
        - key: status
          value: "{{status}}"
        - key: exit_code  # this label fails to resolve
          value: "{{exitCode}}"
      counter:
        value: "1"

Non-working metric

prometheus:
    - name: cpu_duration_gauge
      labels:
        - key: run
          value: "{{workflow.name}}"
        - key: name
          value: pre-processor
      help: "Step CPU duration gauge by name"
      gauge:
        value: "{{resourcesDuration.cpu}}"

Working metric

prometheus:
    - name: step_result_counter
      help: "Count of step execution by result status"
      labels:
        - key: name
          value: pre-processor
        - key: run
          value: "{{workflow.name}}"
        - key: status
          value: "{{status}}"
      counter:
        value: "1"

@sarabala1979
Member

@dpadhiar can you take a look?

@dpadhiar
Member

@motybz Hi, could you provide the workflow YAML so we may reproduce the bug? Thank you.

stale bot added the problem/stale label ("This has not had a response in some time") Jun 18, 2022
stale bot closed this as completed Jul 10, 2022

nikita-akuity commented Jan 5, 2023

The issue appears in 3.4.4.
Workflow to reproduce:

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: test-runner
  namespace: test
spec:
  entrypoint: main
  podGC:
    strategy: OnWorkflowSuccess
  serviceAccountName: operate-workflow-sa
  templates:
  - name: main
    steps:
    - - continueOn:
          error: true
          failed: true
        name: runTest
        template: run-test
  - container:
      name: runner
      image: alpine
      args:
      - exit 1
      command:
      - sh
      - -c
    metrics:
      prometheus:
      - counter:
          value: "1"
        help: Count of runs by exit code
        labels:
        - key: exit_code
          value: '{{exitCode}}'
        name: runs_exit_status_counter
    name: run-test
    retryStrategy:
      limit: "1"

workflow-controller log:

time="2023-01-05T00:28:46.457Z" level=info msg="node changed" namespace=test new.message="Error (exit code 1)" new.phase=Failed new.progress=0/1 nodeID=test-runner-cc48l-3076541373 old.message= old.phase=Pending old.progress=0/1 workflow=test-runner-cc48l
time="2023-01-05T00:28:56.482Z" level=info msg="node changed" namespace=test new.message="Error (exit code 1)" new.phase=Failed new.progress=0/1 nodeID=test-runner-cc48l-3412240848 old.message= old.phase=Pending old.progress=0/1 workflow=test-runner-cc48l
time="2023-01-05T00:28:56.482Z" level=error msg="unable to substitute parameters for metric 'runs_exit_status_counter': failed to resolve {{exitCode}}" namespace=test workflow=test-runner-cc48l

When the exit code is 0, or when retryStrategy is omitted, {{exitCode}} resolves successfully and the error does not appear.
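Per this report, the error needs three conditions together: a metric that references {{exitCode}}, a template with a retryStrategy, and a container that exits non-zero. Only the first two are visible before the workflow runs. A hypothetical helper for spotting templates that combine them, assuming the template has already been parsed into a plain dict (for example with a YAML loader, not shown), might look like this; `references_exit_code_with_retry` is an illustrative name, not part of Argo:

```python
def references_exit_code_with_retry(template: dict) -> bool:
    """Hypothetical check: does this template set a retryStrategy AND use
    {{exitCode}} in a Prometheus metric label or metric value?"""
    if "retryStrategy" not in template:
        return False
    for metric in template.get("metrics", {}).get("prometheus", []):
        # Collect label values plus the value of whichever metric kind is set.
        values = [str(label.get("value", "")) for label in metric.get("labels", [])]
        for kind in ("counter", "gauge", "histogram"):
            if kind in metric:
                values.append(str(metric[kind].get("value", "")))
        if any("{{exitCode}}" in value for value in values):
            return True
    return False
```

Applied to a dict that mirrors the `run-test` template above, this returns `True`; dropping `retryStrategy` makes it return `False`. It cannot tell whether the container will actually exit non-zero, so a match only means the template is exposed to the reported behavior.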

wanghong230 reopened this Jan 5, 2023
stale bot removed the problem/stale label ("This has not had a response in some time") Jan 5, 2023
stale bot added the problem/stale label ("This has not had a response in some time") Jan 21, 2023
stale bot removed the problem/stale label ("This has not had a response in some time") Jan 24, 2023
@jiachengxu
Member

I have reproduced the issue and will work on a fix for it.

stale bot added the problem/stale label ("This has not had a response in some time") Mar 25, 2023
terrytangyuan pushed a commit that referenced this issue Mar 27, 2023
 (#10489)

Signed-off-by: Jiacheng Xu <[email protected]>
Co-authored-by: Saravanan Balasubramanian <[email protected]>
terrytangyuan pushed a commit that referenced this issue Mar 29, 2023
 (#10489)

Signed-off-by: Jiacheng Xu <[email protected]>
Co-authored-by: Saravanan Balasubramanian <[email protected]>
JPZ13 pushed a commit to pipekit/argo-workflows that referenced this issue Jul 4, 2023
agilgur5 removed the problem/stale label ("This has not had a response in some time") Sep 8, 2023
argoproj locked as resolved and limited conversation to collaborators Apr 25, 2024