Metrics for periodic jobs are creating too many timeseries due to timestamps in the labels #4061

sevagh · 2018-03-28T16:45:04Z

Hello,
This is some sample output of the Nomad task_group_* metrics:

nomad_nomad_job_summary_complete{host="foo",job="bar/periodic-1522233600",task_group="baz"} 1
nomad_nomad_job_summary_complete{host="foo",job="bar/periodic-1522233601",task_group="baz"} 1
nomad_nomad_job_summary_complete{host="foo",job="bar/periodic-1522233602",task_group="baz"} 1
nomad_nomad_job_summary_complete{host="foo",job="bar/periodic-1522233603",task_group="baz"} 1
nomad_nomad_job_summary_complete{host="foo",job="bar/periodic-1522233604",task_group="baz"} 1
nomad_nomad_job_summary_complete{host="foo",job="bar/periodic-1522233605",task_group="baz"} 1
nomad_nomad_job_summary_complete{host="foo",job="bar/periodic-1522233606",task_group="baz"} 1
nomad_nomad_job_summary_complete{host="foo",job="bar/periodic-1522233607",task_group="baz"} 1
nomad_nomad_job_summary_complete{host="foo",job="bar/periodic-1522233608",task_group="baz"} 1

This is a misuse of labels in Prometheus: https://prometheus.io/docs/practices/naming/#labels

CAUTION: Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.

There should be a better way of representing this.
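
To make the growth concrete, here is a rough sketch that counts how many distinct `job` label values one periodic job produces in a day. The one-minute launch interval and the helper name `child_job_ids` are assumptions for illustration; the ID format follows the sample output above.

```python
def child_job_ids(parent, first_launch, interval_seconds, launches):
    """Return the child job IDs a periodic parent would produce.

    Assumes the `<parent>/periodic-<unix timestamp>` format seen in the
    sample output; interval and count are illustrative inputs.
    """
    return [
        f"{parent}/periodic-{first_launch + i * interval_seconds}"
        for i in range(launches)
    ]


# Hypothetical schedule: one launch per minute for 24 hours.
ids = child_job_ids("bar", 1522233600, 60, 24 * 60)

# Each distinct label value is its own time series for every metric name.
series_with_timestamp = len(set(ids))
series_without_timestamp = len({i.rsplit("-", 1)[0] for i in ids})

print(series_with_timestamp)     # 1440
print(series_without_timestamp)  # 1
```

The same multiplier applies to every metric name and task group that carries the `job` label, so the total grows with each launch until old series age out.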

dadgar · 2018-03-28T16:53:08Z

What is your suggestion?

sevagh · 2018-03-28T16:58:09Z

I have nothing right now. Can we have this ticket open for brainstorming? I'll be doing some meditation as well and can post what I come up with.

sevagh · 2018-03-28T17:01:49Z

A pattern we use (for totally unrelated metrics in Prometheus) is:

last_success_timestamp{} = 1.52225616e+09

With some elbow-grease maybe something similar can be thought up for this case?

dadgar · 2018-03-28T17:06:24Z

Sounds good. Improving metrics is always a worthwhile effort!

sevagh · 2018-03-29T16:15:43Z

So I'm worried about doing something like this (pseudocode mixed with Python, not valid Go):

jobID := job.ID // e.g. foojob/periodic-XXXXXXX
jobName := job.Name // e.g. foojob/periodic-XXXXXX

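split := jobID.rsplit('-', 1) // split on the last '-' only; job names may contain '-'
adjustedJobID := split[0] // e.g. foojob/periodic
timestamp := split[1] // e.g. XXXXXXX

emit_metric{
    name: periodic_job_last_run,
    value: float64(timestamp), // float32 cannot represent current Unix timestamps exactly
    labels:
         "job": adjustedJobID, // e.g. "foojob/periodic"
}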

This emits things but it sort of butchers the way Periodic jobs are even named in the first place within Nomad - smells funny to me.
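
For reference, the split described above can be written as runnable Python. This is only a sketch: `split_periodic_id` and `last_run_sample` are hypothetical helpers, and the code assumes child IDs always end in `/periodic-<digits>`, as in the samples earlier in the thread.

```python
import re

# Assumed child-ID shape, taken from the sample output in this thread.
_PERIODIC_RE = re.compile(r"(.+/periodic)-([0-9]+)")


def split_periodic_id(job_id):
    """Split 'foojob/periodic-1522233600' into ('foojob/periodic', 1522233600).

    Returns (job_id, None) when the ID does not look like a periodic child.
    """
    match = _PERIODIC_RE.fullmatch(job_id)
    if match is None:
        return job_id, None
    return match.group(1), int(match.group(2))


def last_run_sample(job_id):
    """Build one (name, labels, value) sample, or None for non-periodic IDs."""
    base, launched_at = split_periodic_id(job_id)
    if launched_at is None:
        return None
    # Python floats are 64-bit, so second-resolution Unix timestamps fit exactly.
    return ("periodic_job_last_run", {"job": base}, float(launched_at))


print(last_run_sample("foo-job/periodic-1522233600"))
```

Matching on the trailing `-<digits>` rather than splitting on every `-` keeps job names that themselves contain hyphens intact.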

dylan-ferreira · 2018-04-05T17:05:10Z

In the meantime, rewriting the label on ingest is a workable stopgap:

    metric_relabel_configs:
      - source_labels: ['exported_job']
        separator: ;
        regex: "^(.+/periodic)-[0-9]+$" # Drop unix-timestamp on nomad job metrics
        target_label: 'exported_job'
        replacement: '$1'
        action: replace

(in the example above, the job label is already taken by service discovery)
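
To sanity-check the regex before deploying it, the `replace` action can be approximated in Python. This is a sketch, not Prometheus itself: Prometheus uses RE2 and anchors the pattern at both ends, which `re.fullmatch` mimics here, and `relabel_exported_job` is a hypothetical helper name.

```python
import re

# Same pattern as the relabel rule above (the explicit ^ and $ are harmless
# under full anchoring).
PATTERN = re.compile(r"^(.+/periodic)-[0-9]+$")


def relabel_exported_job(labels):
    """Return a copy of `labels` with the timestamp dropped from exported_job.

    Mirrors action: replace, where a non-matching value is left untouched.
    """
    out = dict(labels)
    match = PATTERN.fullmatch(out.get("exported_job", ""))
    if match:
        out["exported_job"] = match.group(1)  # equivalent of replacement '$1'
    return out


print(relabel_exported_job({"exported_job": "bar/periodic-1522233600", "task_group": "baz"}))
print(relabel_exported_job({"exported_job": "web", "task_group": "api"}))
```

Non-periodic jobs such as `web` pass through unchanged, so the rule can be applied to every Nomad metric without a separate filter.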

I do like having the indication of a periodic job, but I agree that altering the original job name can lead to confusion. Adding another label, job_type="periodic", may be helpful here.

Rather than last_success_timestamp{} we could have something like the following that could apply to any job type:

# HELP nomad_job_start_time_seconds Start time of the job since unix epoch in seconds.
# TYPE nomad_job_start_time_seconds gauge
nomad_job_start_time_seconds{job="<job_name>", job_type="<job_type>"} <unixtime64>

For periodic tasks, it would be great to have metrics like:

# HELP nomad_job_last_run_time_seconds Time in seconds the last run of the job took to complete.
# TYPE nomad_job_last_run_time_seconds gauge
nomad_job_last_run_time_seconds{job="<job_name>", job_type="<job_type>"} <seconds>
# HELP nomad_job_last_exit_code Exit code from the last run of the job.
# TYPE nomad_job_last_exit_code gauge
nomad_job_last_exit_code{job="<job_name>", job_type="<job_type>"} <exit_code>
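
As a rough illustration of what an exporter for these proposed gauges might print, here is a small renderer for the text exposition format. The metric name comes from the proposal above; `render_gauge`, `escape_label_value`, and the sample values are made up for the example, and the label-value escaping (backslash, double quote, newline) follows the Prometheus text format rules.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline for the text format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name, help_text, labels, value):
    """Render one gauge sample with its HELP and TYPE lines."""
    label_str = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return "\n".join([
        f"# HELP {name} {help_text}",
        f"# TYPE {name} gauge",
        f"{name}{{{label_str}}} {value}",
    ])


# Illustrative values only; one series per job, no timestamp in the labels.
labels = {"job": "bar", "job_type": "periodic"}
print(render_gauge(
    "nomad_job_start_time_seconds",
    "Start time of the job since unix epoch in seconds.",
    labels,
    1522233600,
))
```

Because the launch time is carried in the sample value rather than in a label, each new run updates the existing series instead of creating another one.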

sevagh · 2018-04-05T21:38:59Z

Great insights, thanks.

github-actions · 2022-11-29T02:18:41Z

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

dadgar added stage/waiting-reply theme/metrics labels Mar 28, 2018

burdandrei mentioned this issue Jun 7, 2018

Parametrized/periodic jobs per child tagged metric emmision #4392

Merged

preetapan closed this as completed in #4392 Jun 21, 2018

github-actions bot locked as resolved and limited conversation to collaborators Nov 29, 2022
