-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parametrized/periodic jobs per child tagged metric emmision #4392
Parametrized/periodic jobs per child tagged metric emmision #4392
Conversation
3159c4e
to
a0a9f1f
Compare
Given how high cardinality this telemetry will be, it would be nice to opt-in for it - influxdb will not like a lot of batch jobs emitting these extra labels often (like when jobs run every 5min 24/7) Influx limit by default to just 1m series - which will run out very very quick already |
@jippi, InfluxDB-wize there is no difference. |
a0a9f1f
to
ff1b0d4
Compare
ping @dadgar |
@burdandrei could you resolve the merge conflict on your branch with a rebase? One concern I have is that if the change will break existing users that somehow relied on parsing the jobid to get the parent and the dispatched child id out. If we decide to merge this we need to mention that in the changelog and in the metrics documentation as well. |
client/task_runner.go
Outdated
@@ -288,6 +284,31 @@ func NewTaskRunner(logger *log.Logger, config *config.Config, | |||
}, | |||
} | |||
|
|||
if tc.alloc.Job.ParentID != "" { | |||
tc.baseLabels = append(tc.baseLabels, metrics.Label{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed this with the team internally, and we don't want to break any existing users that rely on the full jobID for dispatched jobs. Could you add a new "parentJob" label here and below for alloc.job.ParentID and leave the existing "job" label as it is? That way it doesn't break backwards compatibility.
ff1b0d4
to
05d5523
Compare
@preetapan What about summaries? Does it look ok? |
nomad/leader.go
Outdated
} | ||
labels = append(labels, metrics.Label{ | ||
Name: "job", | ||
Value: metricJobId, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still not quite right. The "job" label should have summary.JobId in all three cases, so I would suggest adding back the code in lines 619-622, and removing 646-648. In each of your case statements above for dispatch and periodic, jobInfo[0] should be added as new label called "parentJobId".
For example, if summary.JobID is 'myjob/dispatch-23423423', this should result in the following labels:
label | value |
---|---|
"job" | myjob/dispatch-23423423 --> (this maintains backwards compatibility) |
"parentJob" | myjob |
"dispatch_id" | 23423423 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got you,
generally, I tried to avoid this, cause if I got a dashboard with dropdown by JOB ID and lot's of periodic/dispatch it's really hard to find normal jobs. But nomad-wise you are right job id is the long one. Maybe it's better to add an option to config emit/skip for parametrized/periodic jobs. But again it can add redundant operational complexity.
Let me know if you want me to provide the config option.
05d5523
to
4e352f3
Compare
thanks, @preetapan! |
Value: jobInfo[0], | ||
}, metrics.Label{ | ||
Name: "periodic_id", | ||
Value: jobInfo[1], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure that the length of jobInfo
is what is expected for safety.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this specific case, the contains check in line 642 guards against that. If the string ends with /periodic-
that slice should still have 2 elements with one being the empty string.
if strings.Contains(tc.alloc.Job.Name, "/periodic-") { | ||
tc.baseLabels = append(tc.baseLabels, metrics.Label{ | ||
Name: "periodic_id", | ||
Value: strings.Split(tc.alloc.Job.Name, "/periodic-")[1], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First check the length after splitting the string to ensure the length is as expected before assigning the second element.
if strings.Contains(tc.alloc.Job.Name, "/dispatch-") { | ||
tc.baseLabels = append(tc.baseLabels, metrics.Label{ | ||
Name: "dispatch_id", | ||
Value: strings.Split(tc.alloc.Job.Name, "/dispatch-")[1], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See below comment. Consider pulling this into a helper function to reduce code duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the input, I wrote helper function hour ago and now I'm trying to figure where to place it =)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, its probably fine to not introduce a helper method. This and the other place we do this, are in two different packages and we would have to put one helper method in a util package that both can import. Doesn't seem worth it this time..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That what's stopped me - 2 different packages without imports intersection.
Thanks @preetapan, and thanks @chelseakomlo for really usefull code review tips!
Value: jobInfo[0], | ||
}, metrics.Label{ | ||
Name: "dispatch_id", | ||
Value: jobInfo[1], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't assume that this array will have two elements, first check for safety.
}, metrics.Label{ | ||
Name: "periodic_id", | ||
Value: jobInfo[1], | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two points overall:
- Consider adding a helper function where there is code duplication
- Don't assume the length of the array is what is as expected, first check before indexing.
PR #4392 was merged to master *after* allocrunnerv2 was branched, so the client-specific portions must be ported from master to arv2.
PR #4392 was merged to master *after* allocrunnerv2 was branched, so the client-specific portions must be ported from master to arv2.
PR #4392 was merged to master *after* allocrunnerv2 was branched, so the client-specific portions must be ported from master to arv2.
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
continuing better metrics emission started in #3882
fixes #4061