Use prometheus to annotate pod/node data onto Job model #723
Merged
jjnesbitt force-pushed the job-node-data-prometheus branch 2 times, most recently from 0b9f9ac to 5aa73fa on January 12, 2024 16:57
jjnesbitt force-pushed the job-node-data-prometheus branch from 5aa73fa to 13381d5 on January 12, 2024 17:01
jjnesbitt force-pushed the job-node-data-prometheus branch 4 times, most recently from fe9d564 to 147e0a4 on January 13, 2024 19:34
Additionally, break apart job model to hold data pertaining to the node and pod. Since a job isn't always run in the cluster, this will de-clutter the model and prevent a lot of null rows on non-aws jobs.
jjnesbitt force-pushed the job-node-data-prometheus branch from 147e0a4 to cdf18a1 on January 15, 2024 18:03
jjnesbitt force-pushed the job-node-data-prometheus branch from 3781335 to 0bd349b on January 15, 2024 19:27
danlamanna reviewed Jan 16, 2024
danlamanna reviewed Jan 16, 2024
danlamanna reviewed Jan 16, 2024
danlamanna reviewed Jan 16, 2024
danlamanna reviewed Jan 16, 2024
danlamanna reviewed Jan 16, 2024
danlamanna reviewed Jan 16, 2024
cmelone reviewed Jan 17, 2024
danlamanna reviewed Jan 17, 2024
jjnesbitt force-pushed the job-node-data-prometheus branch from cd8879d to 8fbcc15 on January 17, 2024 17:57
danlamanna previously approved these changes Jan 17, 2024
danlamanna previously approved these changes Jan 18, 2024
danlamanna previously approved these changes Jan 18, 2024
danlamanna previously approved these changes Jan 18, 2024
jjnesbitt force-pushed the job-node-data-prometheus branch from 5df2ea8 to e8b3696 on January 18, 2024 23:11
I've been testing this and am seeing times of around 3-5 seconds for the |
danlamanna approved these changes Jan 19, 2024
This was referenced Jan 19, 2024
Supersedes #697
This PR collects much more data every time job data is collected by the job webhook. This allows us to derive "cost per job" metrics.
The following notable changes are included in this PR:
- `build_timings_processor.py` has been moved/renamed to the `job_processor` folder, which contains several files, including one for build timings. This is because, at this point, a lot more is going on than just uploading build timings.
- `Node` model, which stores data relating to a node that a job ran on, including available CPU/memory, its spot price, etc.
- `JobPod` model, which stores data about the pod a job ran on in the cluster, including resource requests, limits, and usage.
The above two models are referenced by `Job`, but are nullable, since jobs that weren't run in the cluster (UO, etc.), as well as historical AWS jobs, won't have any available data.
There are a few more minor changes worth mentioning:
- The `Job.duration` field has been changed from a `FloatField` to a `DurationField`, to better represent the data and make it easier to work with.
- The `Job.aws` field is no longer nullable. The field was made nullable in "Handle jobs with missing runners" #724, to get around a quirk of the GitLab API. This PR circumvents that part of the API entirely, so null values shouldn't be allowed. Only a handful of rows in the database currently have a null value in the `aws` column, so those rows are deleted as part of the migration to clean things up.
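For readers who want a picture of the resulting data shape, here is a minimal sketch using plain Python dataclasses rather than the Django models from the PR. Only `Job.duration`, `Job.aws`, and the `Node`/`JobPod` model names come from the description above; every other field name, the assumption that `aws` is a boolean, and the assumption that the old float duration was in seconds are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class Node:
    # Hypothetical field names: the description only says a node records
    # available cpu/memory and its spot price.
    cpu: int
    memory_bytes: int
    spot_price: float


@dataclass
class JobPod:
    # Hypothetical field names: the description only mentions resource
    # requests, limits, and usage.
    cpu_request: float
    cpu_limit: float
    cpu_usage: float


@dataclass
class Job:
    duration: timedelta  # stands in for the new DurationField
    aws: bool  # assumed boolean; no longer allowed to be None
    # Both references are optional, since jobs that did not run in the
    # cluster (and historical jobs) have no node or pod data.
    node: Optional[Node] = None
    pod: Optional[JobPod] = None


def duration_from_seconds(seconds: float) -> timedelta:
    """Convert a legacy float duration (assumed to be seconds) to a timedelta."""
    return timedelta(seconds=seconds)


# A job that ran in the cluster carries node and pod data.
cluster_job = Job(
    duration=duration_from_seconds(90.5),
    aws=True,
    node=Node(cpu=8, memory_bytes=32 * 1024**3, spot_price=0.12),
    pod=JobPod(cpu_request=2.0, cpu_limit=4.0, cpu_usage=1.5),
)

# A job that ran elsewhere simply leaves both references unset.
non_cluster_job = Job(duration=duration_from_seconds(30.0), aws=False)
```

Keeping node and pod data in their own records, instead of as columns on the job, means a job that ran outside the cluster holds two empty references rather than a long run of empty fields.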