Add metric for pending workloads, broken down by queue and cluster_queue #237
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: alculquicondor. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment
Force-pushed from ce04fc5 to 04e0e93
/hold
Force-pushed from e0411e1 to fba2b13
@alculquicondor: GitHub didn't allow me to assign the following users: kerthcet. Note that only kubernetes-sigs members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from fba2b13 to ed11d11
pkg/metrics/metrics.go (Outdated)

	Subsystem: subsystemName,
	Name:      "pending_workloads",
	Help:      "Number of pending workloads, per queue and cluster_queue.",
}, []string{"queue", "cluster_queue"})
nit: place cluster_queue first because the relation is cq -- 1:n -- q
Done
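For illustration, a minimal sketch of what the reordered gauge definition could look like. The subsystem constant's value and the surrounding package layout are assumptions, not the repository's actual code:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

const subsystemName = "kueue" // assumed value; the real constant lives elsewhere in the package

// PendingWorkloads with cluster_queue as the first label, matching the
// cq -- 1:n -- q relation: one ClusterQueue fans out to many Queues.
var PendingWorkloads = prometheus.NewGaugeVec(prometheus.GaugeOpts{
	Subsystem: subsystemName,
	Name:      "pending_workloads",
	Help:      "Number of pending workloads, per queue and cluster_queue.",
}, []string{"cluster_queue", "queue"})
```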
/lgtm
Force-pushed from fcbbf4f to 703b95f
/hold cancel
@@ -199,6 +201,7 @@ func (m *Manager) UpdateQueue(q *kueue.Queue) error {
 		}
 	}
 	qImpl.update(q)
+	qImpl.reportPendingWorkloads()
How about wrapping this with defer? The difference is that when we return the errQueueDoesNotExist error, we would still run reportPendingWorkloads(). (Similar to other places.)
We don't want to report when the queue doesn't exist.
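To make the trade-off concrete, here is a minimal sketch of the placement under discussion, with stub types standing in for the real Manager, Queue, and queue implementation (the stubs and map key are assumptions for the sketch):

```go
package queue

import (
	"errors"
	"sync"
)

var errQueueDoesNotExist = errors.New("queue doesn't exist")

// Stubs standing in for the real types (assumptions for this sketch).
type Queue struct{ Name string }

type queueImpl struct{}

func (q *queueImpl) update(_ *Queue)         {}
func (q *queueImpl) reportPendingWorkloads() {}

type Manager struct {
	sync.Mutex
	queues map[string]*queueImpl
}

// UpdateQueue reports pending workloads only on the success path: a
// deferred call would also fire when the queue lookup fails, and we
// don't want to report for a queue that doesn't exist.
func (m *Manager) UpdateQueue(q *Queue) error {
	m.Lock()
	defer m.Unlock()
	qImpl, ok := m.queues[q.Name]
	if !ok {
		return errQueueDoesNotExist
	}
	qImpl.update(q)
	qImpl.reportPendingWorkloads()
	return nil
}
```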
metrics.Registry.MustRegister(
	admissionAttempts,
	admissionAttemptLatency,
	PendingWorkloads,
What about lowercasing PendingWorkloads to pendingWorkloads, moving reportPendingWorkloads to this package, and passing in the necessary parameters?
The necessary parameters are the cluster_queue and queue keys, which are the same parameters that WithValues requires. Then we would still need a reportPendingWorkloads to get a key for the queue, so we would need two functions. I think this is clean enough.
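For reference, the reviewer's alternative might look roughly like this; the helper name, signature, and subsystem value are hypothetical:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// pendingWorkloads stays unexported; callers go through the helper below.
var pendingWorkloads = prometheus.NewGaugeVec(prometheus.GaugeOpts{
	Subsystem: "kueue", // assumed subsystem value
	Name:      "pending_workloads",
	Help:      "Number of pending workloads, per queue and cluster_queue.",
}, []string{"cluster_queue", "queue"})

// ReportPendingWorkloads is a hypothetical helper: it takes the same
// (cluster_queue, queue) keys that WithValues would need anyway.
func ReportPendingWorkloads(clusterQueue, queue string, pending int) {
	pendingWorkloads.WithLabelValues(clusterQueue, queue).Set(float64(pending))
}
```

The queue package would still need its own wrapper to derive the queue key before calling this helper, which is the "two functions" point above.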
pkg/queue/manager.go (Outdated)

@@ -315,6 +321,7 @@ func (m *Manager) RequeueWorkload(ctx context.Context, info *workload.Info, imme

 func (m *Manager) DeleteWorkload(w *kueue.Workload) {
 	m.Lock()
+	defer m.Unlock()
 	m.deleteWorkloadFromQueueAndClusterQueue(w, queueKeyForWorkload(w))
 	m.Unlock()
rm m.Unlock in L326
Oops, I'll just revert this.
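To spell out the bug: with both the new defer and the pre-existing explicit call, the mutex gets unlocked twice, and the second Unlock on an already-unlocked sync.Mutex aborts with a "sync: unlock of unlocked mutex" fatal runtime error. A sketch of the reverted shape, using the identifiers from the diff above (surrounding types assumed):

```go
func (m *Manager) DeleteWorkload(w *kueue.Workload) {
	m.Lock()
	m.deleteWorkloadFromQueueAndClusterQueue(w, queueKeyForWorkload(w))
	m.Unlock() // single explicit unlock; no defer, so it runs exactly once
}
```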
/retest-required
Another solution for reference: counting the pending workloads in ClusterQueueImpl, it seems more stable.
Change-Id: I46a509a2612597004483cb8c90bb76dcfe95742d
Force-pushed from 703b95f to 0c4cf75
Rebased
> Another solution for reference, counting the pending workloads in ClusterQueueImpl, it seems more stable

What makes you say that it would be more stable?
The source-of-truth for how many elements are in the queue is the queue implementation. Are you thinking of race conditions where 2 routines try to add/remove elements at the same time? That's not a problem because the operations take a lock.
/lgtm
What type of PR is this?
/kind feature
What this PR does / why we need it:
Add the metric pending_workloads. We only update it when there are changes in a queue. The total number of pending workloads for a cluster_queue can be obtained by aggregation (see the example below).
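For example, assuming the fully qualified metric name ends up as kueue_pending_workloads (an assumption; the prefix depends on the registered subsystem), the per-cluster_queue total could be computed with a PromQL query like:

```promql
sum by (cluster_queue) (kueue_pending_workloads)
```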
Which issue(s) this PR fixes:
Part of #199
Special notes for your reviewer: