Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add pause metrics for ISBService/NumaflowController rollouts #234

Merged
merged 7 commits into from
Sep 6, 2024

Conversation

dpadhiar
Copy link
Contributor

@dpadhiar dpadhiar commented Sep 6, 2024

Fixes #222

Modifications

Introduces Gauge metrics called numaflow_isbservice_paused_seconds and numaflow_controller_paused_seconds which reports the total time an ISBService or NumaflowController is requesting resources be paused when it is updating. This is in line with the previously added PipelineRollout pause metrics that we emit these when dataLossPrevention flag is set to True.

We add the PauseStatus struct to both the ISBService and NumaflowController Rollouts however we have changed the JSON name to pauseRequestStatus as we don't want users to assume the ISBService or controller themselves are being paused.

Verification

Manually testing the changes by continuously updating NumaflowController and ISBServiceRollouts with dataLossPrevention enabled to examine the pause seconds are accurate.

(no unit testing now as we don't have dataLossPrevention enabled in those cases)

Copy link
Contributor

@xdevxy xdevxy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@juliev0 and I discussed how to add a label for the metrics to distinguish if the pause request is coming from user or numaplane. However we can do as a followup.

@juliev0
Copy link
Collaborator

juliev0 commented Sep 6, 2024

I like your very complete github PR descriptions :)

@@ -36,7 +36,8 @@ type InterStepBufferService struct {

// ISBServiceRolloutStatus defines the observed state of ISBServiceRollout
type ISBServiceRolloutStatus struct {
Status `json:",inline"`
Status `json:",inline"`
PauseStatus `json:"pauseRequestStatus,omitempty"`
Copy link
Collaborator

@juliev0 juliev0 Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it appears that PauseStatus is inline? I think I'd prefer that it not be. Can you change it to
PauseRequestStatus PauseStatus `json:"pauseRequestStatus,omitempty"?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't catch that the PipelineRolloutStatus.PauseStatus is the same - can that be changed as well?

controllerRollout := rollout.(*apiv1.NumaflowControllerRollout)

if paused {
if controllerRollout.Status.PauseRequestStatus.LastPauseBeginTime == metav1.NewTime(initTime) || !controllerRollout.Status.PauseRequestStatus.LastPauseBeginTime.After(controllerRollout.Status.PauseRequestStatus.LastPauseEndTime.Time) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add comments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used same comments as PipelineRollout since they're the same logic.

@dpadhiar dpadhiar marked this pull request as ready for review September 6, 2024 23:37
@dpadhiar dpadhiar merged commit 1ff5bc7 into numaproj:main Sep 6, 2024
6 checks passed
@dpadhiar dpadhiar deleted the pause-metric-cont branch September 6, 2024 23:48
chandankumar4 pushed a commit that referenced this pull request Sep 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants