Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: don't wait for pipeline to finish updating because we can't distinguish failure from progressing #313

Merged
merged 6 commits into from
Oct 2, 2024

Conversation

juliev0
Copy link
Collaborator

@juliev0 juliev0 commented Oct 2, 2024

Modifications

This makes it so that we no longer require to be paused simply because pipeline is still reconciling.

The original process of updating was:

  1. pause pipeline
  2. wait for it to be paused
  3. apply new pipeline spec with desiredPhase=Paused
  4. wait for pipeline change to be reconciled based on pipeline.ObservedGeneration and pipeline Conditions, which indicate whether daemon, vertices, side inputs are fully reconciled
  5. set desiredPhase=Running

I have to remove step 4 because it's impossible to know based on Pipeline Conditions whether Pipeline is in fact still "progressing" or failed. This becomes clear when we actually deploy a bad pipeline.

Why I think this is okay to remove: After the Pipeline has been paused, it goes down to 0 vertices. After we apply the spec, Numaflow Controller updates the Vertex specs. Then we set it to desiredPhase=Running, and it scales those Vertices back up. My original motivation to wait for it to fully reconcile was out of paranoia that we would somehow be "Running" an older Vertex but now this seems unnecessary.

Verification

Ran e2e test about 6 times with success every time

…gives a false positive failure

Signed-off-by: Julie Vogelman <[email protected]>
Signed-off-by: Julie Vogelman <[email protected]>
Signed-off-by: Julie Vogelman <[email protected]>
Signed-off-by: Julie Vogelman <[email protected]>
Signed-off-by: Julie Vogelman <[email protected]>
@@ -110,15 +110,16 @@ func verifyPipelinePaused(namespace string, pipelineRolloutName string, pipeline
}, testTimeout).Should(Equal(metav1.ConditionTrue))

document("Verify that Pipeline is paused and fully drained")
verifyPipelineStatusEventually(Namespace, pipelineName,
verifyPipelineStatusEventually(namespace, pipelineName,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we weren't using the parameter passed in to the function

@@ -251,7 +251,7 @@ var _ = Describe("Functional e2e", Serial, func() {

verifyPipelineRolloutDeployed(pipelineRolloutName)
verifyPipelineRolloutHealthy(pipelineRolloutName)
verifyInProgressStrategy(Namespace, pipelineRolloutName, apiv1.UpgradeStrategyNoOp)
verifyInProgressStrategy(pipelineRolloutName, apiv1.UpgradeStrategyNoOp)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove unused parameter from this function

}
return false, metav1.Condition{}
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no longer used

@@ -669,15 +669,10 @@ func createPipelineRollout(isbsvcSpec numaflowv1.PipelineSpec) *apiv1.PipelineRo
}
}

func createPipelineOfSpec(spec numaflowv1.PipelineSpec, phase numaflowv1.PipelinePhase, fullyReconciled bool) *numaflowv1.Pipeline {
func createPipelineOfSpec(spec numaflowv1.PipelineSpec, phase numaflowv1.PipelinePhase) *numaflowv1.Pipeline {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed in the test anymore


return unhealthyOrProgressing, nil

}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no longer used

func(retrievedPipelineSpec numaflowv1.PipelineSpec, retrievedPipelineStatus numaflowv1.PipelineStatus) bool {
return retrievedPipelineStatus.Phase == numaflowv1.PipelinePhasePaused && retrievedPipelineStatus.DrainedOnPause

})
verifyPodsRunning(namespace, 0, getVertexLabelSelector(pipelineName))
// this happens too fast to verify it:
//verifyPodsRunning(namespace, 0, getVertexLabelSelector(pipelineName))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can happen too fast now to verify 0 Pods now but note that Numaflow has in fact indicated phase=Paused and drainedOnPause=true in the previous step, which it would only do if there were 0 Pods

Signed-off-by: Julie Vogelman <[email protected]>
@juliev0 juliev0 marked this pull request as ready for review October 2, 2024 04:06
@xdevxy xdevxy merged commit bd935fd into main Oct 2, 2024
8 checks passed
@xdevxy xdevxy deleted the remove-is-pipeline-updating branch October 2, 2024 20:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants