Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace custom retry logic with a single retry function #2827

Closed
phillebaba opened this issue Aug 2, 2024 · 2 comments
Closed

Replace custom retry logic with a single retry function #2827

phillebaba opened this issue Aug 2, 2024 · 2 comments
Assignees
Milestone

Comments

@phillebaba
Copy link
Member

Describe what should be investigated or refactored

We have a couple of ways to implement retry logic in Zarf. The most common are use of timers or the retry function form the helpers package. The former respects contexts while the latter does not. It would be better if we could decide on a single retry package to use in this function to simplify things.

A suggestion is to switch to using retry-go. This project has been around for a while and is pretty stable.
https://github.com/avast/retry-go

Links to any relevant code

func waitForHealthyCluster(ctx context.Context, client kubernetes.Interface) error {
const waitDuration = 1 * time.Second
timer := time.NewTimer(0)
defer timer.Stop()
for {
select {
case <-ctx.Done():
return fmt.Errorf("error waiting for cluster to report healthy: %w", ctx.Err())
case <-timer.C:
// Make sure there is at least one running Node
nodeList, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
if err != nil || len(nodeList.Items) < 1 {
message.Debugf("No nodes reporting healthy yet: %v\n", err)
timer.Reset(waitDuration)
continue
}
// Get the cluster pod list
pods, err := client.CoreV1().Pods(corev1.NamespaceAll).List(ctx, metav1.ListOptions{})
if err != nil {
message.Debugf("Could not get the pod list: %v", err)
timer.Reset(waitDuration)
continue
}
// Check that at least one pod is in the 'succeeded' or 'running' state
for _, pod := range pods.Items {
if pod.Status.Phase == corev1.PodSucceeded || pod.Status.Phase == corev1.PodRunning {
return nil
}
}
message.Debug("No pods reported 'succeeded' or 'running' state yet.")
timer.Reset(waitDuration)
}
}
}

if err := helpers.Retry(func() error {

timer := time.NewTimer(0)
defer timer.Stop()
for {
select {
case <-ctx.Done():
return ctx.Err()
case <-timer.C:
_, err := c.Clientset.CoreV1().Namespaces().Get(ctx, ZarfNamespaceName, metav1.GetOptions{})
if kerrors.IsNotFound(err) {
return nil
}
if err != nil {
return err
}
timer.Reset(1 * time.Second)
}
}

saCtx, cancel := context.WithTimeout(ctx, 2*time.Minute)
defer cancel()
err = func(ctx context.Context, ns, name string) error {
timer := time.NewTimer(0)
defer timer.Stop()
for {
select {
case <-ctx.Done():
return fmt.Errorf("failed to get service account %s/%s: %w", ns, name, ctx.Err())
case <-timer.C:
_, err := c.Clientset.CoreV1().ServiceAccounts(ns).Get(ctx, name, metav1.GetOptions{})
if err != nil && !kerrors.IsNotFound(err) {
return err
}
if kerrors.IsNotFound(err) {
message.Debug("Service account %s/%s not found, retrying...", ns, name)
timer.Reset(1 * time.Second)
continue
}
return nil
}
}
}(saCtx, ZarfNamespaceName, "default")

Additional context

N/A

@AustinAbro321
Copy link
Contributor

Small note that there is a helpers.RetryWithContext function we use, though from a glance at the retry-go package it seems like a reasonable substitute for both.

if err := helpers.RetryWithContext(ctx, sc, 2, 5*time.Second, message.Warnf); err != nil {

@phillebaba
Copy link
Member Author

Was not aware of that function, that would also be an option. We just need to look at the pros and cons of bringing in a dependency to do this.

@salaxander salaxander added this to Zarf Aug 8, 2024
@github-project-automation github-project-automation bot moved this to Backlog in Zarf Aug 8, 2024
@salaxander salaxander added this to the v0.38.0 milestone Aug 8, 2024
@salaxander salaxander modified the milestones: v0.38.0, v0.39.0 Aug 15, 2024
@salaxander salaxander moved this from Backlog to In progress in Zarf Aug 15, 2024
@salaxander salaxander moved this from In progress to Done in Zarf Aug 15, 2024
@salaxander salaxander closed this as completed by moving to Done in Zarf Aug 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

No branches or pull requests

4 participants