Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not transition servers into the healthy state until at least one healthy probe is detected. #2415

Merged
merged 1 commit into from
Feb 27, 2024

Conversation

robbieknuth
Copy link
Contributor

@robbieknuth robbieknuth commented Feb 22, 2024

Fixes #2079

This fix is sort of a hack, but does not add any additional book keeping or change the health model while still yielding the desired behavior.

By simply initializing the failure count to the failure threshold, the first health probe will either keep the destination unhealthy, or it will swap to healthy just like before.

Use the destination's previous state to inform the next state if we are below the failure theshold. This will prevent new destinations with slower server startup times from erroneously tranisitioning into healthy temporarily until the unhealthy threshold is eventually reached.

  • Destinations with a state of Unknown will continue to be unknown until either the failure threshold is reached (=> Unhealthy) or a successful probe occurs (=> Healthy). This scenario is the patch.
  • Unhealthy destinations are unaffected because the failure threshold is already reached.
  • Healthy destinations maintain existing behavior and behave the same as bullet point 1.

@MihaZupan MihaZupan added this to the v.Next milestone Feb 27, 2024
Copy link
Member

@MihaZupan MihaZupan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for contributing here.
I think the behavior we want here is to keep the destination health as Unknown in this case until we see enough failures - see #2079 (comment).

We should have the necesarry bookkeeping in place already. We can pass the current destination health to EvaluateHealthState and then change the failure condition like so

-newHealth = currentFailureCount < threshold ? DestinationHealth.Healthy : DestinationHealth.Unhealthy;
+newHealth = currentFailureCount < threshold ? previousHealth : DestinationHealth.Unhealthy;

src/ReverseProxy/Utilities/AtomicCounter.cs Outdated Show resolved Hide resolved
@MihaZupan MihaZupan self-assigned this Feb 27, 2024
…ation's previous active health state as the next health state.

This has the effect of a destination with a health state of Unknown in the unknown state until the threshold is reached, at which time the destination transitions into Unhealthy.
@robbieknuth
Copy link
Contributor Author

By simply using the previous state, the change can be smaller. The changes to AtomicCounter (and those tests) are no longer required.

Copy link
Member

@MihaZupan MihaZupan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@MihaZupan MihaZupan merged commit 60acb2a into microsoft:main Feb 27, 2024
7 checks passed
@MihaZupan MihaZupan modified the milestones: v.Next, YARP 2.2.0-preview1 May 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ConsecutiveFailuresHealthPolicy would wrongly mark a new faulty destination as healthy, causing proxy failure.
2 participants