Do not transition servers into the healthy state until at least one healthy probe is detected. #2415

robbieknuth · 2024-02-22T16:51:49Z

Fixes #2079

~~This fix is sort of a hack, but does not add any additional book keeping or change the health model while still yielding the desired behavior.~~

~~By simply initializing the failure count to the failure threshold, the first health probe will either keep the destination unhealthy, or it will swap to healthy just like before.~~

Use the destination's previous state to inform the next state if we are below the failure threshold. This prevents new destinations with slower server startup times from erroneously transitioning into Healthy for a short time before the unhealthy threshold is eventually reached.

1. Destinations with a state of Unknown remain Unknown until either the failure threshold is reached (=> Unhealthy) or a successful probe occurs (=> Healthy). This is the scenario the patch changes.
2. Unhealthy destinations are unaffected because the failure threshold has already been reached.
3. Healthy destinations keep their existing behavior: as in point 1, they stay in their current state until the failure threshold is reached.
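
The three cases above can be replayed with a small sketch. This is not the code from the PR: the `DestinationHealth` enum is a local stand-in, and `HealthSketch`, `Next`, `Replay`, and the integer `threshold` are hypothetical names chosen for illustration.

```csharp
// Local stand-in for the proxy's health enum, defined here so the sketch compiles on its own.
public enum DestinationHealth { Unknown, Healthy, Unhealthy }

public static class HealthSketch
{
    // Hypothetical helper: applies one probe result and returns the next state.
    // failureCount is the running count of consecutive failed probes.
    public static DestinationHealth Next(
        DestinationHealth previous, bool probeSucceeded, ref int failureCount, int threshold)
    {
        if (probeSucceeded)
        {
            // A successful probe resets the count and always yields Healthy.
            failureCount = 0;
            return DestinationHealth.Healthy;
        }

        // A failed probe keeps the previous state until the threshold is reached.
        failureCount++;
        return failureCount < threshold ? previous : DestinationHealth.Unhealthy;
    }

    // Replays a sequence of probe results for a new destination, which starts as Unknown.
    public static DestinationHealth[] Replay(bool[] probes, int threshold)
    {
        var state = DestinationHealth.Unknown;
        var failures = 0;
        var history = new DestinationHealth[probes.Length];
        for (var i = 0; i < probes.Length; i++)
        {
            state = Next(state, probes[i], ref failures, threshold);
            history[i] = state;
        }
        return history;
    }
}
```

With a threshold of 3, a slow-starting destination whose first three probes fail goes Unknown, Unknown, Unhealthy under this sketch and never passes through Healthy. A destination that fails twice and then succeeds goes Unknown, Unknown, Healthy.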

MihaZupan

Thank you for contributing here.
I think the behavior we want here is to keep the destination health as Unknown in this case until we see enough failures - see #2079 (comment).

We should have the necessary bookkeeping in place already. We can pass the current destination health to EvaluateHealthState and then change the failure condition like so:

-newHealth = currentFailureCount < threshold ? DestinationHealth.Healthy : DestinationHealth.Unhealthy;
+newHealth = currentFailureCount < threshold ? previousHealth : DestinationHealth.Unhealthy;
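
To make the effect of this one-line change concrete, the two versions of the failure branch can be put side by side. This is a minimal sketch, not the actual `EvaluateHealthState` method: the enum is a local stand-in, `FailureBranchComparison`, `Before`, and `After` are hypothetical names, and the integer parameter types are assumptions.

```csharp
// Local stand-in for the proxy's health enum, defined here so the sketch compiles on its own.
public enum DestinationHealth { Unknown, Healthy, Unhealthy }

public static class FailureBranchComparison
{
    // Failure branch before the change: any count below the threshold reports Healthy.
    public static DestinationHealth Before(int currentFailureCount, int threshold) =>
        currentFailureCount < threshold
            ? DestinationHealth.Healthy
            : DestinationHealth.Unhealthy;

    // Failure branch after the change: below the threshold, keep the state
    // the destination already had.
    public static DestinationHealth After(
        int currentFailureCount, int threshold, DestinationHealth previousHealth) =>
        currentFailureCount < threshold
            ? previousHealth
            : DestinationHealth.Unhealthy;
}
```

For a new destination whose first probe fails with a threshold of 2, `Before(1, 2)` returns Healthy, while `After(1, 2, DestinationHealth.Unknown)` returns Unknown. Both versions return Unhealthy once the count reaches the threshold.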

src/ReverseProxy/Utilities/AtomicCounter.cs

…ation's previous active health state as the next health state. This keeps a destination with a health state of Unknown in the Unknown state until the threshold is reached, at which point the destination transitions to Unhealthy.

robbieknuth · 2024-02-27T16:37:15Z

By simply using the previous state, the change can be smaller. The changes to AtomicCounter (and those tests) are no longer required.

MihaZupan

Thanks

robbieknuth requested review from Tratcher and MihaZupan as code owners February 22, 2024 16:51

MihaZupan added this to the v.Next milestone Feb 27, 2024

MihaZupan reviewed Feb 27, 2024


src/ReverseProxy/Utilities/AtomicCounter.cs (outdated)

MihaZupan self-assigned this Feb 27, 2024

robbieknuth force-pushed the fix-2079 branch from 2bb1288 to 2d95f7c Compare February 27, 2024 16:35

MihaZupan approved these changes Feb 27, 2024


MihaZupan merged commit 60acb2a into microsoft:main Feb 27, 2024
7 checks passed

MihaZupan modified the milestones: v.Next, YARP 2.2.0-preview1 May 14, 2024
