Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Envoy's MaxEjectionPercent and BaseEjectionTime config entries for passive health checks #15979

Merged
merged 15 commits into from
Apr 26, 2023

Conversation

analogue
Copy link
Collaborator

@analogue analogue commented Jan 13, 2023

Description

WAN federated clusters are not able to failover all traffic from one datacenter to another via passive healthchecks when the control plane is down without being able to set MaxEjectionPercent to 100%. BaseEjectionTime just provides additional configurability on the backoff interval until the max ejection interval is hit.

Testing & Reproduction steps

  • Existing passive health check unit tests updated
  • Existing passive health check integration tests updated to verify configuration
  • Open to adding a functional integration test if it would be useful, however I'm planning to add support for these config values via K8s CRDs in the consul-k8s repo. e2e tests on K8s might be more useful.

PR Checklist

  • updated test coverage
  • external facing docs updated
  • not a security concern

@github-actions github-actions bot added theme/api Relating to the HTTP API interface theme/envoy/xds Related to Envoy support type/docs Documentation needs to be created/updated/clarified labels Jan 13, 2023
@david-yu
Copy link
Contributor

@analogue I assume a consul-k8s crd update is also in order?

@analogue analogue marked this pull request as ready for review January 13, 2023 22:14
@analogue analogue requested a review from a team as a code owner January 13, 2023 22:14
@analogue
Copy link
Collaborator Author

@analogue I assume a consul-k8s crd update is also in order?

Yep, that is coming up next 😄.

Copy link
Contributor

@boruszak boruszak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed 15979.txt, service-defaults.mdx, and envoy.mdx for edits to the documentation.

Overall LGTM. Confirm that the descriptions match phrasing before merging. Let me know if you have any questions or need any additional assistance.

Approving on behalf of consul-docs.

Comment on lines +554 to +558
name: 'MaxEjectionPercent',
type: 'int: 10',
description: `Measured in percent (%), the maximum percentage of hosts that can be ejected
from a upstream cluster due to passive health check failures. If not specified, inherits
Envoy's default of 10% or at least one host.`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
name: 'MaxEjectionPercent',
type: 'int: 10',
description: `Measured in percent (%), the maximum percentage of hosts that can be ejected
from a upstream cluster due to passive health check failures. If not specified, inherits
Envoy's default of 10% or at least one host.`,
name: 'MaxEjectionPercent',
type: 'int: 10',
description: `Measured in percent (%), the maximum percentage of hosts that can be ejected
from an upstream cluster due to passive health check failures. If not specified, inherits
Envoy's default value of 10. Ejects at least one host when set to a non-zero value.`,

Two suggestions:

  1. Don't use the % in the second sentence. The type is an integer and you already say "measured in percent" - so the default is 10 nor 10%.
  2. I understand the "At least one host" statement, but the phrasing is a little muddled. The idea is that if the Max Ejection percent is set to 10 but there's only three hosts in the cluster, the percent is less than a single host makes up in the deployment. So at least one host must be ejected when the value is not 0. Feel free to adjust or reject if you disagree with my phrasing or understanding.

Comment on lines +561 to +565
name: 'BaseEjectionTime',
type: 'duration: 30s',
description: `The base time that a host is ejected for. The real time is equal to the base
time multiplied by the number of times the host has been ejected and is capped by
max_ejection_time (Default 300s). If not speficied, inherits Envoy's default value of 30s.`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
name: 'BaseEjectionTime',
type: 'duration: 30s',
description: `The base time that a host is ejected for. The real time is equal to the base
time multiplied by the number of times the host has been ejected and is capped by
max_ejection_time (Default 300s). If not speficied, inherits Envoy's default value of 30s.`,
name: 'BaseEjectionTime',
type: 'duration: 30s',
description: `The base time that a host is ejected for. The real time is equal to the base
time multiplied by the number of times the host has been ejected and is capped by
`max_ejection_time` (default 300s) in the [Envoy proxy configuration](/consul/docs/connect/proxies/envoy#ejection). If not specified, inherits Envoy's default value of 30s.`,

Added a link directly to the values.

I curious what you mean by "real time" - is that word set in stone on your end? If not, I'd be more descriptive - e.g., "The amount of time that elapses before an ejected host can rejoin the cluster is equal to the base time multiplied by..."

Comment on lines +719 to +731
name: 'MaxEjectionPercent',
type: 'int: 10',
description: `Measured in percent (%), the maximum percentage of hosts that can be ejected
from a upstream cluster due to passive health check failures. If not specified, inherits
Envoy's default of 10% or at least one host.`,
},
{
name: 'BaseEjectionTime',
type: 'duration: 30s',
description: `The base time that a host is ejected for. The real time is equal to the base
time multiplied by the number of times the host has been ejected and is capped by
max_ejection_time (Default 300s). If not speficied, inherits Envoy's default value of 30s.`,
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
name: 'MaxEjectionPercent',
type: 'int: 10',
description: `Measured in percent (%), the maximum percentage of hosts that can be ejected
from a upstream cluster due to passive health check failures. If not specified, inherits
Envoy's default of 10% or at least one host.`,
},
{
name: 'BaseEjectionTime',
type: 'duration: 30s',
description: `The base time that a host is ejected for. The real time is equal to the base
time multiplied by the number of times the host has been ejected and is capped by
max_ejection_time (Default 300s). If not speficied, inherits Envoy's default value of 30s.`,
},
name: 'MaxEjectionPercent',
type: 'int: 10',
description: `Measured in percent (%), the maximum percentage of hosts that can be ejected
from an upstream cluster due to passive health check failures. If not specified, inherits
Envoy's default value of 10. Ejects at least one host when set to a non-zero value.`,
},
{
name: 'BaseEjectionTime',
type: 'duration: 30s',
description: The base time that a host is ejected for. The real time is equal to the base
time multiplied by the number of times the host has been ejected and is capped by
`max_ejection_time` (default 300s) in the [Envoy proxy configuration](/consul/docs/connect/proxies/envoy#ejection). If not specified, inherits Envoy's default value of 30s.`,
},

Suggestion to match previous edits

Comment on lines +411 to +417
- `max_ejection_percent` - The maximum percentage of hosts that can be ejected
from a upstream cluster due to passive health check failures. If not specified,
inherits Envoy's default of 10% or at least one host.
- `base_ejection_time` - The base time that a host is ejected for. The real
time is equal to the base time multiplied by the number of times the host has
been ejected and is capped by max_ejection_time (Default 300s). If not
speficied, inherits Envoy's default value of 30s.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `max_ejection_percent` - The maximum percentage of hosts that can be ejected
from a upstream cluster due to passive health check failures. If not specified,
inherits Envoy's default of 10% or at least one host.
- `base_ejection_time` - The base time that a host is ejected for. The real
time is equal to the base time multiplied by the number of times the host has
been ejected and is capped by max_ejection_time (Default 300s). If not
speficied, inherits Envoy's default value of 30s.
<a name="ejection"/>
- `max_ejection_percent` - The maximum percentage of hosts that can be ejected
from an upstream cluster when passive health checks fail. If not specified,
inherits Envoy's default value of 10. Ejects at least one host when set to a non-zero value.
- `base_ejection_time` - The base time that a host is ejected for. The amount of time that elapses before an ejected host can rejoin the cluster is equal to the base time multiplied by the number of times the host has
been ejected, and is capped by `max_ejection_time` (default 300s). If not
specified, inherits Envoy's default value of 30s.

@hashi-derek hashi-derek self-requested a review January 17, 2023 22:20
@github-actions
Copy link

This pull request has been automatically flagged for inactivity because it has not been acted upon in the last 60 days. It will be closed if no new activity occurs in the next 30 days. Please feel free to re-open to resurrect the change if you feel this has happened by mistake. Thank you for your contributions.

@github-actions github-actions bot added the meta/stale Automatically flagged for inactivity by stalebot label Mar 27, 2023
@david-yu david-yu added the backport/1.15 This release series is no longer active on CE. Use backport/ent/1.15. label Apr 26, 2023
@david-yu
Copy link
Contributor

Added backport/1.15 since 1.15 is now out.

@malizz malizz merged commit 5eaeb7b into main Apr 26, 2023
@malizz malizz deleted the spatel/NET-1646-add-max-ejection-percent branch April 26, 2023 22:59
@david-yu
Copy link
Contributor

Looks like this did not backport at all for 1.14 and 1.15.

@malizz malizz mentioned this pull request May 1, 2023
4 tasks
@malizz malizz mentioned this pull request May 9, 2023
3 tasks
@komapa
Copy link
Contributor

komapa commented May 25, 2023

Thank you for this feature but can we also document it on https://developer.hashicorp.com/consul/docs/v1.14.x/connect/proxies/envoy#passive_health_check?

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.15 This release series is no longer active on CE. Use backport/ent/1.15. meta/stale Automatically flagged for inactivity by stalebot theme/api Relating to the HTTP API interface theme/envoy/xds Related to Envoy support type/docs Documentation needs to be created/updated/clarified
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants