-
Notifications
You must be signed in to change notification settings - Fork 363
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
e2e: Add backend health check e2e case via active http #3677
e2e: Add backend health check e2e case via active http #3677
Conversation
Signed-off-by: Dingkang Li <[email protected]>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3677 +/- ##
==========================================
- Coverage 69.02% 69.01% -0.01%
==========================================
Files 176 176
Lines 21688 21688
==========================================
- Hits 14971 14969 -2
- Misses 5637 5638 +1
- Partials 1080 1081 +1 ☔ View full report in Codecov by Sentry. |
thanks for picking this one up @aoledk !
|
We can use
Footnotes |
|
} | ||
t.Logf("cluster pass health check: membership_healthy stats query count: %v", v) | ||
|
||
if v == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be non zero ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you prob need to wait until non zero is reached
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've turned to use non-zero cluster health check stats to verify whether backend health check works, for both pass and fail cases. And undo changes in e2e/utils/prometheus.sum
to restore the behavior of only return nil
when non-zero value is queried.
# TYPE envoy_cluster_health_check_attempt counter
envoy_cluster_health_check_attempt{envoy_cluster_name="httproute/gateway-conformance-infra/http-with-health-check-active-http-fail/rule/0"} 2
envoy_cluster_health_check_attempt{envoy_cluster_name="httproute/gateway-conformance-infra/http-with-health-check-active-http-pass/rule/0"} 2
# TYPE envoy_cluster_health_check_success counter
envoy_cluster_health_check_success{envoy_cluster_name="httproute/gateway-conformance-infra/http-with-health-check-active-http-fail/rule/0"} 0
envoy_cluster_health_check_success{envoy_cluster_name="httproute/gateway-conformance-infra/http-with-health-check-active-http-pass/rule/0"} 2
# TYPE envoy_cluster_health_check_failure counter
envoy_cluster_health_check_failure{envoy_cluster_name="httproute/gateway-conformance-infra/http-with-health-check-active-http-fail/rule/0"} 2
envoy_cluster_health_check_failure{envoy_cluster_name="httproute/gateway-conformance-infra/http-with-health-check-active-http-pass/rule/0"} 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks !
/retest |
What this PR does / why we need it:
Check whether backend health check via active http works in e2e, inspired by https://github.com/projectcontour/contour/blob/main/test/e2e/httpproxy/http_health_checks_test.go.
EG use the default
50%
from Envoy as cluster panic threshold. In case of health check fail, all endpoints in cluster become unhealthy and panic mode is activated, request is distributed to all unhealthyecho-basic
endpoints and get response fromecho-basic
, with same behavior as case of health check pass.To make those two cases distinguishable, in case of health check fail, Envoy should not distribute request to unhealthy endpoint, respond with 503
no healthy upstream
directly. This requires cluster panic mode is disabled.Since EG doesn't have knob to configure cluster panic threshold currently, I use EnvoyPatchPolicy to disable cluster panic mode in this PR.
Issues about panic threshold in contour:
projectcontour/contour#579.
projectcontour/contour#1846
Which issue(s) this PR fixes:
xref #2417