-
Notifications
You must be signed in to change notification settings - Fork 272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test graceful shutdown with idle connections #3969
Conversation
This comment has been minimized.
This comment has been minimized.
CI performance tests
|
if a connection was opened to the router but never received the intial request, ti would not be considered by hyper's graceful shutdown feature, and so the router would wayt indefinitely for the connection to close. This tends to happen when a brower opens the explorer or prometheus page provided by the router, because browsers eagerly open new connections, in anticipation of future traffic, while those pages only need a single request to the router.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved with changelog suggestion
Co-authored-by: Bryn Cooke <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have the same problem with UDS. I just manually tested and confirmed that if I have a connection open to the router which never sends any data, then shutdown will hang.
We need to add the same checker for UDS.
Implement the same logic as for TLS and TCP. I manually confirmed (with: `socat - UNIX-CONNECT:/tmp/router.sock`) that the same problem existed without this fix for unix domain sockets. The problem goes away with the fix.
I merged in additional code to address Unix Domain Sockets. |
For instance in CI on ARM or MacOS. Up the timeout to 1 second.
[![Mend Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [apollographql/router](https://togithub.com/apollographql/router) | minor | `v1.32.0` -> `v1.33.0` | --- ### Release Notes <details> <summary>apollographql/router (apollographql/router)</summary> ### [`v1.33.0`](https://togithub.com/apollographql/router/releases/tag/v1.33.0) [Compare Source](https://togithub.com/apollographql/router/compare/v1.32.0-alpha.0...v1.33.0) #### 🚀 Features ##### Add `hasNext` to SupergraphRequest ([Issue #​4016](https://togithub.com/apollographql/router/issues/4016)) Coprocessors multi-part response support has been enhanced to include `hasNext`, allowing you to determine when a request has completed. When `stage` is `SupergraphResponse`, `hasNext` if present and `true` indicates that there will be subsequent `SupergraphResponse` calls to the co-processor for each multi-part (`@defer`/subscriptions) response. See the [coprocessor documentation](https://www.apollographql.com/docs/router/customizations/coprocessor/) for more details. By [@​BrynCooke](https://togithub.com/BrynCooke) in [https://github.com/apollographql/router/pull/4017](https://togithub.com/apollographql/router/pull/4017) ##### Expose the ability to set topology spread constraints on the helm chart ([3891](https://togithub.com/apollographql/router/issues/3891)) Give developers the ability to set topology spread constraints that can be used to guarantee that federation pods are spread out evenly across AZs. By [@​bjoernw](https://togithub.com/bjoernw) in [https://github.com/apollographql/router/pull/3892](https://togithub.com/apollographql/router/pull/3892) #### 🐛 Fixes ##### Ignore JWKS keys which aren't supported by the router ([Issue #​3853](https://togithub.com/apollographql/router/issues/3853)) If you have a JWKS which contains a key which has an algorithm (alg) which the router doesn't recognise, then the entire JWKS is disregarded even if there were other keys in the JWKS which the router could use. We have changed the JWKS processing logic so that we remove entries with an unrecognised algorithm from the list of available keys. We print a warning with the name of the algorithm for each removed entry. By [@​garypen](https://togithub.com/garypen) in [https://github.com/apollographql/router/pull/3922](https://togithub.com/apollographql/router/pull/3922) ##### Fix panic when streaming responses to co-processor ([Issue #​4013](https://togithub.com/apollographql/router/issues/4013)) Streamed responses will no longer cause a panic in the co-processor plugin. This affected defer and stream queries. By [@​BrynCooke](https://togithub.com/BrynCooke) in [https://github.com/apollographql/router/pull/4014](https://togithub.com/apollographql/router/pull/4014) ##### Only reject defer/subscriptions if actually part of a batch ([Issue #​3956](https://togithub.com/apollographql/router/issues/3956)) Fix the checking logic so that deferred queries or subscriptions will only be rejected when experimental batching is enabled and the operations are part of a batch. Without this fix, all subscriptions or deferred queries would be rejected when experimental batching support was enabled. By [@​garypen](https://togithub.com/garypen) in [https://github.com/apollographql/router/pull/3959](https://togithub.com/apollographql/router/pull/3959) ##### Fix requires selection in arrays ([Issue #​3972](https://togithub.com/apollographql/router/issues/3972)) When a field has a `@requires` annotation that selects an array, and some fields are missing in that array or some of the elements are null, the router would short circuit the selection and remove the entire array. This relaxes the condition to allow nulls in the selected array By [@​Geal](https://togithub.com/Geal) in [https://github.com/apollographql/router/pull/3975](https://togithub.com/apollographql/router/pull/3975) ##### Fix router hang when opening the explorer, prometheus or health check page ([Issue #​3941](https://togithub.com/apollographql/router/issues/3941)) The Router did not gracefully shutdown when an idle connections are made by a client, and would instead hang. In particular, web browsers make such connection in anticipation of future traffic. This is now fixed, and the Router will now gracefully shut down in a timely fashion. By [@​Geal](https://togithub.com/Geal) in [https://github.com/apollographql/router/pull/3969](https://togithub.com/apollographql/router/pull/3969) ##### Fix hang and high CPU usage when compressing small responses ([PR #​3961](https://togithub.com/apollographql/router/pull/3961)) When returning small responses (less than 10 bytes) and compressing them using gzip, the router could go into an infinite loop By [@​Geal](https://togithub.com/Geal) in [https://github.com/apollographql/router/pull/3961](https://togithub.com/apollographql/router/pull/3961) #### 📃 Configuration ##### Add `enabled` field for telemetry exporters ([PR #​3952](https://togithub.com/apollographql/router/pull/3952)) Telemetry configuration now supports `enabled` on all exporters. This allows exporters to be disabled without removing them from the configuration and in addition allows for a more streamlined default configuration. ```diff telemetry: tracing: datadog: + enabled: true jaeger: + enabled: true otlp: + enabled: true zipkin: + enabled: true ``` Existing configurations will be migrated to the new format automatically on startup. However, you should update your configuration to use the new format as soon as possible. By [@​BrynCooke](https://togithub.com/BrynCooke) in [https://github.com/apollographql/router/pull/3952](https://togithub.com/apollographql/router/pull/3952) #### 🛠 Maintenance ##### Create a replacement self-signed server certificate: 10 years lifespan ([Issue #​3998](https://togithub.com/apollographql/router/issues/3998)) This certificate is only used for testing, so 10 years lifespan is acceptable. By [@​garypen](https://togithub.com/garypen) in [https://github.com/apollographql/router/pull/4009](https://togithub.com/apollographql/router/pull/4009) #### 📚 Documentation ##### Updated documentation for deploying router ([PR #​3943](https://togithub.com/apollographql/router/pull/3943)) Updated documentation for containerized router deployments, with guides and examples for [deploying on Kubernetes](https://www.apollographql.com/docs/router/containerization/kubernetes) and [running on Docker](https://www.apollographql.com/docs/router/containerization/docker). By [@​shorgi](https://togithub.com/shorgi) in [https://github.com/apollographql/router/pull/3943](https://togithub.com/apollographql/router/pull/3943) ##### Document guidance for request and response buffering ([Issue #​3838](https://togithub.com/apollographql/router/issues/3838)) Provides specific guidance on request and response buffering within the router. By [@​garypen](https://togithub.com/garypen) in [https://github.com/apollographql/router/pull/3970](https://togithub.com/apollographql/router/pull/3970) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Enabled. ♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View repository job log [here](https://developer.mend.io/github/apollographql/rover). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4xOS4yIiwidXBkYXRlZEluVmVyIjoiMzcuMTkuMiIsInRhcmdldEJyYW5jaCI6Im1haW4ifQ==--> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Fix #3124
Fix #3941
Checklist
Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.
Exceptions
Note any exceptions here
Notes
Footnotes
It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. ↩
Configuration is an important part of many changes. Where applicable please try to document configuration examples. ↩
Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions. ↩