Router does not shut down until all connections are closed #3124

BrynCooke · 2023-05-22T14:27:36Z

To reproduce:

Start the router.
Open Chrome at the router URL.
Try to shut down the router; it does not exit.
Close Chrome and see that the router immediately shuts down.

Why we need this:

Unclean shutdown is never great.
The local dev experience is partially broken: users who try to terminate the router have to either close their browser or force-kill the router.

We need to:

Fix the shutdown issue. No requests are in flight, so the router should shut down immediately.
Make sure that traffic_shaping->timeout can terminate in-flight requests that have not completed, allowing the router to shut down. New requests should be blocked once shutdown has been initiated.
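The second requirement (block new requests once shutdown starts, then stop waiting for in-flight ones after a timeout) can be sketched with the standard library alone. `ShutdownGate`, `RequestGuard`, `try_begin_request`, `initiate_shutdown` and `wait_drained` are hypothetical names used only for illustration, not part of the router's API; in the router the timeout would presumably come from `traffic_shaping`, and actually aborting the remaining requests is left to the caller.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical gate (not a router API): counts in-flight requests and
/// refuses new ones once shutdown has been initiated.
#[derive(Default)]
struct ShutdownGate {
    shutting_down: AtomicBool,
    in_flight: AtomicUsize,
}

/// Held for the lifetime of one request; decrements the counter on drop.
struct RequestGuard<'a>(&'a ShutdownGate);

impl Drop for RequestGuard<'_> {
    fn drop(&mut self) {
        self.0.in_flight.fetch_sub(1, Ordering::SeqCst);
    }
}

impl ShutdownGate {
    /// Returns `None` once shutdown has started, so the caller can reject the request.
    fn try_begin_request(&self) -> Option<RequestGuard<'_>> {
        // Count first, then check the flag: a concurrent shutdown either
        // sees this request in `in_flight` or this request sees the flag.
        self.in_flight.fetch_add(1, Ordering::SeqCst);
        if self.shutting_down.load(Ordering::SeqCst) {
            self.in_flight.fetch_sub(1, Ordering::SeqCst);
            return None;
        }
        Some(RequestGuard(self))
    }

    fn initiate_shutdown(&self) {
        self.shutting_down.store(true, Ordering::SeqCst);
    }

    /// Polls until no request is in flight. Returns `false` if `timeout`
    /// elapses first; the caller would then abort the remaining requests.
    fn wait_drained(&self, timeout: Duration) -> bool {
        let deadline = Instant::now() + timeout;
        while self.in_flight.load(Ordering::SeqCst) > 0 {
            if Instant::now() >= deadline {
                return false;
            }
            thread::sleep(Duration::from_millis(1));
        }
        true
    }
}

fn main() {
    let gate = ShutdownGate::default();
    let request = gate.try_begin_request().expect("accepted before shutdown");

    gate.initiate_shutdown();
    // New requests are refused once shutdown has begun.
    assert!(gate.try_begin_request().is_none());
    // The in-flight request is still running, so draining times out.
    assert!(!gate.wait_drained(Duration::from_millis(20)));

    drop(request); // the request completes
    assert!(gate.wait_drained(Duration::from_millis(20)));
}
```

Incrementing the counter before checking the flag means the two orderings of a racing request and a shutdown both end up consistent: either the request is counted and waited for, or it is rejected.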


Geal · 2023-05-24T06:59:22Z

I took an initial look. The graceful shutdown of connections is happening, and the listeners are stopped correctly, so the problem must be elsewhere.

Geal · 2023-05-24T07:33:34Z

Revising my statement: there is indeed an issue elsewhere (the connection sender is cloned and held in the wrong place), for which I have a fix nearly ready. But when opening the sandbox, two connections are created, and only one of them stops after getting the graceful shutdown notification.
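The comment does not spell out the mechanism, so the following only illustrates the general failure mode, using `std::sync::mpsc` as a stand-in for whatever channel the router actually uses (an assumption): a receiver only observes that the channel is finished once every sender clone has been dropped, so a single clone held somewhere long-lived keeps the other side waiting. `receiver_outcome` is a hypothetical helper.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical helper: reports what the receiver observes after the
/// "owner" drops its sender, optionally with a stray clone still alive.
fn receiver_outcome(keep_stray_clone: bool) -> RecvTimeoutError {
    let (tx, rx) = mpsc::channel::<()>();
    // A clone stored in a long-lived place: the kind of bug being illustrated.
    let stray = if keep_stray_clone { Some(tx.clone()) } else { None };
    // The owner drops its sender, expecting the receiver to see the end.
    drop(tx);
    let outcome = rx
        .recv_timeout(Duration::from_millis(50))
        .expect_err("nothing is ever sent");
    drop(stray);
    outcome
}

fn main() {
    // No stray clone: the receiver immediately learns that all senders are gone.
    assert_eq!(receiver_outcome(false), RecvTimeoutError::Disconnected);
    // A clone still alive: the receiver just keeps waiting (here, until the timeout).
    assert_eq!(receiver_outcome(true), RecvTimeoutError::Timeout);
}
```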

Geal · 2023-05-25T06:55:11Z

more context here:

sandbox creates one connection for a GET / request (getting the HTML)
sandbox creates one connection for a POST / request (introspection)
when sending the shutdown signal to the router, both connections get hyper's graceful shutdown applied
the first connection is closed, but not the introspection one

when debugging, it appears that the second one gets stuck on the await, so there's probably something wrong with the graceful shutdown:

router/apollo-router/src/axum_factory/listeners.rs, lines 251 to 254 in c679c16:

    let c = connection.as_mut();
    c.graceful_shutdown();
    let _ = connection.await;

This happens with or without compression, and with either a Content-Length or a chunked response. From the point of view of the client, the entire response has been received.

During graceful shutdown on HTTP/1, hyper either waits for the in-flight response to be sent before closing the connection or, if the connection is idle (waiting for a new request), closes it immediately. Somehow the introspection connection is not being considered idle here.
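The two cases hyper distinguishes can be made concrete with a toy state machine. `Conn` and `ConnState` are hypothetical and this is not hyper's implementation; it only encodes the behaviour described in the comment: an idle connection closes as soon as graceful shutdown is requested, while a busy one closes only after its response is finished.

```rust
/// Toy model of HTTP/1 graceful shutdown; not hyper's actual code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ConnState {
    /// Waiting for the next request on a keep-alive connection.
    Idle,
    /// A request has been read and its response is not fully written yet.
    Busy,
    Closed,
}

struct Conn {
    state: ConnState,
    shutdown_requested: bool,
}

impl Conn {
    fn new(state: ConnState) -> Self {
        Conn { state, shutdown_requested: false }
    }

    fn graceful_shutdown(&mut self) {
        self.shutdown_requested = true;
        // An idle connection can be closed right away.
        if self.state == ConnState::Idle {
            self.state = ConnState::Closed;
        }
    }

    /// Called when the response for the current request has been fully written.
    fn response_finished(&mut self) {
        if self.state == ConnState::Busy {
            // Close if shutdown was requested meanwhile, otherwise go back to keep-alive.
            self.state = if self.shutdown_requested { ConnState::Closed } else { ConnState::Idle };
        }
    }
}

fn main() {
    // Like the GET / connection: idle when shutdown arrives, closes immediately.
    let mut html = Conn::new(ConnState::Idle);
    html.graceful_shutdown();
    assert_eq!(html.state, ConnState::Closed);

    // A connection still considered busy only closes once its response completes.
    let mut introspection = Conn::new(ConnState::Busy);
    introspection.graceful_shutdown();
    assert_eq!(introspection.state, ConnState::Busy);
    introspection.response_finished();
    assert_eq!(introspection.state, ConnState::Closed);
}
```

In these terms, the introspection connection appears to be stuck in `Busy`: the client has the whole response, but the server side never reaches the equivalent of `response_finished`, so `connection.await` never returns.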

Until we get further into the investigation, I think we could add a configurable timeout on connection shutdown: short in dev mode, a bit longer in production.
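A sketch of such a timeout, assuming the router's `connection.await` can be treated as "some work that signals when it is done". In the async code itself this would most likely be `tokio::time::timeout(duration, connection)`; the version below models the connection as a thread so it runs with `std` only. `Mode`, `shutdown_timeout`, `close_connection` and both durations are hypothetical, not existing router configuration.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical mode switch; the router's real configuration may differ.
#[derive(Clone, Copy)]
enum Mode {
    Dev,
    Production,
}

/// Hypothetical defaults: short in dev, longer in production.
fn shutdown_timeout(mode: Mode) -> Duration {
    match mode {
        Mode::Dev => Duration::from_millis(500),
        Mode::Production => Duration::from_secs(30),
    }
}

#[derive(Debug, PartialEq, Eq)]
enum ShutdownOutcome {
    Graceful,
    TimedOut,
}

/// Waits for a connection (modelled as a thread that takes `work` to finish)
/// and gives up after the mode's timeout so shutdown can proceed.
fn close_connection(work: Duration, mode: Mode) -> ShutdownOutcome {
    let (done_tx, done_rx) = mpsc::channel();
    thread::spawn(move || {
        thread::sleep(work); // stands in for `connection.await`
        let _ = done_tx.send(());
    });
    match done_rx.recv_timeout(shutdown_timeout(mode)) {
        Ok(()) => ShutdownOutcome::Graceful,
        // Timed out: stop waiting. A std thread cannot be cancelled, whereas
        // dropping the future in async code would also drop the connection.
        Err(_) => ShutdownOutcome::TimedOut,
    }
}

fn main() {
    assert!(shutdown_timeout(Mode::Dev) < shutdown_timeout(Mode::Production));
    // A connection that finishes quickly shuts down gracefully.
    assert_eq!(close_connection(Duration::from_millis(1), Mode::Dev), ShutdownOutcome::Graceful);
    // A connection that never finishes in time is abandoned after the dev timeout.
    assert_eq!(close_connection(Duration::from_secs(10), Mode::Dev), ShutdownOutcome::TimedOut);
}
```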

garypen · 2023-06-23T13:32:22Z

I had a little look at this and couldn't reproduce the hang. I tried with --dev and with --hot-reload. Whenever I hit Ctrl-C the router shuts down.

Am I missing something, or could this have been fixed?

BrynCooke · 2023-06-26T08:43:13Z

Looks like this got fixed at some point in the last couple of releases. Let's close for now.

o0Ignition0o · 2023-07-13T19:44:25Z

Reopening this issue since we have reproduced it again.

lleadbet · 2023-07-13T19:46:39Z

Flagging that a customer I'm working with is currently running into this:

OS: macOS 13.4/Docker
Router: 1.24.0

Config:

supergraph:
  listen: 0.0.0.0:4000
  introspection: true
include_subgraph_errors:
  all: true 
cors:
  allow_any_origin: true
  origins:
    - https://studio.apollographql.com
sandbox:
  enabled: true
homepage:
  enabled: false

The config is very basic.

They were getting the following logs when they had the playground window open:

2023-07-13T19:29:12.737623Z  INFO shutting down
2023-07-13T19:29:12.737810Z  INFO all connections shut down
2023-07-13T19:29:12.747471Z  DEBUG deno runtime shutdown successfully
2023-07-13T19:29:12.750301Z  DEBUG terminating apollo exporter
2023-07-13T19:29:12.751759Z  DEBUG state machine event: Shutdown, transitioned from: Running to: Stopped
2023-07-13T19:29:12.751773Z  INFO stopped

Refreshing the Chrome window with the playground let the router shut down.

smyrick · 2023-09-29T16:59:05Z

This may or may not be related, but I have another customer reporting that starting a Router with a subgraph and then repeatedly restarting the subgraph also causes Router resource usage to go up. I have not yet been able to create a reproducible example.

This is on the other side of the connection, but I wonder if it is related.

Fix #3124 Fix #3941
Co-authored-by: Bryn Cooke
Co-authored-by: Gary Pennington

BrynCooke closed this as not planned Jun 26, 2023

o0Ignition0o reopened this Jul 13, 2023

lleadbet mentioned this issue Jul 19, 2023

Loglevel Trace causes Router to crash #3474

Closed

o0Ignition0o self-assigned this Sep 29, 2023

Geal mentioned this issue Oct 4, 2023

test graceful shutdown with idle connections #3969

Merged


garypen closed this as completed in #3969 Oct 10, 2023
