Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Router does not shut down until all connections are closed #3124

Closed
2 tasks
BrynCooke opened this issue May 22, 2023 · 8 comments · Fixed by #3969
Closed
2 tasks

Router does not shut down until all connections are closed #3124

BrynCooke opened this issue May 22, 2023 · 8 comments · Fixed by #3969
Assignees

Comments

@BrynCooke
Copy link
Contributor

BrynCooke commented May 22, 2023

To reproduce:

  1. Start the router.
  2. Open Chrome at the router URL.
  3. Try to shut down the router
  4. Close Chrome and see that the router immediately shuts down.

Why we need this:

  • Unclean shutdown is never great.
  • The local dev experience will partially broken. Users that are trying to terminate the router will either have to close their browser or force kill the router.

We need to:

  • Fix the shutdown issue. No requests are in flight, so the router should shut down immediately.
  • Make sure that traffic_shaping->timeout is able to terminate in flight requests if they have not completed and allow the router to shut down. New requests should be blocked once shutdown has been initiated.
@Geal
Copy link
Contributor

Geal commented May 24, 2023

I took an initial look. The connection graceful shutdown is happening, and the listeners are stopped correctly, so it is happening elsewhere

@Geal
Copy link
Contributor

Geal commented May 24, 2023

revising my statement: there is indeed an issue elsewhere (the connection sender being cloned and held in the wrong place), for whichI have a fix nearly ready. But when opening the sandbox, two connections are created, and only one of them stops after getting the graceful shutdown notification

@Geal
Copy link
Contributor

Geal commented May 25, 2023

more context here:

  • sandbox creates one connection for a GET / request (getting the HTML)
  • sandbox creates one connection for a POST / request (introspection)
  • when sending the shutdown signal to the router, both connections get hyper's graceful shutdown applied
  • the first connection is closed, but not the introspection one
  • when debugging, it appears that the second one gets stuck on the await, so there's probably something wrong with the graceful shutdown:
    let c = connection.as_mut();
    c.graceful_shutdown();
    let _= connection.await;

This happens with or without compression, with a content-length or chunking response. From the point of view of the client, the entire response has been received.

In graceful shutdown on HTTP 1, hyper will either wait for the response to be sent to close the connection, or if the connection is idle (waiting for a new request), it will close it immediately. Somehow here the introspection connection must not be considered idle.

Before we can get further into the investigation, I think we could add a configurable timeout on the connection shutdown. Short in dev mode, a bit longer in production.

@garypen
Copy link
Contributor

garypen commented Jun 23, 2023

I had a little look at this and couldn't reproduce the hang. I tried with --dev and with --hot-reload. Whenever I hit Ctrl-C the router shuts down.

Am I missing something or may this have been fixed?

@BrynCooke
Copy link
Contributor Author

Looks like this got fixed at some point in the last couple of releases. Let's close for now.

@BrynCooke BrynCooke closed this as not planned Won't fix, can't repro, duplicate, stale Jun 26, 2023
@o0Ignition0o
Copy link
Contributor

Reopening this issue since we have reproduced it again.

@o0Ignition0o o0Ignition0o reopened this Jul 13, 2023
@lleadbet
Copy link
Contributor

Flagging that a customer I'm working with is running into this currently;

OS: MacOS 13.4/Docker
Router: 1.24.0

Config:

supergraph:
  listen: 0.0.0.0:4000
  introspection: true
include_subgraph_errors:
  all: true 
cors:
  allow_any_origin: true
  origins:
    - https://studio.apollographql.com
sandbox:
  enabled: true
homepage:
  enabled: false

Which is very basic.

They were getting the following logs when they had the playground window open:

2023-07-13T19:29:12.737623Z  INFO shutting down
2023-07-13T19:29:12.737810Z  INFO all connections shut down
2023-07-13T19:29:12.747471Z  DEBUG deno runtime shutdown successfully
2023-07-13T19:29:12.750301Z  DEBUG terminating apollo exporter
2023-07-13T19:29:12.751759Z  DEBUG state machine event: Shutdown, transitioned from: Running to: Stopped
2023-07-13T19:29:12.751773Z  INFO stopped

Refreshing the chrome window with the playground let the router close.

@smyrick
Copy link
Member

smyrick commented Sep 29, 2023

It might be related or not, but I have another customer who is reporting that starting a Router with a subgraph and then repeatedly restarting the subgraph is also causing Router resources to go up. I have yet been able to create a reproduce-able example

This is on the other side of the connection but I wonder if it is related

garypen added a commit that referenced this issue Oct 10, 2023
Fix #3124
Fix #3941

<!-- start metadata -->
---

**Checklist**

Complete the checklist (and note appropriate exceptions) before the PR
is marked ready-for-review.

- [ ] Changes are compatible[^1]
- [ ] Documentation[^2] completed
- [ ] Performance impact assessed and acceptable
- Tests added and passing[^3]
    - [ ] Unit Tests
    - [ ] Integration Tests
    - [ ] Manual Tests

**Exceptions**

*Note any exceptions here*

**Notes**

[^1]: It may be appropriate to bring upcoming changes to the attention
of other (impacted) groups. Please endeavour to do this before seeking
PR approval. The mechanism for doing this will vary considerably, so use
your judgement as to how and when to do this.
[^2]: Configuration is an important part of many changes. Where
applicable please try to document configuration examples.
[^3]: Tick whichever testing boxes are applicable. If you are adding
Manual Tests, please document the manual testing (extensively) in the
Exceptions.

---------

Co-authored-by: Bryn Cooke <[email protected]>
Co-authored-by: Gary Pennington <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants