
Simplified worker monitoring and restart, without gunicorn. #1942

Closed
gnat wants to merge 10 commits

Conversation


@gnat gnat commented Apr 13, 2023

Drastically simplifies app deployment for Starlette and FastAPI for many users.

Survive worker crashes directly in Uvicorn: go ahead and run your ffmpeg and machine-learning tasks. Uvicorn will automatically restart crashed workers up to the desired --workers count, reliably, even under heavy load.

Related issues

Deployments right now...

caddy/nginx ➡️ gunicorn ➡️ uvicorn ➡️ starlette/fastapi

For many users can be simplified to...

caddy/nginx ➡️ uvicorn ➡️ starlette/fastapi

Stress-tested to hundreds of thousands of requests per second using hey.

Thoughts, feedback appreciated.

I realise this isn't as "feature rich" as some may want (handling various signals, etc.), but we're eliminating an entire dependency with only a few lines. It works well, is a relatively tiny change with big benefits, and can easily be refactored out if something better is implemented in the future.
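The core of the change is a restart loop that replaces dead workers. A minimal standalone sketch of the same idea, where `worker_target`, `supervise`, and `max_workers` are illustrative names rather than Uvicorn internals:

```python
import multiprocessing
import time


def worker_target() -> None:
    # Stand-in for a real server worker; a crashed worker simply exits.
    time.sleep(0.1)


def supervise(processes: list, max_workers: int) -> list:
    """Drop dead workers and start replacements up to max_workers."""
    alive = [p for p in processes if p.is_alive()]
    while len(alive) < max_workers:
        replacement = multiprocessing.Process(target=worker_target)
        replacement.start()
        alive.append(replacement)
    return alive


if __name__ == "__main__":
    pool = supervise([], max_workers=2)    # initial spawn
    for p in pool:
        p.join()                           # every worker "crashes" (exits)
    pool = supervise(pool, max_workers=2)  # dead workers are replaced
    assert len(pool) == 2
    for p in pool:
        p.join()
```

Calling `supervise` periodically from the parent loop keeps the pool at the requested size. Note that it deliberately does nothing about workers that are alive but hung; that limitation is the subject of the review discussion.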

@Kludex Kludex requested a review from ahopkins April 13, 2023 11:09
@humrochagf humrochagf enabled auto-merge (squash) April 13, 2023 11:15
@humrochagf humrochagf disabled auto-merge April 13, 2023 11:16
@humrochagf (Contributor) commented:

There's a behavior change that may be unexpected for those running on k8s who expect the pod to be restarted on a crash. To cover them, I recommend adding an unmanaged option.

@gnat (Author) commented Apr 13, 2023:

I would agree, but I'm not sure it matters because this new functionality is only present if --workers=2 or greater. (multiprocess.py only)

Unless I'm mistaken, k8s pods are running single worker mode (1 worker = 1 pod), because k8s is being used as the "worker manager".

In the current multiprocess.py behavior, the parent uvicorn process awkwardly stays alive (and unusable) even when all workers are dead (#517). This mode has never been suitable for use with an external worker manager, AFAIK.

@Kludex Kludex requested a review from euri10 April 15, 2023 14:41
@Kludex Kludex added the hold Don't merge yet label Apr 15, 2023
```python
break
# Restart expired workers.
for process in self.processes:
    if not process.is_alive():
```
Member commented:
This is not reliable. We can't know if the process is stuck. It will only give the false impression of a good process manager.

@gnat (Author) replied Apr 15, 2023:

I've considered this, but I don't think it's reasonable to expect Uvicorn to be so involved, because determining "stuck" and handling "stuck" could differ for every project: it's up to the developer to decide if, when, and how to kill a "stuck" worker.

Nor is it the point of this PR.

We want Uvicorn to just mimic k8s here- and only become involved when the worker is dead/crashed/exited, which is what is_alive() does perfectly.
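The split described here, where Uvicorn restarts dead workers and the application decides what "stuck" means, can be layered on top by the developer. A hedged sketch using a shared heartbeat timestamp; `worker`, `kill_if_stuck`, and `HEARTBEAT_TIMEOUT` are illustrative names, not part of Uvicorn:

```python
import multiprocessing
import time

HEARTBEAT_TIMEOUT = 1.0  # seconds without a heartbeat before we call a worker "stuck"


def worker(heartbeat) -> None:
    # A real worker would touch the heartbeat from inside its event loop.
    for _ in range(3):
        heartbeat.value = time.monotonic()
        time.sleep(0.1)
    time.sleep(60)  # simulate getting stuck: the heartbeat stops updating


def kill_if_stuck(process: multiprocessing.Process, heartbeat) -> bool:
    """Terminate the worker if its heartbeat is older than the timeout."""
    if time.monotonic() - heartbeat.value > HEARTBEAT_TIMEOUT:
        process.terminate()
        process.join()
        return True
    return False


if __name__ == "__main__":
    hb = multiprocessing.Value("d", time.monotonic())
    p = multiprocessing.Process(target=worker, args=(hb,))
    p.start()
    time.sleep(0.5)
    assert not kill_if_stuck(p, hb)  # heartbeat still fresh
    time.sleep(HEARTBEAT_TIMEOUT + 0.5)
    assert kill_if_stuck(p, hb)      # heartbeat stale: worker terminated
```

Once a stuck worker is terminated this way, the `is_alive()`-based restart loop would bring up a replacement on its next pass.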

Member commented:

This is a good change. I agree with @gnat. I think the feature that @Kludex is after is a slightly different scope.

Member commented:

The description says "we're eliminating an entire dependency with only a few lines", but gunicorn does check whether the process is stuck, so that's part of the same scope, and this is not a replacement for it.

@humrochagf (Contributor) commented:

> I would agree, but I'm not sure it matters because this new functionality is only present if --workers=2 or greater. (multiprocess.py only)
>
> Unless I'm mistaken, k8s pods are running single worker mode (1 worker = 1 pod), because k8s is being used as the "worker manager".
>
> In the current multiprocess.py behavior: the parent uvicorn process awkwardly stays alive (and unusable) even when all workers are dead: #517 This mode has never been suitable for use with an external worker manager, AFAIK.

Sorry, my bad, you're right: multiprocess.py isn't used in the single-worker case.

@Kludex (Member) left a comment:

I don't think we should include this change given the "simplicity" and how it "lies" to users about being reliable.

But... If we want this to get in, at least those should happen:

  1. Add tests.
  2. Answer the question: why this shouldn't be an opt-in feature?

```
@@ -6,6 +6,7 @@ on:
branches: ["master"]
pull_request:
branches: ["master"]
workflow_dispatch:
```
Member commented:
Why did you add this?

@gnat (Author) replied:

This enables dev branches to manually run the test suite.


Member commented:

Without creating a PR, you mean?

@gnat (Author) replied:

Yes, that's correct. It allows for running the test suite manually on your own fork or branch prior to creating a pull request.


@zanieb commented Apr 23, 2023:

> 1. Answer the question: why this shouldn't be an opt-in feature?

It seems like advanced monitoring should be opt-in and simple "is alive" monitoring should be the default since Python doesn't let you "opt-out" of dependencies.

@gnat (Author) commented Apr 24, 2023:

> 1. Answer the question: why this shouldn't be an opt-in feature?

The current behavior is broken: multiprocess.py becomes a zombie process when all the workers are dead.

multiprocess.py is not used when running under k8s or gunicorn, so it's not obvious that this is broken.
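The broken baseline (#517) is a parent that lingers after every worker has died. For contrast, a parent that exits once no workers remain could look like this; `worker_target` and `run_until_all_dead` are illustrative names, not Uvicorn's actual multiprocess.py:

```python
import multiprocessing
import sys
import time


def worker_target() -> None:
    time.sleep(0.2)  # stand-in for a worker that eventually crashes or exits


def run_until_all_dead(num_workers: int) -> int:
    """Start workers, then let the parent terminate once none are left alive,
    instead of hanging around as an unusable zombie."""
    processes = [multiprocessing.Process(target=worker_target)
                 for _ in range(num_workers)]
    for p in processes:
        p.start()
    while any(p.is_alive() for p in processes):
        time.sleep(0.05)
    for p in processes:
        p.join()
    return 0  # parent exits cleanly


if __name__ == "__main__":
    sys.exit(run_until_all_dead(2))
```

Either behavior (restart dead workers, or exit with them) avoids the zombie parent; the PR chooses restarting.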

@Kludex (Member) commented Mar 2, 2024:

We're not doing this. I prefer to have a reliable restart, as I said in #1942 (comment).

I'll make @abersheeran 's PR happen instead: #2183 👍

@Kludex Kludex closed this Mar 2, 2024
Successfully merging this pull request may close these issues: Master process should restart expired workers.

5 participants