
Simplified worker monitoring and restart, without gunicorn. #1942

Closed
gnat wants to merge 10 commits

Conversation


@gnat gnat commented Apr 13, 2023

Drastically simplifies app deployment for Starlette and FastAPI for many users.

Survive worker crashes directly in Uvicorn: go ahead and run your ffmpeg and machine-learning tasks. Uvicorn will automatically restart crashed workers up to the desired --workers count, reliably, even under heavy load.

Related issues

Deployments right now...

caddy/nginx ➡️ gunicorn ➡️ uvicorn ➡️ starlette/fastapi

For many users can be simplified to...

caddy/nginx ➡️ uvicorn ➡️ starlette/fastapi

Stress-tested to hundreds of thousands of requests per second using hey.

Thoughts, feedback appreciated.

I realise this isn't as "feature rich" as some may want (handling various signals, etc.), but we're eliminating an entire dependency with only a few lines. It works well, is a relatively tiny change with big benefits, and can easily be refactored out if something better is implemented in the future.
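The core of the change is a restart loop that replaces dead workers. A minimal standalone sketch of the same idea, where `worker_target`, `supervise`, and `max_workers` are illustrative names rather than Uvicorn internals:

```python
import multiprocessing
import time


def worker_target() -> None:
    # Stand-in for a real server worker; a crashed worker simply exits.
    time.sleep(0.1)


def supervise(processes: list, max_workers: int) -> list:
    """Drop dead workers and start replacements up to max_workers."""
    alive = [p for p in processes if p.is_alive()]
    while len(alive) < max_workers:
        replacement = multiprocessing.Process(target=worker_target)
        replacement.start()
        alive.append(replacement)
    return alive


if __name__ == "__main__":
    pool = supervise([], max_workers=2)    # initial spawn
    for p in pool:
        p.join()                           # every worker "crashes" (exits)
    pool = supervise(pool, max_workers=2)  # dead workers are replaced
    assert len(pool) == 2
    for p in pool:
        p.join()
```

Calling `supervise` periodically from the parent loop keeps the pool at the requested size. Note that it deliberately does nothing about workers that are alive but hung; that limitation is the subject of the review discussion.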

@Kludex Kludex requested a review from ahopkins April 13, 2023 11:09
@humrochagf humrochagf enabled auto-merge (squash) April 13, 2023 11:15
@humrochagf humrochagf disabled auto-merge April 13, 2023 11:16
@humrochagf (Contributor) commented:

There's a behavior change that may be unexpected for those running on k8s who expect the pod to be restarted on a crash. To cover them, I recommend adding an unmanaged option.

@gnat (Author) commented Apr 13, 2023:

I would agree, but I'm not sure it matters because this new functionality is only present if --workers=2 or greater. (multiprocess.py only)

Unless I'm mistaken, k8s pods are running single worker mode (1 worker = 1 pod), because k8s is being used as the "worker manager".

In the current multiprocess.py behavior, the parent uvicorn process awkwardly stays alive (and unusable) even when all workers are dead (#517). This mode has never been suitable for use with an external worker manager, AFAIK.

@Kludex Kludex requested a review from euri10 April 15, 2023 14:41
@Kludex Kludex added the hold Don't merge yet label Apr 15, 2023
```python
break
# Restart expired workers.
for process in self.processes:
    if not process.is_alive():
```
Member commented:
This is not reliable. We can't know if the process is stuck. It will only give the false impression of a good process manager.

@gnat (Author) replied Apr 15, 2023:

I've considered this, but I don't think it's reasonable to expect Uvicorn to be so involved, because determining "stuck" and handling "stuck" could differ for every project: it's up to the developer to decide if, when, and how to kill a "stuck" worker.

Nor is it the point of this PR.

We want Uvicorn to just mimic k8s here- and only become involved when the worker is dead/crashed/exited, which is what is_alive() does perfectly.
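The split described here, where Uvicorn restarts dead workers and the application decides what "stuck" means, can be layered on top by the developer. A hedged sketch using a shared heartbeat timestamp; `worker`, `kill_if_stuck`, and `HEARTBEAT_TIMEOUT` are illustrative names, not part of Uvicorn:

```python
import multiprocessing
import time

HEARTBEAT_TIMEOUT = 1.0  # seconds without a heartbeat before we call a worker "stuck"


def worker(heartbeat) -> None:
    # A real worker would touch the heartbeat from inside its event loop.
    for _ in range(3):
        heartbeat.value = time.monotonic()
        time.sleep(0.1)
    time.sleep(60)  # simulate getting stuck: the heartbeat stops updating


def kill_if_stuck(process: multiprocessing.Process, heartbeat) -> bool:
    """Terminate the worker if its heartbeat is older than the timeout."""
    if time.monotonic() - heartbeat.value > HEARTBEAT_TIMEOUT:
        process.terminate()
        process.join()
        return True
    return False


if __name__ == "__main__":
    hb = multiprocessing.Value("d", time.monotonic())
    p = multiprocessing.Process(target=worker, args=(hb,))
    p.start()
    time.sleep(0.5)
    assert not kill_if_stuck(p, hb)  # heartbeat still fresh
    time.sleep(HEARTBEAT_TIMEOUT + 0.5)
    assert kill_if_stuck(p, hb)      # heartbeat stale: worker terminated
```

Once a stuck worker is terminated this way, the `is_alive()`-based restart loop would bring up a replacement on its next pass.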

Member commented:

This is a good change. I agree with @gnat. I think the feature that @Kludex is after is a slightly different scope.

Member commented:

The description says "we're eliminating an entire dependency with only a few lines", but gunicorn does check whether the process is stuck, so that's part of the same scope, and this is not a replacement for it.

@humrochagf (Contributor) commented:

> I would agree, but I'm not sure it matters because this new functionality is only present if --workers=2 or greater. (multiprocess.py only)
>
> Unless I'm mistaken, k8s pods are running single worker mode (1 worker = 1 pod), because k8s is being used as the "worker manager".
>
> In the current multiprocess.py behavior: the parent uvicorn process awkwardly stays alive (and unusable) even when all workers are dead: #517 This mode has never been suitable for use with an external worker manager, AFAIK.

Sorry, my bad, you're right: multiprocess.py isn't used in the single-worker case.

@Kludex (Member) left a comment:

I don't think we should include this change given the "simplicity" and how it "lies" to users about being reliable.

But... If we want this to get in, at least those should happen:

  1. Add tests.
  2. Answer the question: why this shouldn't be an opt-in feature?

```
@@ -6,6 +6,7 @@ on:
branches: ["master"]
pull_request:
branches: ["master"]
workflow_dispatch:
```
Member commented:
Why did you add this?

@gnat (Author) replied:

This enables dev branches to manually run the test suite.


Member commented:

Without creating a PR, you mean?

@gnat (Author) replied:

Yes, that's correct. It allows for running the test suite manually on your own fork or branch prior to creating a pull request.


@zanieb commented Apr 23, 2023:

> 1. Answer the question: why this shouldn't be an opt-in feature?

It seems like advanced monitoring should be opt-in and simple "is alive" monitoring should be the default since Python doesn't let you "opt-out" of dependencies.

@gnat (Author) commented Apr 24, 2023:

> 1. Answer the question: why this shouldn't be an opt-in feature?

The current behavior is broken: multiprocess.py becomes a zombie process when all the workers are dead.

multiprocess.py is not used when running under k8s or gunicorn, so it's not obvious that this is broken.
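The broken baseline (#517) is a parent that lingers after every worker has died. For contrast, a parent that exits once no workers remain could look like this; `worker_target` and `run_until_all_dead` are illustrative names, not Uvicorn's actual multiprocess.py:

```python
import multiprocessing
import sys
import time


def worker_target() -> None:
    time.sleep(0.2)  # stand-in for a worker that eventually crashes or exits


def run_until_all_dead(num_workers: int) -> int:
    """Start workers, then let the parent terminate once none are left alive,
    instead of hanging around as an unusable zombie."""
    processes = [multiprocessing.Process(target=worker_target)
                 for _ in range(num_workers)]
    for p in processes:
        p.start()
    while any(p.is_alive() for p in processes):
        time.sleep(0.05)
    for p in processes:
        p.join()
    return 0  # parent exits cleanly


if __name__ == "__main__":
    sys.exit(run_until_all_dead(2))
```

Either behavior (restart dead workers, or exit with them) avoids the zombie parent; the PR chooses restarting.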

@Kludex (Member) commented Mar 2, 2024:

We're not doing this. I prefer to have a reliable restart, as I said in #1942 (comment).

I'll make @abersheeran 's PR happen instead: #2183 👍

@Kludex Kludex closed this Mar 2, 2024
Successfully merging this pull request may close these issues: Master process should restart expired workers.

5 participants