Using UvicornWorkers in Gunicorn causes OOM on K8s #1226
I guess I figured it out, thanks to @Reikun85. The leak seems to be caused by TCP connections, which is why I did not see the problem locally but did on K8s: on the cluster there are TCP pings from the load balancer and so on. So I replicated it locally with tcping, and the leak appeared on my PC as well. I also tested with different uvicorn versions: the leak appears from uvicorn>=0.14.0, with no problem on 0.13.4. I also noticed that the leak is present only when using the "standard" install of uvicorn and not the full one. NB: the standard install is the most common when using gunicorn as a process manager to run uvicorn workers. To replicate the problem I created an example gist: just build the image with both versions using the two provided Dockerfiles, then ping the application with a TCP ping tool, and you can verify that the app's memory keeps growing and never stops. Here is the gist: https://gist.github.com/KiraPC/5016ecee2ae1dd6e860b4494415dbd49 Let me know if you need more information or if something is not clear. |
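For illustration, here is a minimal Python stand-in for such a TCP ping loop (an assumption of what the load-balancer health checks do: open a connection and close it without sending any HTTP request; host and port are placeholders):

```python
# Sketch of a TCP "ping" loop: connect and disconnect repeatedly without
# sending data, similar to what a load balancer health check does.
import socket
import time

HOST, PORT = "127.0.0.1", 8000  # assumed address of the running app

while True:
    # open the connection, then close it immediately without sending anything
    with socket.create_connection((HOST, PORT), timeout=1) as sock:
        pass
    time.sleep(0.01)
```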
Next step would be to play with |
What tool did you use to ping? I'm not able to reproduce the issue with the gist. |
I used tcping for macOS, installed via brew. |
I've used this: https://neoctobers.readthedocs.io/en/latest/linux/tcpping_on_ubuntu.html I'm on Ubuntu 20.04. |
Mmmm, I'm on mobile now. I used tcping in a while loop in a bash script. |
@Kludex I used this https://github.com/mkirchner/tcping |
Another issue, #1030, is also suspected to involve a TCP leak. |
@KiraPC, thanks for the detailed explanation on how to reproduce it! I was able to test it locally, and bisected the library to find the commit where Uvicorn started leaking memory. The memory leak starts being reproducible on this commit: 960d465 (#869). I tested #1192 applied to the current |
@adamantike I am very happy that it was able to help; it is the least I can do for the community. In the next few days, if I can find some free time, I'll try to have a look. |
@florimondmanca Pinging you in case you have an idea. 😗 |
A few stats that could allow a faster review and merge for PR #1192. All scenarios running for 10 minutes in the same testing environment:
|
Can't reproduce using the gist provided. I'm running the Dockerfile then sending |
Can't reproduce either with tcping; memory usage seems stable
|
I have created a gist with the configuration I used to bisect the memory leak, based on what @KiraPC provided, but using https://gist.github.com/adamantike/d2af0f0fda5893789d0a1ab71565de48 |
OK, I tried that and it's stable after 5 min with 5 tcping terminals open
|
I'm going to post my results later today. But at a quick look, I see that your machine has far more resources than mine, so maybe that's why you can't reproduce the issue. For example, I've noticed that if I run the script for only 100 iterations, I don't have any problems. |
@KiraPC @adamantike Thanks so much for the detailed reports and debugging so far. IIUC this seems to be related to #869 introducing some kind of connection leak that's visible if we let a server hit by health checks run for a few minutes. The keepalive timeout exploration in #1192 is interesting. I wonder if the keepalive task is being properly cleaned up too? We use a separate |
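For readers following along, here is a minimal sketch (an illustration only, not uvicorn's actual implementation) of the kind of per-connection keep-alive timer whose cleanup is being questioned above:

```python
# Sketch: a per-connection keep-alive timeout scheduled with call_later.
# If the timer handle is not cancelled in connection_lost, the pending
# callback keeps a reference to the protocol object until the timer fires.
import asyncio


class DemoProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport
        loop = asyncio.get_running_loop()
        # close the connection if the client stays idle too long
        self.keepalive_handle = loop.call_later(5.0, self.transport.close)

    def connection_lost(self, exc):
        # without this cancel(), the TimerHandle (and the protocol it
        # references) lingers in the event loop until the timer expires
        self.keepalive_handle.cancel()
```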
Opened #1244 with a possible fix — at least on my machine. Happy for you to try it out @KiraPC @adamantike. Edit: meh, #1244 seems to break a bunch of fundamental functionality, at least with the test HTTPX client. Needs refining… |
@euri10, did you try out the gist I shared, and used something like |
This may also fix a memory leak: encode/uvicorn#1226 Closes #35.
Tangentially related: if you're already using Kubernetes, there's no reason (unless you have something specific to your project) to use Gunicorn as a process manager. Kubernetes should be running your workers directly. Fewer layers, less complexity, and your readiness/liveness probes will be more accurate. |
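As an illustration of that suggestion (a sketch only, with `main:app` standing in for the real application import path), a container entrypoint could run a single Uvicorn worker directly and let Kubernetes scale via replicas:

```python
# Sketch of a container entrypoint that runs one Uvicorn worker directly,
# without Gunicorn in front; scaling is handled by Kubernetes replicas.
# "main:app" is a placeholder for the actual application module.
import uvicorn

if __name__ == "__main__":
    uvicorn.run("main:app", host="0.0.0.0", port=8000)
```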
I amended the app to get a sense of where it could happen (https://gist.github.com/euri10/cfd5f0503fdb7b423db7d6d4d76d5e8e); here's the log from the Docker API
edit: discard that |
Can you check @florimondmanca's PR in #1244 with the small change I made? |
EDIT: it seems that I cannot reproduce the numbers.
Hey, I'm encountering the same issue. My code snippet to reproduce it:
```python
async def app(scope, receive, send):
    assert scope['type'] == 'http'
    data = [0] * 10_000_000
    await send({
        'type': 'http.response.start',
        'status': 200,
        'headers': [
            [b'content-type', b'text/plain'],
        ],
    })
    await send({
        'type': 'http.response.body',
        'body': b'Hello, world!',
    })
```

Launched using
|
I ran some more tests with images to grow memory usage faster. I arrived at two conclusions:
- the issue is with asyncio and not uvicorn
- the memory stops growing after a while. I cannot tell when, but I saw that memory growth stabilized after some time.

Code to reproduce the memory consumption with asyncio only:

```python
import httpx
import asyncio


async def main(host, port):
    async def handler(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        async with httpx.AsyncClient() as client:
            r = await client.get('https://raw.githubusercontent.com/tomchristie/uvicorn/master/docs/uvicorn.png')
            data = await r.aread()
        body = "Hello World!"
        response = "HTTP/1.1 200\r\n"
        response += f"Content-Length: {len(body)}\r\n"
        response += "Content-Type: text/html; charset=utf-8\r\n"
        response += "Connection: close\r\n"
        response += "\r\n"
        response += body
        writer.write(response.encode("utf-8"))
        await writer.drain()
        writer.close()

    server = await asyncio.start_server(handler, host=host, port=port)
    async with server:
        await server.serve_forever()


asyncio.run(main("0.0.0.0", "8000"))
```

This sample code started at ~20MB of RAM usage and after 30k requests used ~280MB. |
I was reading quickly and was about to say there is no drain, but there is in fact. Would that explain why any framework / server combination you tried has growing memory usage?
|
Yes, I think this would explain why all framework/server combinations saw memory consumption growth. I would say this is an issue or inefficiency of Python itself. |
@evalkaz, actually, the issue you are seeing seems related to encode/httpx#978 (comment), whose root cause is the creation of a new `httpx.AsyncClient` for every request. Reusing a single client instance avoids that:

```python
import httpx
import asyncio

async_client = httpx.AsyncClient()


async def main(host, port):
    async def handler(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        r = await async_client.get('https://raw.githubusercontent.com/tomchristie/uvicorn/master/docs/uvicorn.png')
        # ...
```

It seems the |
@adamantike thanks for the tip, but even if I initialize |
I don't think it's a good idea to add potential issues and memory leaks from a different third-party library to the mix (in this case, httpx).

Regarding your memory usage table per library/ASGI server, can you get some numbers after a greater number of requests? Going from 18MB to 39MB could seem like a lot, but it is not enough of an indication of a memory leak if it stabilizes at that number (because of caches, initializations, and anything lazily created that is not there yet when the server hasn't received any requests). |
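For anyone wanting to collect such numbers, here is a rough sketch (assuming a Linux host; the worker PID and URL below are placeholders) that sends a large number of requests and periodically samples the server's RSS from /proc:

```python
# Sketch: hammer the server with requests and print its RSS every 10k
# requests, read from /proc/<pid>/status (Linux only).
import httpx

SERVER_PID = 12345              # placeholder: replace with the worker's PID
URL = "http://127.0.0.1:8000/"  # placeholder: replace with the app's URL


def rss_kb(pid: int) -> int:
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # reported in kB
    return -1


with httpx.Client() as client:
    for i in range(1, 100_001):
        client.get(URL)
        if i % 10_000 == 0:
            print(f"{i} requests, RSS = {rss_kb(SERVER_PID)} kB")
```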
It seems that I cannot reproduce the memory increase anymore 🤔 I ran my experiments in Docker and took notes about usage, but as of today the memory stays more or less the same after 10k requests. I guess the base Docker image or one of the Python dependencies changed, but I have no idea what happened. |
@florimondmanca Do you have a suggestion on how to handle this issue? Should we revert the changes and then retry to introduce it later on? |
This should be solved by #1332. Can someone else confirm? |
I tested again using the instructions from #1226 (comment), with 20 concurrent tcping invocations over 10 minutes.
|
Thanks @adamantike :) |
For the record: Issue was solved by uvicorn 0.17.1. |
We reverted the changes in 0.17.1 (reverted #1332) as they caused other issues. We've taken another approach, which reverted #869. There was a sequence of small patches.

I'll be locking this issue, as I think further comments here will mislead other users. In case you feel like you've found a similar issue, or any other bug, feel free to create a GitHub issue or talk with us over Gitter. |
Describe the bug
I'm developing a FastAPI application deployed on a Kubernetes cluster, using Gunicorn as the process manager.
I'm also using UvicornWorker, of course, because of the async nature of FastAPI.
After deploying the application I can see the memory growing at rest, until OOM.
This happens only when I use UvicornWorker.
Tests made by me:
Also, this happens only on the Kubernetes cluster: when I run my application locally (MacBook Pro 16, with the same Docker image used on K8s), the leak is not present.
Has anyone else had a similar problem?