Using UvicornWorkers in Gunicorn causes OOM on K8s #1226
I guess I figured it out, thanks to @Reikun85. The leak seems to be caused by TCP connections, which is why I did not see the problem locally but did on K8s: on the cluster there are TCP pings from the load balancer and so on. So I replicated it locally with tcping, and the leak appeared on my PC as well. I also tested with different uvicorn versions: the leak appears from uvicorn>=0.14.0, with no problem on 0.13.4. I also noticed that the leak is present only when using the "standard" install of uvicorn and not the full one. NB: the standard install is the most common when using gunicorn as a process manager to run uvicorn workers. To replicate the problem I created an example gist: just build the image with both versions using the two provided Dockerfiles, then ping the application with a TCP ping tool, and you can verify that the app's memory keeps growing and never stops. Here is the gist: https://gist.github.com/KiraPC/5016ecee2ae1dd6e860b4494415dbd49 Let me know if you need more information or if something is not clear. |
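For illustration, here is a minimal Python stand-in for such a TCP ping loop (an assumption of what the load-balancer health checks do: open a connection and close it without sending any HTTP request; host and port are placeholders):

```python
# Sketch of a TCP "ping" loop: connect and disconnect repeatedly without
# sending data, similar to what a load balancer health check does.
import socket
import time

HOST, PORT = "127.0.0.1", 8000  # assumed address of the running app

while True:
    # open the connection, then close it immediately without sending anything
    with socket.create_connection((HOST, PORT), timeout=1) as sock:
        pass
    time.sleep(0.01)
```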
Next step would be to play with |
What tool did you use to ping? I'm not able to reproduce the issue with the gist. |
I used tcping for macOS, installed via brew. |
I've used this: https://neoctobers.readthedocs.io/en/latest/linux/tcpping_on_ubuntu.html I'm on Ubuntu 20.04. |
Mmmm, I'm on mobile now. I used tcping in a while loop in a bash script. |
@Kludex I used this https://github.com/mkirchner/tcping |
Another issue, #1030, is also suspected to involve a TCP leak. |
@KiraPC, thanks for the detailed explanation on how to reproduce it! I was able to test it locally, and bisected the library to find the commit where Uvicorn started leaking memory. The memory leak starts being reproducible on this commit: 960d465 (#869). I tested #1192 applied to the current |
@adamantike I am very happy that it was able to help; it is the least I can do for the community. In the next few days, if I can find some free time, I'll try to have a look. |
@florimondmanca Pinging you in case you have an idea. 😗 |
A few stats that could allow a faster review and merge for PR #1192. All scenarios running for 10 minutes in the same testing environment:
|
Can't reproduce using the gist provided. I'm running the Dockerfile then sending |
Can't reproduce either with tcping; memory usage seems stable
|
I have created a gist with the configuration I used to bisect the memory leak, based on what @KiraPC provided, but using https://gist.github.com/adamantike/d2af0f0fda5893789d0a1ab71565de48 |
OK, I tried that and it's stable after 5 min with 5 tcping terminals open
|
I'm going to post my results later today. But at a quick look, I see that your machine has far more resources than mine, so maybe that's why you can't reproduce the issue. For example, I've noticed that if I run the script for only 100 iterations, I don't have any problems. |
@KiraPC @adamantike Thanks so much for the detailed reports and debugging so far. IIUC this seems to be related to #869 introducing some kind of connection leak that's visible if we let a server hit by health checks run for a few minutes. The keepalive timeout exploration in #1192 is interesting. I wonder if the keepalive task is being properly cleaned up too? We use a separate |
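For readers following along, here is a minimal sketch (an illustration only, not uvicorn's actual implementation) of the kind of per-connection keep-alive timer whose cleanup is being questioned above:

```python
# Sketch: a per-connection keep-alive timeout scheduled with call_later.
# If the timer handle is not cancelled in connection_lost, the pending
# callback keeps a reference to the protocol object until the timer fires.
import asyncio


class DemoProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport
        loop = asyncio.get_running_loop()
        # close the connection if the client stays idle too long
        self.keepalive_handle = loop.call_later(5.0, self.transport.close)

    def connection_lost(self, exc):
        # without this cancel(), the TimerHandle (and the protocol it
        # references) lingers in the event loop until the timer expires
        self.keepalive_handle.cancel()
```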
Opened #1244 with a possible fix — at least on my machine. Happy for you to try it out @KiraPC @adamantike. Edit: meh, #1244 seems to break a bunch of fundamental functionality, at least with the test HTTPX client. Needs refining… |
@euri10, did you try out the gist I shared, and used something like |
This may also fix a memory leak: encode/uvicorn#1226 Closes #35.
Tangentially related: if you're already using Kubernetes, there's no reason (unless you have something specific to your project) to use Gunicorn as a process manager. Kubernetes should be running your workers directly. Fewer layers, less complexity, and your readiness/liveness probes will be more accurate. |
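As an illustration of that suggestion (a sketch only, with `main:app` standing in for the real application import path), a container entrypoint could run a single Uvicorn worker directly and let Kubernetes scale via replicas:

```python
# Sketch of a container entrypoint that runs one Uvicorn worker directly,
# without Gunicorn in front; scaling is handled by Kubernetes replicas.
# "main:app" is a placeholder for the actual application module.
import uvicorn

if __name__ == "__main__":
    uvicorn.run("main:app", host="0.0.0.0", port=8000)
```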
I amended the app to get a sense of where it could happen (https://gist.github.com/euri10/cfd5f0503fdb7b423db7d6d4d76d5e8e); here's the log from the Docker API
edit: discard that |
Can you check @florimondmanca's PR in #1244 with the small change I made? |
EDIT: it seems that I cannot reproduce the numbers.
Hey, I'm encountering the same issue. My code snippet to reproduce it:
```python
async def app(scope, receive, send):
    assert scope['type'] == 'http'
    data = [0] * 10_000_000
    await send({
        'type': 'http.response.start',
        'status': 200,
        'headers': [
            [b'content-type', b'text/plain'],
        ],
    })
    await send({
        'type': 'http.response.body',
        'body': b'Hello, world!',
    })
```

Launched using
|
I ran some more tests with images to grow memory usage faster. I arrived at two conclusions:
- the issue is with asyncio and not uvicorn
- the memory stops growing after a while. I cannot tell when, but I saw that memory growth stabilized after some time.

Code to reproduce the memory consumption with asyncio only:

```python
import httpx
import asyncio


async def main(host, port):
    async def handler(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        async with httpx.AsyncClient() as client:
            r = await client.get('https://raw.githubusercontent.com/tomchristie/uvicorn/master/docs/uvicorn.png')
            data = await r.aread()
        body = "Hello World!"
        response = "HTTP/1.1 200\r\n"
        response += f"Content-Length: {len(body)}\r\n"
        response += "Content-Type: text/html; charset=utf-8\r\n"
        response += "Connection: close\r\n"
        response += "\r\n"
        response += body
        writer.write(response.encode("utf-8"))
        await writer.drain()
        writer.close()

    server = await asyncio.start_server(handler, host=host, port=port)
    async with server:
        await server.serve_forever()


asyncio.run(main("0.0.0.0", "8000"))
```

This sample code started at ~20MB of RAM usage and after 30k requests used ~280MB. |
I was reading quickly and was about to say there is no drain, but there is in fact. Would that explain why any framework / server combination you tried has growing memory usage?
|
Yes, I think this would explain why all framework/server combinations saw memory consumption growth. I would say this is an issue or inefficiency of Python itself. |
@evalkaz, actually, the issue you are seeing seems related to encode/httpx#978 (comment), whose root cause is the creation of a new `httpx.AsyncClient` for every request. Reusing a single client instance avoids that:

```python
import httpx
import asyncio

async_client = httpx.AsyncClient()


async def main(host, port):
    async def handler(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        r = await async_client.get('https://raw.githubusercontent.com/tomchristie/uvicorn/master/docs/uvicorn.png')
        # ...
```

It seems the |
@adamantike thanks for the tip, but even if I initialize |
I don't think it's a good idea to add potential issues and memory leaks from a different third-party library to the mix (in this case, httpx).

Regarding your memory usage table per library/ASGI server, can you get some numbers after a greater number of requests? Going from 18MB to 39MB could seem like a lot, but it is not enough of an indication of a memory leak if it stabilizes at that number (because of caches, initializations, and anything lazily created that is not there yet when the server hasn't received any requests). |
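For anyone wanting to collect such numbers, here is a rough sketch (assuming a Linux host; the worker PID and URL below are placeholders) that sends a large number of requests and periodically samples the server's RSS from /proc:

```python
# Sketch: hammer the server with requests and print its RSS every 10k
# requests, read from /proc/<pid>/status (Linux only).
import httpx

SERVER_PID = 12345              # placeholder: replace with the worker's PID
URL = "http://127.0.0.1:8000/"  # placeholder: replace with the app's URL


def rss_kb(pid: int) -> int:
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # reported in kB
    return -1


with httpx.Client() as client:
    for i in range(1, 100_001):
        client.get(URL)
        if i % 10_000 == 0:
            print(f"{i} requests, RSS = {rss_kb(SERVER_PID)} kB")
```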
It seems that I cannot reproduce the memory increase anymore 🤔 I ran my experiments in Docker and took notes about usage, but as of today the memory stays more or less the same after 10k requests. I guess the base Docker image or one of the Python dependencies changed, but I have no idea what happened. |
@florimondmanca Do you have a suggestion on how to handle this issue? Should we revert the changes and then retry to introduce it later on? |
This should be solved by #1332. Can someone else confirm? |
I tested again using the instructions from #1226 (comment), with 20 concurrent tcping invocations over 10 minutes.
|
Thanks @adamantike :) |
For the record: Issue was solved by uvicorn 0.17.1. |
We reverted the changes in 0.17.1 (reverted #1332) as they caused other issues. We've taken another approach, which reverted #869. There was a sequence of small patches.

I'll be locking this issue, as I think further comments here will mislead other users. In case you feel like you've found a similar issue, or any other bug, feel free to create a GitHub issue or talk with us over Gitter. |
Describe the bug
I'm developing a FastAPI application deployed on a Kubernetes cluster, using Gunicorn as the process manager.
I'm also using UvicornWorker, of course, because of the async nature of FastAPI.
After deploying the application I can see the memory growing at rest, until OOM.
This happens only when I use UvicornWorker.
Tests made by me:
Also, this happens only on the Kubernetes cluster: when I run my application locally (MacBook Pro 16, with the same Docker image used on K8s), the leak is not present.
Has anyone else had a similar problem?