bug: memory leak when I am using bentoml>=1.2 #4760
Comments
Can you observe the same when running locally?
I am not using BentoML locally, because I am using it at the production level now. Even if I only change the version from 1.1 to 1.2 with the same environment, it occurs. Do you have any idea why the memory is going up?!
Can you reproduce it without containerizing? Run:
I think it might be difficult to monitor memory without containerizing, because of other systems. I will try it in an empty EC2 instance. But for now, it is a bug for me because I'm using containerized BentoML. Could you check the containerized image first?
200k requests and the memory usage doesn't change much. To rule out other issues, can you first upgrade Python to 3.9.18 (which I am using)?
same..
Environment variable

BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''

System information
Can't reproduce it either. Can you use a memory profiler to figure it out? I recommend memray.
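For reference, a minimal sketch of one way to do this with memray's Python API; the `run_load_test` function and the output file name are placeholders, not from this thread:

```python
# Sketch: record allocations made while exercising the service, then inspect
# the resulting file with memray's reporting tools
# (e.g. `memray flamegraph bentoml-allocations.bin`).
import memray


def run_load_test() -> None:
    # Placeholder: drive some requests against the BentoML service here.
    ...


with memray.Tracker("bentoml-allocations.bin"):
    run_load_test()
```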
encode/httpx#978 (comment) Would that be related? (It is surprising that it exists in both client libraries.)
Would you mind following the steps?
I can reproduce it locally. I think it is the glibc malloc issue: the memory requested from Python comes in chunks that are too small and too split up to be merged into a big chunk, so it is not returned to the OS even after it is freed. Here are two ways to solve this.
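The two fixes the commenter had in mind were not captured here. One common workaround for glibc heap fragmentation, which may or may not be what was proposed, is to explicitly ask glibc to hand freed memory back to the OS:

```python
# Sketch only: call glibc's malloc_trim(0) to release free heap memory back
# to the OS. Assumes a Linux process linked against glibc ("libc.so.6").
import ctypes

libc = ctypes.CDLL("libc.so.6")
libc.malloc_trim(0)
```

Other knobs that are often used for this class of problem are the glibc environment variables MALLOC_ARENA_MAX and MALLOC_TRIM_THRESHOLD_, or swapping in an alternative allocator such as jemalloc via LD_PRELOAD.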
Interesting. Are you using an ARM service?
The test was on my local machine (M1 Max), but it was the same on an EC2 instance.
Can reproduce it locally, new issue lol. Let me find it out. Assign this to me plz. cc @frostming
@gusghrlrl101 Try upgrading the dependencies with:
After debugging, @frostming and I confirmed that this bug was introduced into the codebase in #4337.

TL;DR: in #4337, @frostming added a new feature: create a tmp directory per request and use that tmp directory to cache all necessary files during the request:

```python
with tempfile.TemporaryDirectory(prefix="bentoml-request-") as temp_dir:
    dir_token = request_directory.set(temp_dir)
    try:
        yield self
    finally:
        self._request_var.reset(request_token)
        self._response_var.reset(response_token)
        request_directory.reset(dir_token)
```

But there is a problem: when we make a new directory, the process may trigger page cache activity in the kernel, and that cache may not be released in time. This means we end up with a lot of page cache, so you will see the memory keep growing until the OS reclaims the cached pages. We can use bpftrace to verify this.
From the results, we can see that the process allocates a lot of page cache and releases only a little of it.
For now, I think this is not a bug; it can be treated as normal behavior. You can set a memory limit for your container, and the page cache will be released automatically when the container's memory usage gets high, keeping it under your limit.
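To check whether the growth really is page cache rather than anonymous (Python) memory, you can look at the container's cgroup accounting. A rough sketch, assuming cgroup v2 paths inside the container:

```python
# Rough sketch (assumes cgroup v2): split the container's memory usage into
# anonymous memory vs page cache. A real leak shows up as growing "anon";
# cache-driven growth shows up as growing "file".
from pathlib import Path

stat = dict(
    line.split()
    for line in Path("/sys/fs/cgroup/memory.stat").read_text().splitlines()
)
print("anon (bytes):", stat.get("anon"))
print("page cache / file (bytes):", stat.get("file"))
```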
Thank you.
In most circumstances, it's not necessary to flush the page cache manually. If you want to do this, run
Emmmm, I'm not sure about this, cc @frostming
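The exact flush command referred to above was not captured here. The standard Linux mechanism (which requires root and is rarely a good idea on a busy host) is writing to /proc/sys/vm/drop_caches; a sketch of the same thing from Python:

```python
# Sketch: drop the kernel page cache, dentries and inodes (requires root).
# Equivalent to `sync && echo 3 > /proc/sys/vm/drop_caches`.
import os

os.sync()  # flush dirty pages first so cached data is written out
with open("/proc/sys/vm/drop_caches", "w") as f:
    f.write("3")
```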
Fixes bentoml#4760 Signed-off-by: Frost Ming <[email protected]>
@frostming Hello. How is it going?
* fix: bug: memory leak when using bentoml>=1.2 (Fixes #4760)
* add docstring
* fix: access attribute
* fix: typo
* fix: use a directory pool
* typo
* fix: clean dir out of mutex

Signed-off-by: Frost Ming <[email protected]>
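The "use a directory pool" commit above suggests the fix reuses a fixed set of temporary directories instead of creating and deleting one per request. A very rough sketch of that idea; this is not the actual BentoML implementation, and all names here are illustrative:

```python
# Rough sketch of a directory pool: reuse a small set of temp directories
# rather than creating and removing one per request, avoiding the constant
# directory churn described above.
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from queue import Queue

POOL_SIZE = 8
_pool = Queue()
for _ in range(POOL_SIZE):
    _pool.put(tempfile.mkdtemp(prefix="bentoml-request-"))


@contextmanager
def pooled_request_directory():
    """Borrow a temp directory from the pool for the duration of a request."""
    path = _pool.get()  # blocks until a directory is free
    try:
        yield path
    finally:
        # Empty the directory but keep it alive for the next request.
        for entry in Path(path).iterdir():
            if entry.is_dir():
                shutil.rmtree(entry, ignore_errors=True)
            else:
                entry.unlink(missing_ok=True)
        _pool.put(path)
```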
@gusghrlrl101 the PR has just been merged, will be available in the next release. Thank you for reporting this issue! It would be great if you could help verify the fix using the main branch or with the next release 🙏
Describe the bug
Hello!
It seems that I am experiencing a memory leak issue when using BentoML version 1.2.
I even tested it with an API that contains no logic.
I deployed it in a Kubernetes environment using containerization and did performance testing with Locust, but the memory keeps increasing.
Has no one else experienced this issue?
I can't figure out what's going wrong.
It was fine with version 1.1.
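For context, a no-logic service in the 1.2-style SDK looks roughly like the sketch below. This is purely illustrative; the reporter's actual service and load-test setup were not included here, and the class and method names are made up:

```python
# Illustrative only: a BentoML 1.2-style service with no real logic,
# similar in spirit to the empty API the reporter used for testing.
import bentoml


@bentoml.service
class EmptyService:
    @bentoml.api
    def ping(self, text: str = "ping") -> str:
        return text
```

Serving something like this with `bentoml serve` and driving traffic with Locust would mirror the setup described above.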
To reproduce
Expected behavior
No response
Environment
Environment variable
System information
bentoml: 1.2.16
python: 3.9.6
platform: Linux-5.10.162-141.675.amzn2.x86_64-x86_64-with-glibc2.26
uid_gid: 1000:1000
pip_packages