[BUG] - Memory leaks during environment creation tasks #848

Closed

peytondmurray opened this issue Jul 9, 2024 · 9 comments
Assignees
peytondmurray

Labels
  • area: user experience 👩🏻‍💻 (Items impacting the end-user experience)
  • impact: high 🟥 (This issue affects most of the conda-store users or is a critical issue)
  • needs: investigation 🔎 (Someone in the team needs to look into this issue before scoping)
  • type: bug 🐛 (Something isn't working)
  • type: maintenance 🛠

Comments

@peytondmurray (Contributor) commented Jul 9, 2024

Describe the bug

See nebari-dev/nebari#2418 and #840 for context. TL;DR: action_add_conda_prefix_packages is leaking memory, causing problems on various Nebari deployments.

Edit by @trallard: it seems action_add_conda_prefix_packages is not the main, or at least not the sole, culprit of the memory leaks, so I adjusted the title.

Memray flamegraph:

[Memray flamegraph screenshot]

Expected behavior

No memory leaks.

How to Reproduce the problem?

See nebari-dev/nebari#2418 for a description.

Output

No response

Versions and dependencies used.

No response

Anything else?

No response

@peytondmurray added the type: bug 🐛, impact: high 🟥, area: user experience 👩🏻‍💻, type: maintenance 🛠, and needs: investigation 🔎 labels on Jul 9, 2024
@peytondmurray self-assigned this on Jul 9, 2024
@Adam-D-Lewis (Contributor) commented Jul 9, 2024

  • To generate the flamegraph, I made a few changes (adding memray to the docker container and setting concurrency=1 in conda-store/tests/assets/conda_store_config.py), then ran docker compose up --build -d to start the containers.
  • I then ran docker compose exec -it conda-store-worker bash to get a shell in the container. Once the worker seemed idle, I attached memray to the process that seemed to be growing the most in memory with memray attach --trace-python-allocators --native <pid> -o myfile.bin.
  • I then logged into the UI at localhost:8080/conda-store/admin and created a bunch of environments with very little in them (just the nothing package installed by pip, which is used in some of the tests), and watched the memory rise with each new build.
  • I then copied myfile.bin out of the docker container with docker cp <container_id>:/path/to/myfile.bin ..
  • Finally, I generated the flamegraph with memray flamegraph --leaks myfile.bin.

According to docker compose stats, the conda-store worker container's memory usage rises by ~18 MB with each new env build, but the memory usage at the top of the flamegraph (see below) only shows an increase of about 7 MB per build (each spike is a new conda env build). I then ran a quick test bypassing list_conda_prefix_packages, and memory still increased by about ~11 MB with each new env, so it seems likely that there are multiple sources of memory growth (perhaps the rest is in one of the other subprocesses started by conda-store that I didn't track), but I'm not sure where the other increase(s) are currently.

[Memray flamegraph screenshot: memory rises in spikes, one per environment build]
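
Since memray attach only tracks the single PID it is attached to, one way to check whether the unaccounted-for growth lives in subprocesses is to sum RSS across the worker and all of its children between builds. A minimal sketch, assuming psutil is available inside the worker container (it is not something conda-store ships):

```python
import sys

import psutil


def total_rss_mb(pid: int) -> float:
    """Sum the resident set size of a process and all of its descendants, in MiB."""
    parent = psutil.Process(pid)
    total = 0
    for proc in [parent, *parent.children(recursive=True)]:
        try:
            total += proc.memory_info().rss
        except psutil.NoSuchProcess:
            # Short-lived subprocesses may exit between listing and measuring.
            pass
    return total / (1024 * 1024)


if __name__ == "__main__":
    # Run inside the worker container: python total_rss.py <worker-pid>
    print(f"{total_rss_mb(int(sys.argv[1])):.1f} MiB")
```

Comparing this total before and after each build would show whether the remaining ~11 MB is held by the tracked worker process or by one of its children.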

@trallard (Collaborator) commented:

Per @Adam-D-Lewis's comment above about seeing memory increases even after bypassing the initially flagged action, we need to do a more in-depth profiling analysis.

@trallard changed the title from "[BUG] - action_add_conda_prefix_packages leaks memory" to "[BUG] - Memory leaks during environment creation tasks" on Jul 11, 2024
@Adam-D-Lewis (Contributor) commented:

I tried restricting the docker container to 1 GiB of memory; the memory growth per build was then less than 18 MB, but usage still increased with each build, and eventually the celery worker was restarted due to memory pressure. I'm wondering if the excess ~11 MB of growth I saw with no memory limit specified was just Python being a memory-managed language: memory freed from garbage-collected objects is not always given back to the OS.
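
As a standalone illustration of that behaviour (nothing conda-store specific), CPython often keeps freed memory around for reuse instead of returning it to the OS, so the RSS that docker stats reports can stay elevated after objects are collected. A rough sketch, again assuming psutil is installed:

```python
import gc
import os

import psutil

proc = psutil.Process(os.getpid())


def rss_mb() -> float:
    return proc.memory_info().rss / (1024 * 1024)


print(f"baseline:           {rss_mb():.1f} MiB")

# Allocate a few hundred MiB of small objects, then drop and collect them.
data = [bytearray(1024) for _ in range(300_000)]
print(f"after allocation:   {rss_mb():.1f} MiB")

del data
gc.collect()

# Depending on the allocator, RSS may not fall all the way back to the
# baseline even though the objects are gone: freed memory is often kept
# around for reuse rather than returned to the OS immediately.
print(f"after gc.collect(): {rss_mb():.1f} MiB")
```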

That being said, I think it should still be possible to write a test showing the memory usage grows to the point where a celery worker runs out of memory.
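
A rough shape such a test could take, polling the worker's RSS between builds with psutil; trigger_env_build is a hypothetical placeholder for however the test would drive conda-store builds (e.g. through the REST API), and the 100 MB threshold is arbitrary:

```python
import psutil


def trigger_env_build(index: int) -> None:
    """Hypothetical helper: submit an environment build and wait for it to finish."""
    raise NotImplementedError


def test_worker_memory_growth(worker_pid: int, n_builds: int = 20) -> None:
    worker = psutil.Process(worker_pid)
    baseline = worker.memory_info().rss

    for i in range(n_builds):
        trigger_env_build(i)

    growth_mb = (worker.memory_info().rss - baseline) / (1024 * 1024)
    # Fail well before the point where a memory-limited celery worker would be
    # OOM-killed; 100 MB over 20 builds is an arbitrary budget.
    assert growth_mb < 100, f"worker RSS grew {growth_mb:.1f} MiB over {n_builds} builds"
```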

@trallard (Collaborator) commented:

@peytondmurray is this still relevant? IIRC we could not demonstrate that there was in fact a leak happening, but we will be doing some profiling soon too.
Shall we close this issue?

@peytondmurray (Contributor, Author) commented:

That sounds fine; closing now. Even if there was a leak, we are now restarting workers regularly, so the symptom is avoided in principle even if any underlying leak isn't solved.

github-project-automation bot moved this from New 🚦 to Done 💪🏾 in conda-store 🐍 on Sep 10, 2024
@kcpevey (Contributor) commented Sep 13, 2024

I'm running conda-store-ui locally with a fresh build. Creating an env with python and rich makes the conda-store worker eat 10+ GB of RAM. Even if the workers get restarted, this doesn't seem right?

@trallard (Collaborator) commented:

Are you running this in Docker or standalone?
@peytondmurray could not reproduce any memory leaks IIRC, and neither could I last time I looked into this.

@peytondmurray (Contributor, Author) commented:

> I'm running conda-store-ui locally with a fresh build. Creating an env with python and rich makes the conda-store worker eat 10+ GB of RAM. Even if the workers get restarted, this doesn't seem right?

If you're using docker compose up --build, it would also be good to know whether this happens after you've freshly started the services. From your description it feels like something the worker is doing is actually eating memory, rather than leaking it slowly over time.

@kcpevey (Contributor) commented Sep 17, 2024

I was using docker compose up --build. And yes, it does seem like something the worker is doing rather than a slow leak.

I was trying to look at the logs, but I don't know what all the output means. For example, it seems like a "bulk save" task is rather time-consuming and unrelated to any env build I'm doing, but I don't know what that task is or what it does.
