[Bug]: running chromadb/chroma container in docker - RAM memory of container grows endlessly while querying collection #1908

Open
pilotofbalance opened this issue Mar 21, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@pilotofbalance

What happened?

I'm running Chroma in Docker, using the official chromadb/chroma image.
I'm ingesting only embeddings and indexes into the collection.
Then I'm using k6 to run load tests that query this collection (k6 scenario: "loadTest": { "executor": "shared-iterations", "iterations": 200, "vus": 100 }).
docker stats shows memory growing constantly while the tests run, and it is not released when they finish. If I run the load test again, memory continues to grow until it reaches the container memory limit and the container crashes.
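For readers without k6, the load pattern described above can be approximated in Python. This is only a sketch of the "shared-iterations" idea (a fixed pool of 200 iterations drained by up to 100 concurrent workers), not the reporter's actual script; `run_shared_iterations` and the `task` callable are hypothetical names introduced here for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_shared_iterations(
    task: Callable[[], object], iterations: int = 200, vus: int = 100
) -> int:
    """Run `task` `iterations` times in total, shared across `vus` threads.

    Rough analogue of k6's "shared-iterations" executor: workers pull from
    one common pool of iterations instead of each running a fixed count.
    Returns how many iterations finished without raising.
    """
    completed = 0
    with ThreadPoolExecutor(max_workers=vus) as pool:
        futures = [pool.submit(task) for _ in range(iterations)]
        for future in futures:
            try:
                future.result()
                completed += 1
            except Exception:
                # A failed query is not counted; a real test would log it.
                pass
    return completed
```

In a real run, `task` would wrap a single query against the collection, and the container's memory would be sampled before and after each call to `run_shared_iterations`.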

Versions

chroma 0.4.24

Relevant log output

No response

@pilotofbalance pilotofbalance added the bug Something isn't working label Mar 21, 2024
@tazarov
Contributor

tazarov commented Mar 26, 2024

@pilotofbalance, thank you for reporting this. I do not claim to know how your tests are set up, but generally, if the test(s) create and insert data throughout the test run, memory is expected to grow. Do you have any pre/post-test clean-up?

Is there any way that this can be reproduced?

@pilotofbalance
Author

@tazarov
Sure, follow these steps:

  1. docker pull chromadb/chroma
  2. docker run -p 8081:8000 chromadb/chroma
  3. Load vectors into the DB. I used some generated embeddings; you can find more info here: https://github.com/pilotofbalance/vector_db_benchmarking/blob/main/chromadb/README.md
  4. Run the tests with k6 run script.js. Here is what I used: https://github.com/pilotofbalance/vector_db_benchmarking/blob/main/k6/script.js
  5. docker stats

In general you don't have to reproduce my exact case; just load vectors, even random ones, and run queries.
Look at the memory metrics in docker stats: after each query Chroma's memory usage increases, and it is not released afterwards.
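To turn the docker stats observation into numbers that can be compared between runs, one option is to sample the container's memory from a script. The sketch below assumes the Docker CLI is on the PATH and that `docker stats --no-stream --format "{{.MemUsage}}"` prints a string such as `45.5MiB / 7.775GiB`; the helper names are hypothetical.

```python
import re
import subprocess

# Binary units as printed in the MemUsage column.
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_mem_usage(mem_usage: str) -> int:
    """Convert the 'used' half of a MemUsage string (e.g. '45.5MiB / 7.7GiB') to bytes."""
    used = mem_usage.split("/")[0].strip()
    match = re.fullmatch(r"([0-9.]+)\s*([A-Za-z]+)", used)
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognised memory value: {mem_usage!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def container_memory_bytes(container: str) -> int:
    """Take one memory sample for `container` (name or ID) via the Docker CLI."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.MemUsage}}", container],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_mem_usage(result.stdout.strip())
```

Calling `container_memory_bytes` before the load test, right after it, and again a few minutes later would show whether memory is released once the queries stop.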

@jczic

jczic commented Apr 14, 2024

I seem to have the same problem in a Python virtual environment.
When using collections, memory keeps increasing until it uses all of the server's RAM, no matter how often I call gc.collect()...
Any ideas?

@liuhetian

I had the same problem. I had to restart docker every two or three days, otherwise it would fill up the server's memory.

0.5.4.dev33

@xiel0325

xiel0325 commented Aug 5, 2024

I had the same problem too. Because of this, it cannot be used in a production environment...

@tazarov
Contributor

tazarov commented Aug 6, 2024

@xiel0325 @liuhetian, the issue you are facing was caused by a connection leak: file descriptors (FDs) were leaked and accumulated over time. This problem was discovered a while ago in #1379, but was only fixed about two weeks ago with #2014. The fix has not yet made it into an official release, but you can use the latest from main or, if you are using Docker, one of the latest builds, e.g. docker pull chromadb/chroma:0.5.6.dev35
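A quick way to check whether a process is affected by this kind of leak is to watch its open file descriptor count while queries run. The following is a minimal sketch, assuming a Linux host where `/proc/<pid>/fd` is readable for the target process; `count_open_fds` is a hypothetical helper, not part of Chroma.

```python
import os
from typing import Optional


def count_open_fds(pid: Optional[int] = None) -> int:
    """Return the number of open file descriptors for `pid`.

    With no argument, inspects the current process. Linux-only: relies on
    the /proc/<pid>/fd directory, which has one entry per open descriptor.
    """
    target = "self" if pid is None else str(pid)
    return len(os.listdir(f"/proc/{target}/fd"))
```

If the count keeps climbing across repeated query runs and never drops back, that is consistent with leaked descriptors; a stable count points to a different cause for the memory growth.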
