-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug]: type errors in chromadb/utils/embedding_functions.py #1169
Labels
Comments
Sir, I want to go ahead with this issue and want to give it try. I will give my best, please allow mw to do so. |
Thanks Shubhrajit! Feel free to assign me as a reviewer on the PR whenever it's ready. Thank you. |
beggers
added a commit
that referenced
this issue
Sep 22, 2023
## Description of changes This PR accomplishes two things: - Adds batching to metrics to decrease load to Posthog - Adds more metric instrumentation Each `TelemetryEvent` type now has a `batch_size` member defining how many of that Event to include in a batch. `TelemetryEvent`s with `batch_size > 1` must also define `can_batch()` and `batch()` methods to do the actual batching -- our posthog client can't do this itself since different `TelemetryEvent`s use different count fields. The Posthog client combines events until they hit their `batch_size` then fires them off as one event. NB: this means we can drop up to `batch_size` events -- since we only batch `add()` calls right now this seems fine, though we may want to address it in the future. As for the additional telemetry, I pretty much copied Anton's draft #859 with some minor changes. Other considerations: Maybe we should implement `can_batch()` and `batch()` on all events, even those which don't currently use them? I'd prefer not to leave dead code hanging around but happy to go either way. I created a ticket for the type ignores: #1169 ## Test plan pytest passes modulo a couple unrelated failures With `print(event.properties)` in posthog client's `_direct_capture()`: ``` >>> import chromadb >>> client = chromadb.Client() {'batch_size': 1} >>> collection = client.create_collection("sample_collection") {'batch_size': 1, 'collection_uuid': 'bb19d790-4ec7-436c-b781-46dab047625d', 'embedding_function': 'ONNXMiniLM_L6_V2'} >>> collection.add( ... documents=["This is document1", "This is document2"], # we embed for you, or bring your own ... metadatas=[{"source": "notion"}, {"source": "google-docs"}], # filter on arbitrary metadata! ... ids=["doc1", "doc2"], # must be unique for each doc ... ) {'batch_size': 1, 'collection_uuid': 'bb19d790-4ec7-436c-b781-46dab047625d', 'add_amount': 2, 'with_documents': 2, 'with_metadata': 2} >>> for i in range(50): ... collection.add(documents=[str(i)], ids=[str(i)]) ... {'batch_size': 20, 'collection_uuid': 'bb19d790-4ec7-436c-b781-46dab047625d', 'add_amount': 20, 'with_documents': 20, 'with_metadata': 0} {'batch_size': 20, 'collection_uuid': 'bb19d790-4ec7-436c-b781-46dab047625d', 'add_amount': 20, 'with_documents': 20, 'with_metadata': 0} >>> for i in range(50): ... collection.add(documents=[str(i) + ' ' + str(n) for n in range(20)], ids=[str(i) + ' ' + str(n) for n in range(20)]) ... {'batch_size': 20, 'collection_uuid': 'bb19d790-4ec7-436c-b781-46dab047625d', 'add_amount': 210, 'with_documents': 210, 'with_metadata': 0} {'batch_size': 20, 'collection_uuid': 'bb19d790-4ec7-436c-b781-46dab047625d', 'add_amount': 400, 'with_documents': 400, 'with_metadata': 0} {'batch_size': 20, 'collection_uuid': 'bb19d790-4ec7-436c-b781-46dab047625d', 'add_amount': 400, 'with_documents': 400, 'with_metadata': 0} ``` ## Documentation Changes chroma-core/docs#139 chroma-core/docs@a4fd57d
if this issue still open, i would like to work on it |
@DevMadhav13 I believe it's still open! |
DevMadhav13
added a commit
to DevMadhav13/chroma
that referenced
this issue
Dec 13, 2023
1 task
@beggers PR Raised for this issue. Please review. |
beggers
added a commit
that referenced
this issue
Jan 11, 2024
## Description of changes - Added correct type annotations for these methods #1169 ## Test plan *How are these changes tested?* - [ ] Tests pass locally with `pytest` for python, `yarn test` for js ## Documentation Changes *Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the [docs repository](https://github.com/chroma-core/docs)?* --------- Co-authored-by: Ran <[email protected]> Co-authored-by: Ben Eggers <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
What happened?
mypy
complains about type errors inchromadb/utils/embedding_functions.py
, specifically with some__call__
and_normalize
methods. See some of the#type: ignore
s in #1163 for specifics.It's worth figuring out the correct type annotations for these methods and the methods they call into so that our type checking works as intended.
Versions
Chroma latest, python 3.9.6, macOS
Relevant log output
No response
The text was updated successfully, but these errors were encountered: