-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PERF]: Improved server-side serialization with orjson + async #1695
[PERF]: Improved server-side serialization with orjson + async #1695
Conversation
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
@HammadB, can you rerun failed tests? |
@tazarov yep - reran! |
8314c50
to
ae4733c
Compare
- Added orjson serialization improving serialization performance especially on large batches 100+ docs 2x faster (tested with locust) - Added async body serialization further improving performance (tested with locust) - Added async handling with AnyIO of the more impactful server queries further reducing concurrent request response times Future work: Add orjson serialization at client-side
…PI Server thread pool connections
- Removed commented out code - Added clarification for to_thread usage
- Configurable via `chroma_server_thread_pool_size` setting
d3b442b
to
d844c2f
Compare
@HammadB, rebased and removed the to_thread offloading. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HammadB, I think this is ready now.
fd0b2a0
to
d844c2f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HammadB, I've started working through our discussion yesterday, and I think I'm making some assumptions as to the Auth (and other middlewares, e.g. rate-limit), which will not work well as they are right now. I think both Rate limiting and authz need to be made async which they are not rn.
The bottom line is that I can offload async body reads to a thread-pool (required as I want to still keep the methods sync) and then continue with the sync processing as it is now, until we have all parts ready to make FastAPI methods async.
…f the main loop TODO: Telemetry/Authz has the potentially to negatively impact things - needs async refactor
@HammadB, I think this is good to go, but we need to have a quick follow-up about the following:
|
can we stack those changes on top and we can merge it all down coherently |
This is PR #1695 migrated to Chroma repo for stacking the outstanding changes. ## Description of changes *Summarize the changes made by this PR.* - Improvements & Bug fixes - Added orjson serialization improving serialization performance especially on large batches 100+ docs 2x faster (tested with locust) - Added async body serialization further improving performance (tested with locust) -⚠️ Fixed an issue with SQLite lack of case_sensitive_like for FastAPI Server thread pool connections ## Test plan *How are these changes tested?* - [x] Tests pass locally with `pytest` for python, `yarn test` for js - [x] Locust performance tests For performance tests, **5** concurrent clients were querying, and **1** was adding pre-computed OpenAI (ada-02) embeddings (runtime **~1m**) **Batch size**: 100 (1000 also attached in a zip) Tests with existing `main` codebase: ![Screenshot 2024-02-03 at 17 52 39](https://github.com/chroma-core/chroma/assets/1157440/5d87b4d5-4dae-48fe-908c-7c09db2a5abc) Tests with orjson + async: ![Screenshot 2024-02-03 at 17 53 17](https://github.com/chroma-core/chroma/assets/1157440/d9818fdd-11c3-45c9-81dd-8baecbb638cf) [1m-100batch-5concur.zip](https://github.com/chroma-core/chroma/files/14152062/1m-100batch-5concur.zip) [1m-1000batch-5concur.zip](https://github.com/chroma-core/chroma/files/14152063/1m-1000batch-5concur.zip) ## Documentation Changes *Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the [docs repository](https://github.com/chroma-core/docs)?* ## Refs - https://showmax.engineering/articles/json-python-libraries-overview - https://github.com/ijl/orjson - https://catnotfoundnear.github.io/finding-the-fastest-python-json-library-on-all-python-versions-8-compared.html --------- Co-authored-by: Ben Eggers <[email protected]>
Future work: Add orjson serialization at client-side
Description of changes
Summarize the changes made by this PR.
Test plan
How are these changes tested?
pytest
for python,yarn test
for jsFor performance tests, 5 concurrent clients were querying, and 1 was adding pre-computed OpenAI (ada-02) embeddings (runtime ~1m)
Batch size: 100 (1000 also attached in a zip)
Tests with existing
main
codebase:Tests with orjson + async:
1m-100batch-5concur.zip
1m-1000batch-5concur.zip
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs repository?
Refs