Python: feat: add chroma memory store #449
Tried this out with the memory notebook and I have a couple of questions. By default, if you replace the VolatileMemoryStore() with ChromaMemoryStore(), the memory store uses embedded DuckDB, and I'm pretty confident that it's downloading an embedding model from sentence-transformers, i.e. the embeddings are not coming from the Kernel but are being generated by Chroma infrastructure. Presumably this behavior can be changed using a chromadb settings file? At minimum, we should make sure to document the default behavior, but ideally the base ChromaMemoryStore() without additional settings should use embeddings from the kernel.
Agree with documentation! If you don't mind me writing it, tell me what you want it to contain and I can write it :)
- Can you please add *.chroma/ to the .gitignore under # Persistant storage (which you could rename from # Persistant storage -> # Persistent storage while you're at it :)
- See Code Comment - Found the Root Issue

There's some incorrect logic in this PR that I'm investigating. I originally thought it might be the embeddings not being generated by the Kernel's registered AI service, but your code is consistent with the ChromaDB docs for "bringing your own embeddings." Here's a printout of the difference between memory stores (they should be the same). I will comment further if I find the issue.
With the VolatileMemoryStore:
======================
Query: I love Jupyter notebooks, how should I get started?
Result {++i}:
URL: : https://github.com/microsoft/semantic-kernel/blob/main/samples/notebooks/dotnet/00-getting-started.ipynb
Title : Jupyter notebook describing how to get started with the Semantic Kernel
Relevance: 0.8677540305183273
Result {++i}:
URL: : https://github.com/microsoft/semantic-kernel/blob/main/samples/notebooks/dotnet/02-running-prompts-from-file.ipynb
Title : Jupyter notebook describing how to pass prompts from a file to a semantic skill or function
Relevance: 0.8164925932277455
Result {++i}:
URL: : https://github.com/microsoft/semantic-kernel/blob/main/README.md
Title : README: Installation, getting started, and how to contribute
Relevance: 0.8086237941419581
With ChromaMemoryStore:
===========================
Query: I love Jupyter notebooks, how should I get started?
Result {++i}:
URL: : https://github.com/microsoft/semantic-kernel/blob/main/samples/notebooks/dotnet/00-getting-started.ipynb
Title : Jupyter notebook describing how to get started with the Semantic Kernel
Relevance: 0.2644919753074646
Result {++i}:
URL: : https://github.com/microsoft/semantic-kernel/blob/main/samples/notebooks/dotnet/02-running-prompts-from-file.ipynb
Title : Jupyter notebook describing how to pass prompts from a file to a semantic skill or function
Relevance: 0.3668723404407501
Result {++i}:
URL: : https://github.com/microsoft/semantic-kernel/blob/main/README.md
Title : README: Installation, getting started, and how to contribute
Relevance: 0.38269340991973877
Result {++i}:
URL: : https://github.com/microsoft/semantic-kernel/tree/main/samples/apps/chat-summary-webapp-react/README.md
Title : README: README associated with a sample starter react-based chat summary webapp
Relevance: 0.47094565629959106
Result {++i}:
URL: : https://github.com/microsoft/semantic-kernel/tree/main/samples/skills/ChatSkill/ChatGPT
Title : Sample demonstrating how to create a chat skill interfacing with ChatGPT
Relevance: 0.5258410573005676
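One way to sanity-check the two printouts (a sketch, not a confirmed diagnosis from this thread): for unit-length vectors, squared L2 distance and cosine similarity are related by d = 2 - 2 * cos. Assuming the VolatileMemoryStore numbers are cosine similarities and the ChromaMemoryStore numbers are squared L2 distances over normalized embeddings, the first three scores from each printout should map onto each other. The helper names below are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (higher means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_l2(a, b):
    """Squared Euclidean distance (lower means more similar)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_from_squared_l2(distance):
    """Invert d = 2 - 2*cos, which holds only for unit-length vectors."""
    return 1.0 - distance / 2.0


# Scores copied from the two printouts above (first three results each).
volatile_scores = [0.8677540305183273, 0.8164925932277455, 0.8086237941419581]
chroma_scores = [0.2644919753074646, 0.3668723404407501, 0.38269340991973877]

# ASSUMPTION: the Chroma values are squared L2 distances on unit vectors.
recovered = [similarity_from_squared_l2(d) for d in chroma_scores]

for original, converted in zip(volatile_scores, recovered):
    print(f"volatile={original:.6f}  from_chroma={converted:.6f}")
```

Under that assumption the converted values land within about 1e-3 of the VolatileMemoryStore scores, and the ascending order of the Chroma list is what a distance ranking would produce. If this reading is right, the two stores rank the same documents the same way and differ only in what the "Relevance" number means.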
We are pro-documentation! Can you add a description of how ChromaMemoryStore works at the top of ChromaMemoryStore.py? Things like the fact that the kernel generates the embeddings, and the decision to calculate similarity ourselves instead of using ChromaDB's nearest-neighbor implementation, are great things to mention. @mkarle @alexchaomander would you agree that this is a good place to add implementation detail comments?
Yup, +1 to putting the implementation details directly in the comments if we have to make certain choices on how to use Chroma with the SK.
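To make the request above concrete, here is a rough sketch of what such a header comment could record, attached to a toy in-memory store. Everything in it (`ToyMemoryStore`, `Record`, the method names and signatures) is hypothetical and is not the PR's ChromaMemoryStore; it only illustrates the two design points named above: embeddings come from the caller, and relevance is computed by the store itself as cosine similarity.

```python
"""Toy stand-in for a Chroma-backed memory store (hypothetical, for illustration).

Implementation notes of the kind a module docstring might record:
- Embeddings are supplied by the caller (in SK, by the kernel's registered
  embedding service). The store never generates embeddings on its own.
- Relevance is computed here as cosine similarity over the stored vectors,
  rather than being delegated to the database's nearest-neighbor query.
"""
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Record:
    key: str
    text: str
    embedding: List[float]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyMemoryStore:
    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def upsert(self, record: Record) -> None:
        # The embedding arrives precomputed; nothing is embedded here.
        self._records[record.key] = record

    def get_nearest_matches(
        self,
        query_embedding: List[float],
        limit: int,
        min_relevance_score: float = 0.0,
    ) -> List[Tuple[Record, float]]:
        # Score every record, drop weak matches, return best-first.
        scored = [
            (record, cosine_similarity(query_embedding, record.embedding))
            for record in self._records.values()
        ]
        scored = [pair for pair in scored if pair[1] >= min_relevance_score]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:limit]


store = ToyMemoryStore()
store.upsert(Record("a", "getting started notebook", [1.0, 0.0, 0.0]))
store.upsert(Record("b", "prompts from file notebook", [0.7, 0.7, 0.0]))
store.upsert(Record("c", "unrelated sample", [0.0, 0.0, 1.0]))

matches = store.get_nearest_matches([1.0, 0.1, 0.0], limit=2, min_relevance_score=0.5)
for record, score in matches:
    print(record.key, round(score, 4))
```

Scoring in the store keeps the "Relevance" value on the same scale as the volatile store, at the cost of a linear scan over every record.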
…m-snu/semantic-kernel into feat--add-chroma-memory
Couple grammar changes but overall looks good to me!
@joowon-dm-snu heads up, Python SK memory is getting a refactor -> major simplification of classes. If this PR goes through first, I'll take point on updating Chroma memory store.
Okay, I think splitting dependencies should be merged first. I'll take a look when I get to work.
No problem!
@awharrison-28 i do some searches to fix installation of changing
|
Currently reviewing to ensure this is in line with both SK versions.
This PR might need changes, as the memory store was changed by #684.
@awharrison-28 Per your last comment re: your update of chroma memory based upon the memory_base update, should the current review be paused?
@joowon-dm-snu would you be open to me updating this PR directly to make it consistent with #684? I'll also have a PR out shortly to solidify how we handle conditional dependencies.
Okay I get it, I'll handle it this week :)
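For context on the conditional-dependency handling mentioned above, a common pattern is to import the optional package lazily and fail with an install hint. This is a generic sketch, not the approach the follow-up PR actually took; `require_optional` is a hypothetical helper name.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency, or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: {install_hint}"
        ) from exc


# Usage: a stdlib module stands in for the optional package so this runs anywhere.
json_module = require_optional("json", "pip install <package>")
print(json_module.dumps({"ok": True}))

# A Chroma-backed store could call this inside its constructor, e.g.
#   chromadb = require_optional("chromadb", "pip install chromadb")
# so that users without chromadb installed can still import the rest of the package.
```

Deferring the import to the point of use means the core package has no hard dependency on chromadb, and the error message tells the user exactly what to install.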
…m-snu/semantic-kernel into feat--add-chroma-memory
Reopening and updating.
@awharrison-28 ah, my forked branch was so twisted that I rebranched it.
Add support for Chroma https://docs.trychroma.com/

> Chroma is the open-source embedding database. Chroma makes it easy to build LLM apps by making knowledge, facts, and skills pluggable for LLMs.

### Motivation and Context

* #403 Support for Chroma embedding database
* #426 Python: feat: add chroma memory store
* #449 Python: feat: add chroma memory store

---------

Co-authored-by: Abby Harrison <[email protected]>
Co-authored-by: Abby Harrison <[email protected]>
Co-authored-by: Devis Lucato <[email protected]>
### Motivation and Context

* Solves issue #403
* This is a reopening of PR #426

### Description

I've added one quick E2E test example without adding any unit tests.
I tried to make it as identical as possible to the original classes (VolatileDataStore & VolatileMemoryStore).

This PR has 3 issues.
pip install chromadb

### Contribution Checklist

dotnet format