Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: feat: add chroma memory store #449

Closed
wants to merge 28 commits into from
Closed

Python: feat: add chroma memory store #449

wants to merge 28 commits into from

Conversation

joowon-dm-snu
Copy link
Contributor

Motivation and Context

solve issue #403
this is reopened PR #426

Description

I've added one quick E2E test example without adding any unit tests.
I tried to make it as identical to the original classes(VolatileDataStore & VolatileMemoryStore) as possible.

This PR has 3 issues.

Contribution Checklist

@adrianwyatt adrianwyatt added the python Pull requests for the Python Semantic Kernel label Apr 14, 2023
@awharrison-28
Copy link
Contributor

awharrison-28 commented Apr 17, 2023

Hi @joowon-dm-snu

Tried this out with the memory notebook and I have a couple questions. By default, if you replace the VolatileMemoryStore() with ChromaMemoryStore(), the memory store uses embedded DuckDB and I'm pretty confident that it's downloading an embedding model from sentence-transformers - ie the embeddings are not coming from the Kernel, but being generated by chroma infrastructure. Presumably this behavior can be changed using a chromadb settings file?

At minimum, we should make sure to document the default behavior, but ideally the base ChromaMemoryStore() without additional settings should use embeddings from the kernel.

@joowon-dm-snu
Copy link
Contributor Author

@awharrison-28

At minimum, we should make sure to document the default behavior, but ideally the base ChromaMemoryStore() without additional settings should use embeddings from the kernel.

Agree with documentation!
However, I didn't write the documentation because the semantic-kernel is a fully managed by MS, and I was worried that it wouldn't be adopted even if I wrote the official documentation. this is same for unit-tests

If you don't mind me writing it, and tell me what you want it to contain, I can write it :)

Copy link
Contributor

@awharrison-28 awharrison-28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Can you please add *.chroma/ to the .gitignore under # Persistant storage (which you could rename from # Persistant storage -> # Persistent storage while you're at it :)
  • See Code Comment - Found the Root Issue ---There's some incorrect logic in this PR that I'm investigating. I originally thought it might be the embeddings not being generated by the Kernel's registered AI service, but your code is consistent with ChromaDB docs for 'bringing your own embeddings." Here's a printout of the difference between memory stores (they should be the same). I will comment further if I find the issue.

With the VolatileMemoryStore:

======================
Query: I love Jupyter notebooks, how should I get started?

Result {++i}:
URL: : https://github.com/microsoft/semantic-kernel/blob/main/samples/notebooks/dotnet/00-getting-started.ipynb
Title : Jupyter notebook describing how to get started with the Semantic Kernel
Relevance: 0.8677540305183273

Result {++i}:
URL: : https://github.com/microsoft/semantic-kernel/blob/main/samples/notebooks/dotnet/02-running-prompts-from-file.ipynb
Title : Jupyter notebook describing how to pass prompts from a file to a semantic skill or function
Relevance: 0.8164925932277455

Result {++i}:
URL: : https://github.com/microsoft/semantic-kernel/blob/main/README.md
Title : README: Installation, getting started, and how to contribute
Relevance: 0.8086237941419581

With ChromaMemoryStore:

===========================
Query: I love Jupyter notebooks, how should I get started?

Result {++i}:
URL: : https://github.com/microsoft/semantic-kernel/blob/main/samples/notebooks/dotnet/00-getting-started.ipynb
Title : Jupyter notebook describing how to get started with the Semantic Kernel
Relevance: 0.2644919753074646

Result {++i}:
URL: : https://github.com/microsoft/semantic-kernel/blob/main/samples/notebooks/dotnet/02-running-prompts-from-file.ipynb
Title : Jupyter notebook describing how to pass prompts from a file to a semantic skill or function
Relevance: 0.3668723404407501

Result {++i}:
URL: : https://github.com/microsoft/semantic-kernel/blob/main/README.md
Title : README: Installation, getting started, and how to contribute
Relevance: 0.38269340991973877

Result {++i}:
URL: : https://github.com/microsoft/semantic-kernel/tree/main/samples/apps/chat-summary-webapp-react/README.md
Title : README: README associated with a sample starter react-based chat summary webapp
Relevance: 0.47094565629959106

Result {++i}:
URL: : https://github.com/microsoft/semantic-kernel/tree/main/samples/skills/ChatSkill/ChatGPT
Title : Sample demonstrating how to create a chat skill interfacing with ChatGPT
Relevance: 0.5258410573005676

@awharrison-28
Copy link
Contributor

@awharrison-28

At minimum, we should make sure to document the default behavior, but ideally the base ChromaMemoryStore() without additional settings should use embeddings from the kernel.

Agree with documentation! However, I didn't write the documentation because the semantic-kernel is a fully managed by MS, and I was worried that it wouldn't be adopted even if I wrote the official documentation. this is same for unit-tests

If you don't mind me writing it, and tell me what you want it to contain, I can write it :)

We are pro-documentation!

Can you add a description of how ChromaMemoryStore works at the top of ChromaMemoryStore.py. Mentioning things like the kernel generates the embeddings and the decision to custom calculate similarity instead of using ChromaDB nearest neighbor implementation are great things to add. @mkarle @alexchaomander would you agree that this is a good place to add implementation detail comments?

@alexchaomander
Copy link
Contributor

Yup +1 to putting directly in the comments the implementation details if we have to make certain choices on how to use Chroma with the SK.

@awharrison-28 awharrison-28 added PR: feedback to address Waiting for PR owner to address comments/questions PR: main labels Apr 20, 2023
alexchaomander
alexchaomander previously approved these changes Apr 22, 2023
Copy link
Contributor

@alexchaomander alexchaomander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple grammar changes but overall looks good to me!

python/semantic_kernel/memory/chroma_memory_store.py Outdated Show resolved Hide resolved
python/semantic_kernel/memory/chroma_memory_store.py Outdated Show resolved Hide resolved
@awharrison-28
Copy link
Contributor

@joowon-dm-snu heads up, Python SK memory is getting a refactor -> major simplification of classes. If this PR goes through first, I'll take point on updating Chroma memory store.

#684

@joowon-dm-snu
Copy link
Contributor Author

@joowon-dm-snu

Would you be willing to start the test_integration dependencies group in poetry?

okay, i think splitting dependencies should be merged first. I'll take a look when I get to work

@joowon-dm-snu heads up, Python SK memory is getting a refactor -> major simplification of classes. If this PR goes through first, I'll take point on updating Chroma memory store.

No problem!

@joowon-dm-snu
Copy link
Contributor Author

@awharrison-28 i do some searches to fix installation of chromadb for macos-latest, python3.11 but couldn't find good solution

changing semantic-kernel/.github/workflows/python-test.yml like below raise other clang errors

- name: Setup envs
  if: matrix.os == 'macos-latest' && matrix.python-version == '3.11'
  run: |
    export POETRY_HNSWLIB_NO_NATIVE=1
    python -m pip wheel --use-pep517 "hnswlib (==0.7.0)"
    cd python && poetry install --with test_integration

@shawncal shawncal assigned tawalke and unassigned shawncal Apr 27, 2023
@shawncal shawncal requested a review from tawalke April 27, 2023 18:23
@tawalke
Copy link
Contributor

tawalke commented May 1, 2023

Currently reviewing to ensure in line with both SK versions.

@joowon-dm-snu
Copy link
Contributor Author

@tawalke

Currently reviewing to ensure in line with both SK versions.

this PR might need change as memory store changed by #684.
If your team wants to me to done with this PR, i can take it.
But installation of chromadb for macos-latest, python3.11 should be solved first to pass Python tests / build (3.11, macos-latest) (pull_request)

@tawalke
Copy link
Contributor

tawalke commented May 1, 2023

@tawalke

Currently reviewing to ensure in line with both SK versions.

this PR might need change as memory store changed by #684. If your team wants to me to done with this PR, i can take it. But installation of chromadb for macos-latest, python3.11 should be solved first to pass Python tests / build (3.11, macos-latest) (pull_request)

@awharrison-28 Per your last comment re; your update of chroma memory based upon memory_base update.; should the current review be paused?

@awharrison-28
Copy link
Contributor

@joowon-dm-snu would you be open to me updating this PR directly to update it to be consistent with #684.

I'll also have a PR out shortly to solidify how we handle conditional dependencies.

@joowon-dm-snu
Copy link
Contributor Author

@joowon-dm-snu would you be open to me updating this PR directly to update it to be consistent with #684.

I'll also have a PR out shortly to solidify how we handle conditional dependencies.

Okay I get it, I'll handle it this week :)

@github-actions github-actions bot removed the python Pull requests for the Python Semantic Kernel label May 11, 2023
@awharrison-28 awharrison-28 reopened this May 11, 2023
@github-actions github-actions bot added the python Pull requests for the Python Semantic Kernel label May 11, 2023
@awharrison-28
Copy link
Contributor

reopening and updating

@joowon-dm-snu
Copy link
Contributor Author

@awharrison-28 ah my forked branch was so twisted so I rebranched it.

dluc added a commit that referenced this pull request May 17, 2023
Add support for Chroma https://docs.trychroma.com/

> Chroma is the open-source embedding database. Chroma makes it easy to
build LLM apps by making knowledge, facts, and skills pluggable for
LLMs.

### Motivation and Context

* #403 Support for Chroma embedding database
* #426 Python: feat: add chroma memory store
* #449 Python: feat: add chroma memory store

---------

Co-authored-by: Abby Harrison <[email protected]>
Co-authored-by: Abby Harrison <[email protected]>
Co-authored-by: Devis Lucato <[email protected]>
shawncal pushed a commit to johnoliver/semantic-kernel that referenced this pull request May 19, 2023
Add support for Chroma https://docs.trychroma.com/

> Chroma is the open-source embedding database. Chroma makes it easy to
build LLM apps by making knowledge, facts, and skills pluggable for
LLMs.

### Motivation and Context

* microsoft#403 Support for Chroma embedding database
* microsoft#426 Python: feat: add chroma memory store
* microsoft#449 Python: feat: add chroma memory store

---------

Co-authored-by: Abby Harrison <[email protected]>
Co-authored-by: Abby Harrison <[email protected]>
Co-authored-by: Devis Lucato <[email protected]>
golden-aries pushed a commit to golden-aries/semantic-kernel that referenced this pull request Oct 24, 2023
### Motivation and Context

<!-- Thank you for your contribution to the chat-copilot repo!
Please help reviewers and future users, providing the following
information:
  1. Why is this change required?
  2. What problem does it solve?
  3. What scenario does it contribute to?
  4. If it fixes an open issue, please link to the issue here.
-->
Incorrect artifact path in the plugin deployment workflow is causing the
deployment to fail.

### Description

<!-- Describe your changes, the overall approach, the underlying design.
These notes will help understanding how your code works. Thanks! -->
Fix the path.

### Contribution Checklist

<!-- Before submitting this PR, please make sure: -->

- [ ] The code builds clean without any errors or warnings
- [ ] The PR follows the [Contribution
Guidelines](https://github.com/microsoft/chat-copilot/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/chat-copilot/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [ ] All unit tests pass, and I have added new tests where possible
- [ ] I didn't break anyone 😄
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR: feedback to address Waiting for PR owner to address comments/questions python Pull requests for the Python Semantic Kernel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants