CrateDB vector: Add CrateDBVectorSearchMultiCollection #15

Merged
merged 3 commits into from
Nov 21, 2023


@amotl amotl commented Nov 21, 2023

About

`CrateDBVectorSearchMultiCollection` is a special adapter that provides similarity search across multiple collections. It cannot be used for indexing documents.

Synopsis

from langchain.vectorstores.cratedb import CrateDBVectorSearchMultiCollection

multisearch = CrateDBVectorSearchMultiCollection(
    collection_names=["test_collection_1", "test_collection_2"],
    embedding_function=embeddings,
    connection_string=CONNECTION_STRING,
)
docs_with_score = multisearch.similarity_search_with_score(query)
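
To illustrate what "similarity search across multiple collections" means conceptually, here is a minimal, self-contained sketch. It is not the adapter's implementation and does not talk to CrateDB: the function `search_collections`, the in-memory `collections` dictionary, and the choice of Euclidean distance are all assumptions made for illustration only.

```python
import math


def search_collections(collections, names, query_vector, k=4):
    """Rank documents from several named collections by distance to the query.

    Hypothetical helper for illustration; not part of LangChain or CrateDB.
    """
    candidates = []
    for name in names:
        for text, vector in collections[name]:
            # Smaller Euclidean distance means a closer match.
            candidates.append((text, math.dist(query_vector, vector)))
    # Merge candidates from all collections into one ranking.
    candidates.sort(key=lambda item: item[1])
    return candidates[:k]


# Toy data standing in for two collections of embedded documents.
collections = {
    "test_collection_1": [("foo", [1.0, 0.0]), ("bar", [0.0, 1.0])],
    "test_collection_2": [("baz", [0.9, 0.1])],
}
results = search_collections(
    collections, ["test_collection_1", "test_collection_2"], [1.0, 0.0], k=2
)
```

The point of the sketch is that results from all named collections end up in one ranked list, which is roughly what a caller of `similarity_search_with_score` sees as a list of (document, score) pairs.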

References

This patch has been conceived based on a feature request by @thunderbug1. Thanks!

@amotl amotl requested review from hammerhead, matriv, andnig and mkleen and removed request for hammerhead November 21, 2023 12:15
@amotl amotl force-pushed the multicollection-search branch 2 times, most recently from 51ec5a2 to e99f9f4 Compare November 21, 2023 14:07

@matriv matriv left a comment

I left some comments and questions, as I don't understand how the tests have changed.

@matriv matriv left a comment

Thanks a lot for your responses!
Left a minor suggestion, but LGTM.

@@ -367,11 +364,12 @@ def test_cratedb_collection_with_metadata() -> None:


 def test_cratedb_collection_no_embedding_dimension() -> None:
-    """Test end to end collection construction"""
+    """
+    Verify that accessing a collection fails when addressed without dimensionality.
Suggested change
- Verify that accessing a collection fails when addressed without dimensionality.
+ Verify that accessing a collection fails when addressed without specifying dimensions.

Author

@amotl amotl Nov 21, 2023

Thanks. Addressed by amending 153d178280c7:

Verify that addressing collections fails when not specifying dimensions.

@amotl amotl force-pushed the multicollection-search branch 2 times, most recently from 153d178 to 84161f7 Compare November 21, 2023 16:46
It is a special adapter that provides similarity search across multiple
collections. It cannot be used for indexing documents.

The CrateDB adapter works a bit differently from the pgvector adapter it
builds upon: The dimensionality of the vector field must be specified at
table creation time, but in LangChain it is a runtime parameter, so the
table creation needs to be delayed.

In some cases, the tables do not exist yet, but this is only relevant
for the case when the user requests to pre-delete the collection, using
the `pre_delete_collection` argument. So, do the error handling only
there instead, and _not_ on the generic data model utility functions.
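
The behavior described in these commit messages can be sketched as follows. This is a minimal illustration under stated assumptions, not the adapter's code: it uses the standard-library `sqlite3` module instead of CrateDB, the class `LazyCollectionStore` and the `embedding` table are hypothetical, and a `CHECK` constraint stands in for a vector column type with fixed dimensionality.

```python
import json
import sqlite3


class LazyCollectionStore:
    """Hypothetical stand-in for a vector store adapter (not the LangChain API)."""

    def __init__(self, conn, pre_delete_collection=False):
        self.conn = conn
        self.table_ready = False
        if pre_delete_collection:
            self.delete_collection()

    def delete_collection(self):
        # Table creation is deferred, so the table may not exist yet.
        # The resulting error is tolerated here, and only here.
        try:
            self.conn.execute("DELETE FROM embedding")
        except sqlite3.OperationalError:
            pass

    def _create_table(self, dimensions):
        # The dimensionality is baked into the schema, so this can only
        # run once the first vectors have been seen.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS embedding ("
            "id INTEGER PRIMARY KEY, "
            "vector TEXT NOT NULL, "
            f"dims INTEGER NOT NULL CHECK (dims = {int(dimensions)}))"
        )
        self.table_ready = True

    def add_embeddings(self, vectors):
        if not self.table_ready:
            self._create_table(len(vectors[0]))
        self.conn.executemany(
            "INSERT INTO embedding (vector, dims) VALUES (?, ?)",
            [(json.dumps(v), len(v)) for v in vectors],
        )

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM embedding").fetchone()[0]


conn = sqlite3.connect(":memory:")
# Pre-deleting before any table exists must not fail.
store = LazyCollectionStore(conn, pre_delete_collection=True)
store.add_embeddings([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
```

Keeping the error handling inside `delete_collection` means that other code paths, such as `count` in this sketch, still surface a missing table as an error instead of silently hiding it.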
@amotl amotl changed the base branch from fix-cascaded-delete to cratedb November 21, 2023 17:37
@amotl amotl merged commit ef485de into cratedb Nov 21, 2023
@amotl amotl deleted the multicollection-search branch November 21, 2023 18:02