Initial API POC #1

Merged: piercefreeman merged 6 commits into main from feature/poc on Apr 18, 2023

@piercefreeman (Owner) commented Apr 17, 2023

Very rough initial sketch of the ORM syntax. This PR is aiming to:

  • Establish initial model specification
  • Establish initial query builder
  • Handle boolean and vector similarity filters

Model definition:

class MyObject(MilvusBase, milvus_client=Milvus()):
    __embedding_dim__ = 128
    __collection_name__ = "my_collection"

    text: str
    embedding: Embedding
    id: int

    def __init__(self, text: str, embedding: Embedding):
        super().__init__()
        self.text = text
        self.embedding = embedding
        self.id = None
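
The `milvus_client=Milvus()` keyword in the class statement above relies on a standard Python mechanism: keyword arguments in a class header are passed to the parent's `__init_subclass__`. The sketch below shows only that mechanism. `FakeClient` and `SketchBase` are hypothetical stand-ins, not classes from this PR or from pymilvus, and the real `MilvusBase` may well be implemented differently (for example with a metaclass).

```python
class FakeClient:
    """Hypothetical stand-in for a Milvus client (not a pymilvus class)."""

    def __init__(self, alias: str = "default"):
        self.alias = alias


class SketchBase:
    """Hypothetical base class that records the client given at class creation."""

    _client = None

    def __init_subclass__(cls, milvus_client=None, **kwargs):
        # Keyword arguments from the `class` statement are delivered here.
        super().__init_subclass__(**kwargs)
        cls._client = milvus_client


class Document(SketchBase, milvus_client=FakeClient("local")):
    text: str


# Each subclass carries its own client; the base class stays unbound.
print(Document._client.alias)
```

Binding the client at class-definition time is one way every instance and query of a model could share a connection without passing it around explicitly.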

Insertion instructions:

obj1 = MyObject('foo', Embedding([1.0] * 128))
obj1.insert(milvus_client)

Search instructions:

results = (
    session.query(MyObject)
    .filter(MyObject.text == 'bar')
    .order_by_similarity(MyObject.embedding, Embedding([1.0] * 128))
    .limit(2)
    .all()
)
assert len(results) == 1
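
A filter such as `MyObject.text == 'bar'` can only be passed to `filter()` if the comparison returns an object rather than a bool. One plausible way to get there is operator overloading on a field descriptor, sketched below. `Field` and `Comparison` are hypothetical names, and the expression string format (double-quoted strings, bare numbers) is an assumption about what the backend expects, not something this PR states.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class Comparison:
    """Hypothetical record of one `field <op> value` filter."""

    field: str
    op: str
    value: Any

    def to_expression(self) -> str:
        # json.dumps double-quotes strings and leaves numbers bare.
        return f"{self.field} {self.op} {json.dumps(self.value)}"


class Field:
    """Hypothetical stand-in for a model attribute accessed on the class."""

    def __init__(self, name: str):
        self.name = name

    def __eq__(self, other: Any) -> Comparison:  # type: ignore[override]
        # Return a filter object instead of a bool so it can be inspected later.
        return Comparison(self.name, "==", other)

    def __gt__(self, other: Any) -> Comparison:
        return Comparison(self.name, ">", other)


text = Field("text")
print((text == "bar").to_expression())
```

The `to_expression()` name mirrors the call visible in the reviewed diff (`f.to_expression()`); everything else here is illustrative.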

@piercefreeman piercefreeman merged commit 75f0e37 into main Apr 18, 2023
@piercefreeman piercefreeman deleted the feature/poc branch April 18, 2023 18:52
@hweller1 hweller1 left a comment

good stuff! tests make usage pretty clear, and i like some of the extensions you've added to the Milvus ORM.

think it would be nice to add a readme that articulates what this does that the pymilvus ORM (or other future vector db ORMs, for that matter) does not, to make usage a bit more obvious. I also think a Jupyter notebook with some examples would make this thing ✨

        self._filters.append(f.to_expression())
        return self

    def offset(self, offset: int):

recommend using the setter decorator for these: https://www.geeksforgeeks.org/getter-and-setter-in-python/

@piercefreeman (Owner, Author) replied:

@hweller1 These are intended to be used as chaining functions, e.g. query().filter(XYZ).offset(2). Are you envisioning a use case where people are directly using getters and setters as part of this chain?

oh i think the usage shouldn't have to change, was just saying that there is a standard way of defining attribute setting. more syntactic than functional
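
To make the trade-off in this thread concrete, here is a minimal sketch of the chaining style. `QuerySketch` is a hypothetical class, not the query builder from this PR. Each method mutates the builder and returns `self`, which is what allows `query().filter(XYZ).offset(2)`. A `@property` setter would be invoked as `q.offset = 2`, which is an assignment statement and so cannot appear in the middle of a chain.

```python
class QuerySketch:
    """Hypothetical fluent query builder; illustrates method chaining only."""

    def __init__(self):
        self._filters = []
        self._offset = None
        self._limit = None

    def filter(self, expression: str) -> "QuerySketch":
        self._filters.append(expression)
        return self  # returning self is what enables chaining

    def offset(self, offset: int) -> "QuerySketch":
        self._offset = offset
        return self

    def limit(self, limit: int) -> "QuerySketch":
        self._limit = limit
        return self

    def describe(self) -> dict:
        # Snapshot of the accumulated state, handy for tests and debugging.
        return {
            "filters": list(self._filters),
            "offset": self._offset,
            "limit": self._limit,
        }


query = QuerySketch().filter('text == "bar"').offset(2).limit(5)
print(query.describe())
```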

class IVF_FLAT(IndexBase):
    """
    - High-speed query
    - Requires a recall rate as high as possible

would augment this with some quick benchmarks: a very simple search task where IVF_FLAT is a better choice than, say, HNSW or ANNOY
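
Any such benchmark needs two measurements per index type: how long a search takes and how much of the exact answer it recovers. The helpers below are a minimal, backend-agnostic sketch of both. `recall_at_k` and `timed` are hypothetical names, the ID lists are made-up placeholders rather than results from any index, and no claim is made here about how IVF_FLAT, HNSW, or ANNOY actually compare.

```python
import time
from typing import Any, Callable, Sequence


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int], k: int) -> float:
    """Fraction of the exact top-k IDs that the approximate search also returned."""
    exact_top = set(exact_ids[:k])
    if not exact_top:
        return 0.0
    return len(set(approx_ids[:k]) & exact_top) / len(exact_top)


def timed(fn: Callable[..., Any], *args: Any) -> tuple:
    """Run fn(*args) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Placeholder IDs: the exact search returned 1..4, the approximate one missed ID 4.
exact = [1, 2, 3, 4]
approx = [1, 2, 3, 9]
print(recall_at_k(approx, exact, k=4))
```

In a real comparison, `timed` would wrap the search call for each index type over the same query set, and the exact IDs would come from a brute-force (FLAT) search.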

)
)

schema = CollectionSchema(fields=fields, description=f"{cls.__name__} vectordb-generated collection")

it could be good to expose the __repr__ of the Collection or CollectionSchema somewhere. The user can also pull it by calling str(collection) or str(collection._schema)

Here and here are the definitions for them; they seem useful

    def _dict_representation(self):
        type_converters = {
            np.ndarray: DataType.FLOAT_VECTOR,

took me a bit of digging but it's probably worth highlighting that this is the one type conversion you have that Milvus does not support out of the box, and is a major improvement on its ORM! #numpy is all you need
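
The idea behind a `type_converters` table like the one in this diff is to derive the collection schema from ordinary Python type annotations. The sketch below shows that lookup in isolation, with no numpy or pymilvus dependency. `FieldKind` stands in for pymilvus's `DataType`, `Vector` stands in for `np.ndarray`, and the `int` and `str` entries are assumptions, since only the `np.ndarray` mapping is visible in the diff.

```python
from enum import Enum


class FieldKind(Enum):
    """Hypothetical stand-in for pymilvus's DataType enum."""

    INT64 = "int64"
    VARCHAR = "varchar"
    FLOAT_VECTOR = "float_vector"


class Vector(list):
    """Hypothetical stand-in for np.ndarray, to keep the sketch dependency-free."""


# Only the vector entry mirrors the diff; the int and str entries are assumed.
TYPE_CONVERTERS = {
    int: FieldKind.INT64,
    str: FieldKind.VARCHAR,
    Vector: FieldKind.FLOAT_VECTOR,
}


def schema_for(model: type) -> dict:
    """Map each annotated attribute of `model` to a field kind."""
    schema = {}
    for name, annotation in model.__annotations__.items():
        if annotation not in TYPE_CONVERTERS:
            raise TypeError(f"No converter for {name}: {annotation!r}")
        schema[name] = TYPE_CONVERTERS[annotation]
    return schema


class Doc:
    id: int
    text: str
    embedding: Vector


print(schema_for(Doc))
```

Raising on an unknown annotation keeps an unsupported field from being silently dropped from the generated schema.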

    def _result_to_objects(self, search_result: ChunkedQueryResult | list[dict[str, Any]]):
        query_results: list[QueryResult] = []

        if isinstance(search_result, ChunkedQueryResult):

when does Milvus return this vs an ordinary QueryResult?
