Initial API POC #1

piercefreeman · 2023-04-17T18:35:40Z

Very rough initial sketch of the ORM syntax. This PR aims to:

- Establish an initial model specification
- Establish an initial query builder
- Handle boolean and vector similarity filters

class MyObject(MilvusBase, milvus_client=Milvus()):
    __embedding_dim__ = 128
    __collection_name__ = "my_collection"

    text: str
    embedding: Embedding
    id: int

    def __init__(self, text: str, embedding: Embedding):
        super().__init__()
        self.text = text
        self.embedding = embedding
        self.id = None
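
To illustrate how a class definition like the one above can accept a keyword such as milvus_client and pick up its typed fields, here is a minimal sketch. It is not the PR's implementation: SketchBase, Note, and the string used as a client are hypothetical stand-ins, and the only assumption is that the base class uses Python's `__init_subclass__` hook together with `typing.get_type_hints`.

```python
from typing import Any, get_type_hints


class SketchBase:
    """Hypothetical stand-in for MilvusBase (assumption, not the PR's code)."""

    _client = None
    _fields = {}

    def __init_subclass__(cls, milvus_client: Any = None, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Keyword arguments in the class statement arrive here.
        cls._client = milvus_client
        # "Sniff" the class-level type hints (text: str, id: int, ...),
        # ignoring private names.
        cls._fields = {
            name: hint
            for name, hint in get_type_hints(cls).items()
            if not name.startswith("_")
        }


class Note(SketchBase, milvus_client="fake-client"):
    text: str
    id: int
```

After the class body runs, Note._fields maps each annotated attribute to its type, which is the information a schema builder would need.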

Insertion instructions:

obj1 = MyObject('foo', Embedding([1.0] * 128))
obj1.insert(milvus_client)

Search instructions:

results = (
    session.query(MyObject)
    .filter(MyObject.text == 'bar')
    .order_by_similarity(MyObject.embedding, Embedding([1.0] * 128))
    .limit(2)
    .all()
)
assert len(results) == 1
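
A filter such as MyObject.text == 'bar' only works if the comparison returns something other than a plain boolean. One common way to get that is operator overloading, sketched below. The query.py diff in this PR calls f.to_expression() on each filter; everything else here (Column, Comparison, and the exact expression format) is a hypothetical illustration and may differ from what the PR does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """Hypothetical record of one comparison made against a column."""

    field: str
    op: str
    value: object

    def to_expression(self) -> str:
        # repr() quotes strings and leaves numbers bare.
        return f"{self.field} {self.op} {self.value!r}"


class Column:
    """Hypothetical class-level attribute that captures comparisons."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object) -> Comparison:  # type: ignore[override]
        return Comparison(self.name, "==", other)

    def __gt__(self, other: object) -> Comparison:
        return Comparison(self.name, ">", other)


text = Column("text")
text_filter = text == "bar"
```

Here text_filter is a Comparison object rather than True or False, so a query builder can collect it and render it later.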

hweller1

good stuff! tests make usage pretty clear, and i like some of the extensions you've added to the Milvus ORM.

think it would be nice to add a readme articulating what this does that the pymilvus ORM (or other future vector db ORMs, for that matter) does not, to make usage a bit more obvious. I also think a Jupyter notebook with some examples would make this thing ✨

hweller1 · 2023-04-18T18:43:07Z

vectordb_orm/query.py

+            self._filters.append(f.to_expression())
+        return self
+
+    def offset(self, offset: int):


recommend using the setter decorator for these: https://www.geeksforgeeks.org/getter-and-setter-in-python/

@hweller1 These are intended to be used as chaining functions query().filter(XYZ).offset(2) - are you envisioning a use case where people are directly using getters and setters as part of this chain?

oh i think the usage shouldn't have to change, was just saying that there is a standard way of defining attribute setting. more syntactic than functional
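
For readers following this thread, the difference between the two styles can be sketched as below. Both classes are hypothetical stand-ins rather than code from query.py. A chaining method returns self so calls can be strung together, while a property setter is invoked through an assignment statement, which cannot appear in the middle of a chain.

```python
class ChainedQuery:
    """Hypothetical builder whose methods return self for chaining."""

    def __init__(self) -> None:
        self._offset = None
        self._limit = None

    def offset(self, offset: int) -> "ChainedQuery":
        self._offset = offset
        return self

    def limit(self, limit: int) -> "ChainedQuery":
        self._limit = limit
        return self


class PropertyQuery:
    """Hypothetical variant using the property/setter decorators."""

    def __init__(self) -> None:
        self._offset = None

    @property
    def offset(self):
        return self._offset

    @offset.setter
    def offset(self, value: int) -> None:
        self._offset = value


chained = ChainedQuery().offset(2).limit(5)  # one expression

prop = PropertyQuery()
prop.offset = 2  # a statement; nothing to chain from
```

So adopting setters as written here would change the call style from query().offset(2) to an assignment on the query object.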

vectordb_orm/tests/conftest.py

vectordb_orm/similarity.py

vectordb_orm/results.py

hweller1 · 2023-04-18T18:55:00Z

vectordb_orm/indexes.py

+class IVF_FLAT(IndexBase):
+    """
+    - High-speed query
+    - Requires a recall rate as high as possible


would augment this with some quick benchmarks, e.g. a very simple search task where IVF_FLAT is a better choice than, say, HNSW or ANNOY
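
A quick benchmark of this kind usually reports two numbers per index: recall against exact results and average query latency. The helpers below are a minimal sketch of how those could be measured. The function names are hypothetical, the search callable is a placeholder for whatever runs a query against a given index, and nothing here talks to Milvus.

```python
import time
from typing import Callable, Sequence


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int], k: int) -> float:
    """Fraction of the exact top-k ids that also appear in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    exact = set(exact_ids[:k])
    if not exact:
        return 0.0
    return len(exact & set(approx_ids[:k])) / len(exact)


def mean_latency(search: Callable[[object], object], queries: Sequence[object]) -> float:
    """Average wall-clock seconds per call of search over the given queries."""
    if not queries:
        raise ValueError("queries must not be empty")
    start = time.perf_counter()
    for query in queries:
        search(query)
    return (time.perf_counter() - start) / len(queries)


# Three of the four exact neighbours were found by the approximate search.
example_recall = recall_at_k([1, 2, 3, 9], [1, 2, 3, 4], k=4)
```

Running both helpers for each index type on the same query set would give the side-by-side comparison suggested above.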

hweller1 · 2023-04-18T19:18:55Z

vectordb_orm/base.py

+                )
+            )
+
+        schema = CollectionSchema(fields=fields, description=f"{cls.__name__} vectordb-generated collection")


it could be good to expose the __repr__ of the Collection or CollectionSchema somewhere- the user can also pull it by calling str(collection) or str(collection._schema)

Here and here are the definitions for them; they seem useful

vectordb_orm/base.py

hweller1 · 2023-04-18T19:25:31Z

vectordb_orm/base.py

+
+    def _dict_representation(self):
+        type_converters = {
+            np.ndarray: DataType.FLOAT_VECTOR,


took me a bit of digging but it's probably worth highlighting that this is the one type conversion you have that Milvus does not support out of the box, and is a major improvement on its ORM! #numpy is all you need
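
The lookup pattern behind a type_converters table can be sketched as follows. The diff only shows the np.ndarray to DataType.FLOAT_VECTOR entry; the other entries, the FieldType enum (a stand-in for pymilvus DataType), and the Vector class (a stand-in for np.ndarray, used so the sketch needs no third-party imports) are all assumptions for illustration.

```python
from enum import Enum, auto


class FieldType(Enum):
    """Hypothetical stand-in for pymilvus DataType."""

    INT64 = auto()
    VARCHAR = auto()
    FLOAT_VECTOR = auto()


class Vector(list):
    """Hypothetical stand-in for np.ndarray."""


# Only the vector entry mirrors the diff; int and str are assumed mappings.
TYPE_CONVERTERS = {
    int: FieldType.INT64,
    str: FieldType.VARCHAR,
    Vector: FieldType.FLOAT_VECTOR,
}


def field_type_for(py_type: type) -> FieldType:
    """Look up the storage type for a Python type hint."""
    try:
        return TYPE_CONVERTERS[py_type]
    except KeyError:
        raise TypeError(f"No field type registered for {py_type.__name__}") from None
```

Failing loudly on an unregistered type keeps a typo in a model definition from silently producing a schema with a missing field.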

hweller1 · 2023-04-18T19:32:14Z

vectordb_orm/query.py

+    def _result_to_objects(self, search_result: ChunkedQueryResult | list[dict[str, Any]]):
+        query_results : list[QueryResult] = []
+
+        if isinstance(search_result, ChunkedQueryResult):


when does Milvus return this vs an ordinary QueryResult ?

vectordb_orm/query.py

piercefreeman added 6 commits April 17, 2023 11:12

Project skeleton, working hard-coded example

0824e8f

Refactor to logical ORM files

345fdb3

Refactor to generic base class + sniff typehints

193cca4

Add generic constructor

8f173ee

Add object deletion in ORM

eb3b753

Add README

39ea9b3

piercefreeman merged commit 75f0e37 into main Apr 18, 2023

piercefreeman deleted the feature/poc branch April 18, 2023 18:52

hweller1 reviewed Apr 18, 2023
