update readme, docstrings (#14)
* get by id update and docstrings

* update README.md

* update README

* update README.md
ekorman authored Nov 9, 2024
1 parent 12f0655 commit 0df7415
Showing 4 changed files with 208 additions and 7 deletions.
94 changes: 94 additions & 0 deletions README.md
@@ -1,3 +1,97 @@
# affine

![badge](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/ekorman/7fbb57e6d6a2c8b69617ddf141043b98/raw/affine-coverage.json)

Affine is a Python library for providing a uniform and structured interface to various backing vector databases and approximate nearest neighbor libraries. It allows simple dataclass-like objects to describe collections together with a high-level query syntax for doing filtered vector search.

For vector databases, it currently supports:

- qdrant
- weaviate
- pinecone

For local mode, the following approximate nearest neighbor libraries are supported:

- FAISS
- annoy
- pynndescent
- scikit-learn KDTree
- naive/NumPy

Note: this project is very similar to [vectordb-orm](https://github.com/piercefreeman/vectordb-orm), which looks to be no longer maintained.

## Installation

```bash
pip install affine
# or `pip install affine[qdrant]` for qdrant support
# `pip install affine[weaviate]` for weaviate support
# `pip install affine[pinecone]` for pinecone support
```

## Basic Usage

```python
from affine import Collection, Vector, Filter, Query
from affine.engine.local import LocalEngine

# Define a collection
class MyCollection(Collection):
    vec: Vector[3]  # declare a 3-dimensional vector

    # support for additional fields for filtering
    a: int
    b: str

db = LocalEngine()

# Insert vectors
db.insert(MyCollection(vec=[0.1, 0.0, -0.5], a=1, b="foo"))
db.insert(MyCollection(vec=[1.3, 2.1, 3.6], a=2, b="bar"))
db.insert(MyCollection(vec=[-0.1, 0.2, 0.3], a=3, b="foo"))

# Query vectors
result: list[MyCollection] = (
    db.query(MyCollection)
    .filter(MyCollection.b == "foo")
    .similarity([2.8, 1.8, -4.5])
    .limit(1)
)
```

## Engines

A fundamental notion of _affine_ is the `Engine` class. All engines conform to the same API so that they are interchangeable (with a few engine-specific restrictions, which are mentioned below). There are two broad types of engines:

1. `LocalEngine`: this does nearest neighbor search on the executing machine, and supports a variety of libraries for the backing nearest neighbor search (these are called the _backend_ of the local engine).

2. Vector database engines: these are engines that connect to a vector database service, such as Qdrant, Weaviate, or Pinecone.

### Vector Databases

The currently supported vector databases are:

| Database | Class | Constructor arguments | Notes |
| -------- | ------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------- |
| Qdrant | `affine.engine.QdrantEngine` | `host: str` hostname to use<br><br>`port: int` port to use | - |
| Weaviate | `affine.engine.WeaviateEngine` | `host: str` hostname to use<br><br>`port: int` port to use | - |
| Pinecone | `affine.engine.PineconeEngine` | `api_key: Union[str, None]` Pinecone API key. If not provided, it will be read from the environment variable `PINECONE_API_KEY`.<br><br>`spec: Union[ServerlessSpec, PodSpec, None]` the `PodSpec` or `ServerlessSpec` object. If not provided, a `ServerlessSpec` will be created from the environment variables `PINECONE_CLOUD` and `PINECONE_REGION`. | The Pinecone engine has the restriction that every collection must contain exactly one vector attribute. |
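
The fallback described for the Pinecone engine (an explicit argument wins, otherwise environment variables are consulted) can be sketched without the Pinecone client. This is an illustration of the pattern only, not affine's implementation: `resolve_pinecone_settings` and its `cloud`, `region`, and `env` parameters are hypothetical, and only the variable names `PINECONE_API_KEY`, `PINECONE_CLOUD`, and `PINECONE_REGION` come from the table above.

```python
import os


def resolve_pinecone_settings(api_key=None, cloud=None, region=None, env=None):
    """Hypothetical helper: explicit arguments win, else fall back to env vars."""
    # `env` is injectable so the lookup can be exercised without touching os.environ.
    env = os.environ if env is None else env
    settings = {
        "api_key": api_key if api_key is not None else env.get("PINECONE_API_KEY"),
        "cloud": cloud if cloud is not None else env.get("PINECONE_CLOUD"),
        "region": region if region is not None else env.get("PINECONE_REGION"),
    }
    # Fail early with a clear message instead of passing None downstream.
    missing = [name for name, value in settings.items() if value is None]
    if missing:
        raise ValueError(f"missing Pinecone settings: {', '.join(missing)}")
    return settings
```

Failing early on a missing value keeps the error close to the configuration mistake rather than surfacing it later as a failed network call.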

### Approximate Nearest Neighbor Libraries

The `LocalEngine` class provides an interface for doing nearest neighbor search on the executing machine, supporting a variety of libraries for the backing nearest neighbor search. The library to use is specified by the `backend` argument to the constructor. For example, to use `annoy`:

```python
from affine.engine.local import LocalEngine, AnnoyBackend

db = LocalEngine(backend=AnnoyBackend(n_trees=10))
```

The options and settings for the various supported backends are as follows:

| Library | Class | Constructor arguments | Notes |
| ------------------- | ---------------------------------------- | ------------------------------------------------------------------------ | ----- |
| naive/numpy | `affine.engine.local.NumPyBackend` | - | - |
| scikit-learn KDTree | `affine.engine.local.KDTreeBackend` | keyword arguments that get passed directly to `sklearn.neighbors.KDTree` | - |
| annoy | `affine.engine.local.AnnoyBackend` | `n_trees: int` number of trees to use<br>`n_jobs: int` defaults to -1 | - |
| FAISS | `affine.engine.local.FAISSBackend` | `index_factory_str: str` | - |
| PyNNDescent | `affine.engine.local.PyNNDescentBackend` | keyword arguments that get passed directly to `pynndescent.NNDescent` | - |
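
For intuition about what the naive backend has to compute, here is a brute-force filtered search in plain Python. It is a sketch under assumptions, not affine's code: `naive_filtered_search` is a hypothetical function, records are plain dicts, and Euclidean distance is assumed because the README does not state which metric is used. The data reuses the vectors from the Basic Usage example.

```python
import math


def naive_filtered_search(records, query, keep, n):
    """Return the n records closest to `query` among those passing `keep`."""
    # Apply the metadata filter first, then rank the survivors by distance.
    candidates = [r for r in records if keep(r)]
    candidates.sort(key=lambda r: math.dist(r["vec"], query))
    return candidates[:n]


records = [
    {"vec": [0.1, 0.0, -0.5], "a": 1, "b": "foo"},
    {"vec": [1.3, 2.1, 3.6], "a": 2, "b": "bar"},
    {"vec": [-0.1, 0.2, 0.3], "a": 3, "b": "foo"},
]
nearest = naive_filtered_search(
    records, query=[2.8, 1.8, -4.5], keep=lambda r: r["b"] == "foo", n=1
)
```

This scans every record on each query, which is the cost the approximate libraries in the table are meant to avoid on larger collections.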
84 changes: 81 additions & 3 deletions affine/engine/base.py
@@ -22,10 +22,36 @@ def _query(
     def query(
         self, collection_class: Type[Collection], with_vectors: bool = False
     ) -> QueryObject:
+        """
+        Parameters
+        ----------
+        collection_class
+            the collection class to query
+        with_vectors
+            whether or not the returned objects should have their vector attributes populated
+            (or otherwise be set to `None`)
+
+        Returns
+        -------
+        QueryObject
+            the resulting QueryObject
+        """
         return QueryObject(self, collection_class, with_vectors=with_vectors)

     @abstractmethod
     def insert(self, record: Collection) -> int | str:
+        """Insert a record
+
+        Parameters
+        ----------
+        record
+            the record to insert
+
+        Returns
+        -------
+        int | str
+            the resulting id of the inserted record
+        """
         pass

     @abstractmethod
@@ -35,10 +61,22 @@ def _delete_by_id(self, collection: Type[Collection], id: str) -> None:
     def delete(
         self,
         *,
-        record: Collection | str | None = None,
+        record: Collection | None = None,
         collection: Type[Collection] | None = None,
         id: str | None = None,
     ) -> None:
+        """Delete a record from the database. The record can either be specified
+        by its `Collection` object or by its id.
+
+        Parameters
+        ----------
+        record
+            the record to delete
+        collection
+            the collection the record belongs to (needed if and only if deleting a record by its id)
+        id
+            the id of the record
+        """
         if bool(record is None) == bool(collection is None and id is None):
             raise ValueError(
                 "Either record or collection and id must be provided"
@@ -58,15 +96,55 @@ def delete(

     @abstractmethod
     def get_elements_by_ids(
-        self, collection: type, ids: list[int]
+        self, collection: type, ids: list[int | str]
     ) -> list[Collection]:
+        """Get elements by ids
+
+        Parameters
+        ----------
+        ids
+            list of ids
+
+        Returns
+        -------
+        list[collection]
+            the resulting collection objects
+        """
         pass

     @abstractmethod
     def register_collection(self, collection_class: Type[Collection]) -> None:
+        """Register a collection to the database
+
+        Parameters
+        ----------
+        collection_class
+            the class of the collection to register. This class must inherit from `Collection`.
+        """
         pass

-    def get_element_by_id(self, collection: type, id_: int) -> Collection:
+    def get_element_by_id(
+        self, collection: type, id_: int | str
+    ) -> Collection:
+        """Get an element by its id
+
+        Parameters
+        ----------
+        collection
+            the collection class the record belongs to
+        id_
+            the id of the record
+
+        Returns
+        -------
+        collection
+            the corresponding collection object for the record.
+
+        Raises
+        ------
+        ValueError
+            if no record is found with the specified id.
+        """
         ret = self.get_elements_by_ids(collection, [id_])
         if len(ret) == 0:
             raise ValueError(f"No record found with id {id_}")
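
The guard at the top of `delete` in the `affine/engine/base.py` diff compares two booleans, which is easy to misread. The sketch below copies that condition into a standalone function so its behavior can be checked case by case. `check_delete_args` and its return values are hypothetical names for illustration; only the condition and the error message are taken from the diff.

```python
def check_delete_args(record=None, collection=None, id=None):
    """Standalone copy of the guard in `delete`, for experimentation only."""
    # Raises when neither side is given (True == True) and when a record is
    # combined with collection/id arguments (False == False).
    if bool(record is None) == bool(collection is None and id is None):
        raise ValueError("Either record or collection and id must be provided")
    # Hypothetical return value, used here only to show which path was chosen.
    return "by_record" if record is not None else "by_id"
```

With this guard alone, passing only `collection` (without `id`) is not rejected. Whether the part of `delete` that is not shown in this diff handles that case cannot be determined from the page.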
35 changes: 32 additions & 3 deletions affine/query.py
@@ -22,6 +22,18 @@ def __init__(
         self._similarity = None

     def filter(self, filter_set: FilterSet | Filter) -> "QueryObject":
+        """Filter the result of a query by specified filters
+
+        Parameters
+        ----------
+        filter_set
+            the `FilterSet` or `Filter` object to use
+
+        Returns
+        -------
+        QueryObject
+            resulting `QueryObject`
+        """
         if isinstance(filter_set, Filter):
             filter_set = FilterSet(
                 filters=[filter_set], collection=filter_set.collection
@@ -31,9 +43,28 @@ def filter(self, filter_set: FilterSet | Filter) -> "QueryObject":
         return self

     def all(self) -> list[Collection]:
+        """Get all results of a query
+
+        Returns
+        -------
+        list[Collection]
+            all of the matching records for the query
+        """
         return self.db._query(self._filter_set, with_vectors=self.with_vectors)

     def limit(self, n: int) -> list[Collection]:
+        """Returns a fixed number of results of a query.
+
+        Parameters
+        ----------
+        n
+            how many records to retrieve. in the case of a similarity search query
+            this will be the `n`-closest neighbors
+
+        Returns
+        -------
+        list[Collection]
+        """
         return self.db._query(
             self._filter_set,
             with_vectors=self.with_vectors,
@@ -42,8 +73,6 @@ def limit(self, n: int) -> list[Collection]:
         )

     def similarity(self, similarity: Similarity) -> "QueryObject":
+        """Apply a similarity search to the query"""
         self._similarity = similarity
         return self
-
-    def get_by_id(self, id_) -> Collection:
-        return self.db.get_element_by_id(self.collection_class, id_)
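
The `affine/query.py` hunks show two kinds of methods: `filter` and `similarity` return the query object itself, while `all` and `limit` return a list. A toy builder makes the consequence visible. `ToyQuery` is a hypothetical stand-in that filters plain dicts with a callable; it is not affine's `QueryObject` and does no vector search.

```python
class ToyQuery:
    """Toy builder mimicking the chaining style; not affine's QueryObject."""

    def __init__(self, rows):
        self._rows = rows
        self._predicate = None

    def filter(self, predicate):
        self._predicate = predicate
        return self  # returning self is what allows chaining

    def all(self):
        # Terminal call: produces a list, so no further chaining is possible.
        if self._predicate is None:
            return list(self._rows)
        return [r for r in self._rows if self._predicate(r)]

    def limit(self, n):
        # Also terminal: the first n matching rows.
        return self.all()[:n]


rows = [{"b": "foo"}, {"b": "bar"}, {"b": "foo"}]
first_foo = ToyQuery(rows).filter(lambda r: r["b"] == "foo").limit(1)
```

Because `limit` and `all` are terminal, they have to come last in a chain, which matches the order used in the README's Basic Usage example.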
2 changes: 1 addition & 1 deletion tests/conftest.py
@@ -110,7 +110,7 @@ def _test_engine(db: Engine):
     assert q9[0].name == "Apple"

     # check we can query by id
-    assert db.query(Product).get_by_id(q9[0].id).name == "Apple"
+    assert db.get_element_by_id(Product, q9[0].id).name == "Apple"

     # check we can delete
     db.delete(record=q9[0])
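
The test now calls `db.get_element_by_id` directly, and the `base.py` diff shows that this method wraps `get_elements_by_ids` and raises `ValueError` on an empty result. A minimal in-memory sketch of that wrapper pattern follows. `InMemoryStore` is a hypothetical class, not one of affine's engines, and the final `return ret[0]` is an assumption because the diff cuts off before the method returns.

```python
class InMemoryStore:
    """Toy stand-in for an engine; not one of affine's classes."""

    def __init__(self):
        self._rows = {}

    def insert(self, row_id, row):
        self._rows[row_id] = row

    def get_elements_by_ids(self, ids):
        # Unknown ids are skipped silently, so the result may be shorter than `ids`.
        return [self._rows[i] for i in ids if i in self._rows]

    def get_element_by_id(self, id_):
        # Single-id lookup built on the batch lookup, as in the diff.
        ret = self.get_elements_by_ids([id_])
        if len(ret) == 0:
            raise ValueError(f"No record found with id {id_}")
        return ret[0]  # assumed: the diff does not show this line
```

Building the single-record lookup on top of the batch lookup means each engine only has to implement `get_elements_by_ids`.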
