Support multiple indexes and binary embeddings #5

Merged: 7 commits into main from feature/support-multiple-indexes, Apr 20, 2023

@piercefreeman piercefreeman commented Apr 18, 2023

Introduce support for binary embeddings and alternative indexing strategies.

Binary Vectors

Users can now write Python type hints that specify the element type of an np.ndarray. When the element type is boolean, we assume the user wants a binary vector and use DataType.BINARY_VECTOR accordingly.

embedding: np.ndarray[np.bool_] = EmbeddingField(
    dim=128,
    index=FLAT()
)
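
The runtime detection described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the library's implementation: `Array`, `VectorKind`, and `vector_kind` are hypothetical names, and `Array` stands in for `np.ndarray` so the example runs without numpy. With numpy installed, the same `get_args` inspection should apply to `np.ndarray[np.bool_]`, comparing against `np.bool_` instead of `bool`.

```python
from enum import Enum
from typing import Generic, TypeVar, get_args, get_type_hints

T = TypeVar("T")


class Array(Generic[T]):
    """Hypothetical stand-in for np.ndarray, so the sketch has no dependencies."""


class VectorKind(Enum):
    """Hypothetical mirror of the two vector column types."""

    FLOAT_VECTOR = "FLOAT_VECTOR"
    BINARY_VECTOR = "BINARY_VECTOR"


def vector_kind(hint) -> VectorKind:
    """Pick the column type from the element type of an array type hint."""
    args = get_args(hint)  # Array[bool] -> (bool,), bare Array -> ()
    if args and args[0] is bool:
        return VectorKind.BINARY_VECTOR
    # Unparameterized or non-boolean hints fall back to floating point.
    return VectorKind.FLOAT_VECTOR


class MyObject:
    binary_embedding: Array[bool]
    float_embedding: Array[float]
    untyped_embedding: Array


# Resolve the class annotations and classify each embedding field.
hints = get_type_hints(MyObject)
kinds = {name: vector_kind(hint) for name, hint in hints.items()}
```

Falling back to a floating-point vector when no element type is given keeps existing, unparameterized annotations working unchanged.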

Indexing Strategies

Mirror the index algorithms that Milvus supports. The keyword names for these index classes intentionally differ from the kwargs that Milvus uses internally: the goal is to describe more clearly what each argument does and to work better with additional schema backends.
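
To illustrate the idea of descriptive keyword names that are translated into Milvus' own parameters, here is a minimal sketch. The class names, the friendly keyword names (`cluster_units`, `max_degree`, `build_candidates`), and the `to_milvus_params` helper are assumptions made for this example and may not match what the PR actually defines. The Milvus-side keys (`nlist`, `M`, `efConstruction`) are the index build parameters that Milvus documents for IVF_FLAT and HNSW.

```python
from dataclasses import dataclass


@dataclass
class IVFFlat:
    """Hypothetical index definition with a descriptive keyword name."""

    cluster_units: int = 128  # assumed friendly name for Milvus' `nlist`

    def to_milvus_params(self, metric_type: str = "L2") -> dict:
        # Translate the friendly name into the dict shape Milvus expects.
        return {
            "index_type": "IVF_FLAT",
            "metric_type": metric_type,
            "params": {"nlist": self.cluster_units},
        }


@dataclass
class HNSW:
    """Hypothetical index definition for the HNSW graph index."""

    max_degree: int = 16          # assumed friendly name for Milvus' `M`
    build_candidates: int = 200   # assumed friendly name for `efConstruction`

    def to_milvus_params(self, metric_type: str = "L2") -> dict:
        return {
            "index_type": "HNSW",
            "metric_type": metric_type,
            "params": {
                "M": self.max_degree,
                "efConstruction": self.build_candidates,
            },
        }


ivf_params = IVFFlat(cluster_units=64).to_milvus_params()
hnsw_params = HNSW().to_milvus_params(metric_type="IP")
```

Keeping the translation in one method per index class means a different backend could add its own translation method without changing the user-facing keyword names.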

Misc Changes

  • Allow a schema-level definition of the consistency level used for queries and searches. If unspecified, we use Milvus' default behavior.
  • Validate that the embedding type (binary vs. floating point) is compatible with the index type (IVF_FLAT, BIN_IVF_FLAT, etc.).
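
The compatibility check in the second bullet can be sketched as follows. This is an illustrative assumption about how such a check might be written, not the code in this PR: `validate_index` and the two sets are hypothetical, and the sets list only a subset of Milvus index names.

```python
# Assumed, abbreviated groupings of Milvus index types by vector kind.
FLOAT_INDEXES = {"FLAT", "IVF_FLAT", "IVF_SQ8", "IVF_PQ", "HNSW"}
BINARY_INDEXES = {"BIN_FLAT", "BIN_IVF_FLAT"}


def validate_index(is_binary: bool, index_type: str) -> None:
    """Raise ValueError if the index type does not fit the embedding type."""
    allowed = BINARY_INDEXES if is_binary else FLOAT_INDEXES
    if index_type not in allowed:
        kind = "binary" if is_binary else "floating-point"
        raise ValueError(
            f"Index {index_type!r} is not compatible with {kind} embeddings; "
            f"expected one of {sorted(allowed)}"
        )


# Compatible pairs pass silently.
validate_index(is_binary=False, index_type="IVF_FLAT")
validate_index(is_binary=True, index_type="BIN_IVF_FLAT")

# An incompatible pair is reported at schema-definition time.
try:
    validate_index(is_binary=True, index_type="IVF_FLAT")
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
```

Running the check when the schema is defined surfaces the mismatch before any collection is created.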

Allow typehinting of numpy types via the new typehinting decorators,
like np.ndarray[np.bool_]. This communicates whether the field holds
binary or regular (floating-point) embeddings. We sniff for this
attribute at runtime to switch the column type that is injected into
the database.

Milvus supports different levels of consistency depending on the demands
of the business case. The default setting is Bounded staleness, which
provides relatively good synchronization between replicas and much faster
search times. This is undesirable for unit tests, however, because it
can make some tests pass or fail nondeterministically.

Add a new customization parameter to the model definitions that allows
specifying a different consistency level for the schema.
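
One way to pass a schema-level consistency setting through to Milvus is sketched below. `collection_kwargs` and its `consistency_type` parameter are hypothetical names for this example. The level strings and the `consistency_level` keyword follow what pymilvus accepts when creating a collection, as far as I am aware, so treat those as assumptions as well.

```python
from typing import Optional

# Consistency level names as used by Milvus (assumed complete for this sketch).
CONSISTENCY_LEVELS = {"Strong", "Bounded", "Session", "Eventually"}


def collection_kwargs(consistency_type: Optional[str] = None) -> dict:
    """Build the extra kwargs to pass when creating the collection."""
    if consistency_type is None:
        # Nothing specified on the schema: let Milvus apply its default.
        return {}
    if consistency_type not in CONSISTENCY_LEVELS:
        raise ValueError(f"Unknown consistency level: {consistency_type!r}")
    return {"consistency_level": consistency_type}


# A test schema can request strong consistency for deterministic reads.
test_kwargs = collection_kwargs("Strong")
default_kwargs = collection_kwargs()
```

Omitting the key entirely when nothing is specified, rather than passing a hard-coded default, keeps the behavior aligned with whatever default the server uses.
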
@piercefreeman

The test for binary similarity recall fails intermittently; tracking here: milvus-io/pymilvus#1379

@piercefreeman piercefreeman merged commit 35ecdd0 into main Apr 20, 2023
@piercefreeman piercefreeman deleted the feature/support-multiple-indexes branch April 20, 2023 22:15