Qdrant: Support Sparse Vectors #549

lambda-science · 2024-03-06T13:03:38Z

Is your feature request related to a problem? Please describe.
Qdrant v1.7.0 introduced sparse vectors (e.g. SPLADE) and hybrid retrieval.
It would be nice to support them here. https://qdrant.tech/articles/sparse-vectors/

Describe the solution you'd like
Allow creating a collection with an optional sparse vector, and add a retriever for hybrid search (and maybe a SPLADE-only one?).
Current:

    def _recreate_collection(self, collection_name: str, distance, embedding_dim: int):
        self.client.recreate_collection(
            collection_name=collection_name,
            vectors_config=rest.VectorParams(
                size=embedding_dim,
                distance=distance,
            ),
            shard_number=self.shard_number,
            replication_factor=self.replication_factor,
            write_consistency_factor=self.write_consistency_factor,
            on_disk_payload=self.on_disk_payload,
            hnsw_config=self.hnsw_config,
            optimizers_config=self.optimizers_config,
            wal_config=self.wal_config,
            quantization_config=self.quantization_config,
            init_from=self.init_from,
        )

could become, following the example in the article above:

    def _recreate_collection(self, collection_name: str, distance, embedding_dim: int):
        self.client.recreate_collection(
            collection_name=collection_name,
            # Named dense vector, so it can coexist with the sparse one
            vectors_config={
                "text-dense": rest.VectorParams(
                    size=embedding_dim,
                    distance=distance,
                )
            },
            # Sparse vectors have no fixed size or distance, only index settings
            sparse_vectors_config={
                "text-sparse": rest.SparseVectorParams(
                    index=rest.SparseIndexParams(
                        on_disk=False,
                    )
                )
            },
            shard_number=self.shard_number,
            replication_factor=self.replication_factor,
            write_consistency_factor=self.write_consistency_factor,
            on_disk_payload=self.on_disk_payload,
            hnsw_config=self.hnsw_config,
            optimizers_config=self.optimizers_config,
            wal_config=self.wal_config,
            quantization_config=self.quantization_config,
            init_from=self.init_from,
        )

However, this requires a number of components:

a SPLADE query embedder to embed the user question at query time => ISSUE if integrated inside this integration package, as it probably requires some big machine-learning libs?
a SPLADE encoder to write document sparse vectors during indexing => same ISSUE if integrated inside this integration package, as it probably requires some big machine-learning libs?
a new hybrid retriever that can do a hybrid search such as:

    client.search_batch(
        collection_name=collection_name,
        requests=[
            rest.SearchRequest(
                vector=rest.NamedVector(
                    name="text-dense",
                    vector=query_embedding,
                ),
                limit=top_k,
            ),
            rest.SearchRequest(
                vector=rest.NamedSparseVector(
                    name="text-sparse",
                    vector=rest.SparseVector(
                        indices=query_indices,
                        values=query_values,
                    ),
                ),
                limit=top_k,
            ),
        ],
    )

query_embedding is the output of the usual dense query embedder.
query_indices and query_values are the output of the new SPLADE encoder.
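
To illustrate the `indices`/`values` form used above, here is a minimal, library-free sketch that splits a mapping of token IDs to weights into two parallel lists. The helper `to_sparse`, its `eps` threshold, and the sample weights are all hypothetical; they only stand in for whatever a real SPLADE encoder would return.

```python
from typing import Dict, List, Tuple


def to_sparse(weights: Dict[int, float], eps: float = 0.0) -> Tuple[List[int], List[float]]:
    """Split {token_id: weight} into parallel index/value lists.

    Weights that are not strictly greater than `eps` are dropped,
    and the remaining entries are ordered by token id.
    """
    items = sorted((i, w) for i, w in weights.items() if w > eps)
    indices = [i for i, _ in items]
    values = [w for _, w in items]
    return indices, values


# Hypothetical encoder output: token id -> weight (not from a real model)
query_indices, query_values = to_sparse({1012: 0.8, 7: 0.0, 345: 1.5})
```

The two lists can then be passed as `indices` and `values` to `rest.SparseVector` in the request above.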

EDIT: Also, Qdrant 1.8 is out 👀 https://qdrant.tech/articles/qdrant-1.8.x/ But I don't think it breaks anything in the current implementation :)


lambda-science · 2024-03-10T18:29:39Z

@Anush008 maybe you would be interested in this.
I think I could propose a PR in the coming days or week. I haven't started looking into the implementation yet :)

Anush008 · 2024-03-10T18:34:31Z

We could have these if the sparse vector generation can be abstracted away by other Haystack embedding integrations.

Since Qdrant's implementation will have to stay agnostic to the vectors.

lambda-science · 2024-03-10T18:59:11Z

> We could have these if the sparse vector generation can be abstracted away by other Haystack embedding integrations.
>
> Since Qdrant's implementation will have to stay agnostic to the vectors.

Got it!

It would still need a small modification so that run() can accept sparse vectors, the collection can be set up with a sparse vector, and sparse queries can be executed.
Like:

NEW query_by_sparse() in DocumentStore
Modify _recreate_collection() in DocumentStore to add a sparse vector configuration
NEW QdrantSparseRetriever in Retrievers that just calls self._document_store.query_by_sparse()
But yeah, maybe I could work on a general component (outside of Qdrant) that performs sparse embedding and passes the result to the Qdrant object :)
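
A rough sketch of that split, assuming a store that only receives ready-made sparse vectors and a retriever that only delegates to it. Everything here (`SparseVec`, `ToyDocumentStore`, `ToySparseRetriever`, and the in-memory dot-product scoring) is hypothetical and is not the Haystack or Qdrant API; it only shows that neither class needs to know how the vectors were produced.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SparseVec:
    indices: List[int]
    values: List[float]


class ToyDocumentStore:
    """Hypothetical stand-in for the document store; keeps sparse vectors in memory."""

    def __init__(self) -> None:
        self._docs: Dict[str, SparseVec] = {}

    def write(self, doc_id: str, vec: SparseVec) -> None:
        self._docs[doc_id] = vec

    def query_by_sparse(self, query: SparseVec, top_k: int = 10) -> List[Tuple[str, float]]:
        # Score each document by the dot product over shared indices.
        q = dict(zip(query.indices, query.values))
        scored = []
        for doc_id, vec in self._docs.items():
            score = sum(q.get(i, 0.0) * v for i, v in zip(vec.indices, vec.values))
            scored.append((doc_id, score))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


class ToySparseRetriever:
    """Hypothetical retriever: it only delegates to the store and never embeds text."""

    def __init__(self, document_store: ToyDocumentStore, top_k: int = 10) -> None:
        self._document_store = document_store
        self._top_k = top_k

    def run(self, query_sparse: SparseVec) -> List[Tuple[str, float]]:
        return self._document_store.query_by_sparse(query_sparse, top_k=self._top_k)


store = ToyDocumentStore()
store.write("a", SparseVec([1, 5], [1.0, 2.0]))
store.write("b", SparseVec([5, 9], [0.5, 3.0]))
retriever = ToySparseRetriever(store, top_k=1)
results = retriever.run(SparseVec([5], [2.0]))
```

In this layout the sparse embedder (SPLADE, BM25, or anything else) lives in a separate component and hands `indices`/`values` to the retriever, so the heavy machine-learning dependencies stay out of the Qdrant integration.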

And unrelated to this implementation (Qdrant-agnostic):

A general SPLADE embedding component. Available here: feat(FastEmbed): Support for SPLADE Sparse Embedder #579
OR
A general BM25 embedding component

anakin87 · 2024-04-12T15:55:19Z

Let's close this issue.

In case we are interested in introducing a hybrid Retriever in the future, I suggest we open another one and discuss it there.

lambda-science added the feature request Ideas to improve an integration label Mar 6, 2024

lambda-science changed the title ~~Support Qdrant Sparse Vectors and Hybrid Retrival~~ Qdrant: Support Sparse Vectors and Hybrid Retrival Mar 6, 2024

masci added the integration:qdrant label Mar 10, 2024

This was referenced Mar 13, 2024

feat(Qdrant): start to work on sparse vector integration #578

Merged

feat(FastEmbed): Support for SPLADE Sparse Embedder #579

Merged

masci assigned anakin87 Mar 22, 2024

anakin87 changed the title ~~Qdrant: Support Sparse Vectors and Hybrid Retrival~~ Qdrant: Support Sparse Vectors Apr 12, 2024

anakin87 closed this as completed Apr 12, 2024

anakin87 mentioned this issue Apr 16, 2024

Qdrant: hybrid retrieval #664

Closed
