[RFC] Supporting sparse semantic retrieval based on neural models #230
Comments
This is really interesting! @model-collapse Could you include the proposed interfaces you are going to add and what they would look like? |
Additionally, I believe Lucene has some features to support this case. See apache/lucene#11799. |
Hi @model-collapse,
|
@navneet1v
|
Hi @navneet1v , for question 4, sparse models have advantages over both KNN and BM25:
|
@model-collapse I cannot see where we are comparing the latency of BM-25 with the sparse retrieval? For #4 , we compared on the accuracy but I will be really interested in latency. |
@zhichao-aws, I understand the difference in memory footprint between dense and sparse will be large. Also, my question was never about comparing dense and sparse vectors. On memory and latency, yes, sparse vectors will work well. I want to compare sparse vectors with OpenSearch text search. The three parameters I would at least like to see compared are:
|
@navneet1v , since implementation details have a large impact on latency/memory metrics, it's hard to give concrete numbers at the RFC stage within the OpenSearch framework. However, search latency on other engines that support inverted indexes is also a strong indicator of their efficiency: |
One clarification question, Do we need support of sparse vector data type in k-NN plugin similar to knn_vector(dense vector) to support sparse vector indexing/search? |
No, we can use lucene engine to index and search sparse vectors. We will implement this feature in neural-search and ml-commons. |
@zhichao-aws sorry could you explain which data type you would use for indexing sparse vectors? |
Hi @vamshin , for sparse vectors we need to build a mechanism to index and search based on a term-weight map. Our initial proposal was to use a normal OpenSearch text field (i.e. After more research, we found that the Lucene FeatureField is more straightforward and extensible. If we choose FeatureField, we'll introduce a new field like "sparse_vector" by implementing a new wrapper FieldMapper to transform the input term-weight map into a Lucene FeatureField (the wrapper FieldMapper can be put in the neural-search plugin or somewhere else; we can discuss that). For searching, we'll build a new query clause. For both designs we have finished the POC code and proved they are workable. We'll run some benchmarking tests to examine their execution efficiency. |
Hi @model-collapse , @zhichao-aws
|
Hi @navneet1v , these are really good questions. We investigated several different approaches and debated the user interface at length. We will create a separate issue to list our proposal and all the alternatives we've considered. |
Hi all, Please create a doc issue or PR ASAP if this has doc implications for 2.11. Thanks. |
Great feature, I look forward to it and I hope it is going to be generally available on the upcoming 2.11 release in October. |
Will the out-of-box models be fine-tunable? Are you going to publish the fine-tuning process for them? |
Since we'll release the weights and structure of our model, users can fine-tune them using Pytorch or other frameworks with their own implementation. Sadly the fine-tuning process is out of scope for this release. If you believe it is important for this feature, let's create a feature request issue and call for comments. This will help us make the decision of future work. |
Thanks for the clarification. What I meant is just giving a fine-tuning example in the documentation or as a blog post, just like the example the team has posted for fine-tuning embedding models. |
Does it work with a KNN query as well? My team uses a custom inference server for all our ML models. |
Hi @yudhiesh , the KNN query is used for dense vectors, while neural sparse needs sparse vectors, like If your purpose is to use a custom inference server to generate sparse vectors and then query with the raw sparse vectors, then the answer is yes. Querying by raw sparse vectors will be supported in 2.14 (if everything goes well, 2.14 will be released in a few days). We have put our neural sparse model on Hugging Face (model link), and we have a demo deployment script for deploying the neural sparse model in SageMaker. It can be a reference here |
Great thanks for the quick response! |
[RFC] Supporting sparse semantic retrieval based on neural models
Background
Dense retrieval based on neural models has achieved great success in search relevance tasks. However, dense methods use k-NN to search for the most relevant documents, which consumes a large amount of memory and CPU resources and is therefore expensive.
In recent years, there has been a lot of research on sparse retrieval based on neural models, such as DeepCT[1], SparTerm[2], and SPLADE[3,4]. Since sparse retrieval can be naturally implemented using an inverted index, these methods are as efficient as BM25. After fine-tuning, neural sparse retrieval can achieve search relevance on par with dense methods. Neural sparse methods also show great generalization ability: SPLADE outperforms all dense methods on the BEIR benchmark under the same setting. Thus we propose to implement support for sparse retrieval based on neural models.
Example Comparison 1: Dense and Sparse retrieval models on MS-MARCO dataset.
*: All the experiments are conducted on a single OpenSearch node with 8 * 2.4G CPU cores and 32GB RAM.
^: SPLADE-max runs a BERT model inference for query encoding, and thus has latency similar to dense methods.
Example Comparison 2: Splade vs. Others on BEIR benchmarking dataset.
All of the above results are taken from the SPLADE v2 paper [4].
Example Comparison 3: Splade vs. openai embedding on BEIR benchmarking subset.
The table above is taken from the OpenAI paper [5]. The experiments were conducted on a subset of the BEIR benchmark.
What are we going to do?
Design
We are going to implement one IngestionProcessor for sparse encoding of documents and one QueryBuilder for sparse querying. Before ingestion or querying, the sparse encoding model will be deployed via the ml-commons plugin, and the ingestion processor will then invoke the prediction action to obtain the encoding result. If query encoding is enabled, the query builder will also encode queries via prediction actions; when query encoding is disabled, it will only pass the query through a BERT tokenizer. Because the encoding result is a sparse vector, it is natural to adopt a term-vector-based Lucene index. Here is the architecture diagram.
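As a rough illustration of why a term-based index suits sparse vectors, the sketch below builds a toy in-memory inverted index from term-weight maps. It is not the proposed implementation: the class `ToySparseIndex`, its methods, and the sample weights are all hypothetical, and the term-weight maps stand in for whatever the sparse encoding model would return.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical stand-in for the output of a sparse encoding model:
# each document is already a term -> weight map.
SparseVector = Dict[str, float]


class ToySparseIndex:
    """In-memory inverted index: term -> list of (doc_id, weight) postings."""

    def __init__(self) -> None:
        self.postings: Dict[str, List[Tuple[str, float]]] = defaultdict(list)

    def ingest(self, doc_id: str, vector: SparseVector) -> None:
        # Only non-zero weights are stored, which keeps the index sparse.
        for term, weight in vector.items():
            if weight > 0.0:
                self.postings[term].append((doc_id, weight))

    def candidates(self, query_terms: List[str]) -> List[str]:
        # Documents sharing at least one term with the query, in first-seen order.
        seen: Dict[str, None] = {}
        for term in query_terms:
            for doc_id, _ in self.postings.get(term, []):
                seen.setdefault(doc_id, None)
        return list(seen)


index = ToySparseIndex()
index.ingest("d1", {"neural": 1.2, "search": 0.8, "zero": 0.0})
index.ingest("d2", {"lucene": 1.5, "search": 0.4})
print(index.candidates(["search"]))
```

Only the postings of the query terms are visited at search time, which is the property that lets sparse retrieval reuse the same machinery as ordinary text search.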
Term Weight integration with Lucene
As suggested by the SPLADE paper, the relevance score is calculated as $r = \sum_{t \in T} w_t$, where $w_t$ is the weight of sparse term $t$ and $T$ is the set of terms shared by the query and the document. Since standard Lucene indices only store TF (term frequency) and DF (document frequency), we will implement an analyzer that interprets term weights and stores them in the payload attribute. Because the above formula is not a standard Lucene relevance scoring function, we will use a PayloadScorer with a sum operator in the query.
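The scoring rule can be sketched in a few lines. This is a minimal example under one assumption that the text does not state explicitly: $w_t$ is taken to be the weight stored on the document side (as it would be in a payload), and the query contributes only its set of terms. The function name `sparse_relevance` and the sample weights are hypothetical.

```python
from typing import Dict, Iterable


def sparse_relevance(query_terms: Iterable[str], doc_weights: Dict[str, float]) -> float:
    """Sum the stored weights of the terms shared by the query and the document."""
    # T: intersection of the query term set and the document term set.
    matched = set(query_terms) & doc_weights.keys()
    # r = sum of w_t for t in T (assumed here to be document-side weights).
    return sum(doc_weights[t] for t in matched)


doc = {"neural": 1.5, "search": 0.5, "index": 0.25}
score = sparse_relevance(["neural", "search", "latency"], doc)
print(score)  # "neural" and "search" match: 1.5 + 0.5
```

A query term that does not occur in the document ("latency" here) contributes nothing, and a document with no shared terms scores zero.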
The out-of-box Model
The schema of the sparse encoding model will be similar to SparTerm or SPLADE, where the input is a natural language sentence and the output is a sparse vector (in SPLADE, the sparse terms are BERT tokens). We will mainly focus on cross-domain optimization for better relevance across different scenarios. The models are planned to be released on huggingface.co under the Apache 2.0 license.
API
Ingestion Processor
The ingestion processor can be created with the following API, where the field `field_map` specifies the fields that need to be encoded and the new field names after encoding.
Sparse Search Query
Similar to vector-search-based neural search, sparse retrieval is also bound to a query type called `neural_sparse`. One can search the field `body_sparse` (sparse-encoded fields only) via the API below.
The fields `model_id` and `tokenizer` are optional. If `model_id` is present, the query executor will call the sparse model for query encoding, while if `tokenizer` is present, the executor will only encode the query via tokenization.
Reference
[1] Dai et al, Context-Aware Sentence/Passage Term Importance Estimation For First Stage Retrieval, arxiv.org. 2019.
[2] Bai et al, SparTerm: Learning Term-based Sparse Representation for Fast Text Retrieval, arxiv.org. 2020.
[3] Formal et al, SPLADE: Sparse lexical and expansion model for first stage ranking, SIGIR, 2021.
[4] Formal et al, SPLADE v2: Sparse lexical and expansion model for information retrieval, arxiv.org. 2021.
[5] Neelakantan et al, Text and Code Embeddings by Contrastive Pre-Training, arxiv.org. 2022.