Bi-Encoder vs Cross-Encoder #12

Open · YeonwooSung opened this issue Sep 11, 2023 · 1 comment

Comments

YeonwooSung (Contributor) commented Sep 11, 2023

  • Bi-Encoder: the conventional semantic-search approach, where the query and the target sentences are each encoded into sentence vectors and compared with cosine similarity (or a similar metric).
  • Cross-Encoder: the query and the target sentence are fed into the model together, and the similarity score itself is the model output (no sentence vectors are produced).

Cross-encoders are usually more accurate, but they scale poorly.

sentence-transformers supports both approaches (a minimal sketch of both follows below).
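
A minimal sketch of both approaches with sentence-transformers (the model names below are illustrative choices, not taken from this issue):

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

query = "How do I bake bread at home?"
passage = "A simple recipe for baking bread in a home oven."

# Bi-encoder: encode query and passage independently, then compare the vectors.
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
query_emb = bi_encoder.encode(query, convert_to_tensor=True)
passage_emb = bi_encoder.encode(passage, convert_to_tensor=True)
print("bi-encoder cosine similarity:", util.cos_sim(query_emb, passage_emb).item())

# Cross-encoder: feed the pair jointly; the score is the model output itself.
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
print("cross-encoder score:", cross_encoder.predict([(query, passage)])[0])
```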

Bi-Encoders (see Computing Sentence Embeddings) are used whenever you need a sentence embedding in a vector space for efficient comparison. Applications are, for example, Information Retrieval / Semantic Search or Clustering. Cross-Encoders would be the wrong choice for these applications: clustering 10,000 sentences with Cross-Encoders would require computing similarity scores for about 50 million sentence combinations, which takes about 65 hours. With a Bi-Encoder, you compute the embedding for each sentence, which takes only 5 seconds. You can then perform the clustering.

sentence-transformers; cross-encoder vs bi-encoder
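
A minimal sketch of the bi-encoder clustering workflow described in that passage, assuming an illustrative model and scikit-learn's KMeans for the clustering step. For n = 10,000 sentences, pairwise cross-encoder scoring would need n·(n−1)/2 ≈ 50 million forward passes, while the bi-encoder needs only n:

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Stand-in corpus; in practice this would be the 10,000 sentences mentioned above.
sentences = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
    "A woman is playing violin.",
]

# One bi-encoder forward pass per sentence, then cluster the resulting vectors.
model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
embeddings = model.encode(sentences)

labels = KMeans(n_clusters=2, random_state=0).fit_predict(embeddings)
print(list(zip(labels, sentences)))
```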

YeonwooSung (Contributor, Author) commented:

Using Cross-Encoders as reranker in multistage vector search

In search, or semantic matching of sentences, we can see this tradeoff in Bi-Encoder models compared with Cross-Encoder models. Bi-Encoder models are fast, but less accurate, while Cross-Encoders are more accurate, but slow. Luckily, we can combine them in a search pipeline to benefit from both models!
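
A minimal retrieve-then-rerank sketch of that pipeline (corpus, model names, and top_k below are illustrative assumptions): the bi-encoder retrieves a small candidate set quickly, and the cross-encoder reranks only those candidates.

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

corpus = [
    "The cat sits on the mat.",
    "Bi-encoders map each sentence to a vector independently.",
    "Cross-encoders score a sentence pair jointly.",
    "Paris is the capital of France.",
]
query = "How does a cross-encoder compute similarity?"

# Stage 1: fast bi-encoder retrieval over the whole corpus.
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
corpus_emb = bi_encoder.encode(corpus, convert_to_tensor=True)
query_emb = bi_encoder.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_emb, corpus_emb, top_k=3)[0]

# Stage 2: accurate cross-encoder reranking of the top-k candidates only.
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pairs = [(query, corpus[hit["corpus_id"]]) for hit in hits]
scores = cross_encoder.predict(pairs)

for score, (_, passage) in sorted(zip(scores, pairs), reverse=True, key=lambda x: x[0]):
    print(f"{score:.3f}  {passage}")
```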
