Bi-Encoder vs Cross-Encoder #12

Open · YeonwooSung opened this issue Sep 11, 2023 · 1 comment

Comments

YeonwooSung (Contributor) commented Sep 11, 2023

  • Bi-Encoder: the conventional semantic-search approach, where the query and the target sentences are each encoded into sentence vectors and compared with cosine similarity (or a similar metric).
  • Cross-Encoder: the query and the target sentence are fed into the model together, and the similarity score itself is the model output (no sentence vectors are produced).

Cross-encoders are usually more accurate, but they scale poorly.

sentence-transformers supports both approaches (a minimal sketch of both follows below).
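
A minimal sketch of both approaches with sentence-transformers (the model names below are illustrative choices, not taken from this issue):

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

query = "How do I bake bread at home?"
passage = "A simple recipe for baking bread in a home oven."

# Bi-encoder: encode query and passage independently, then compare the vectors.
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
query_emb = bi_encoder.encode(query, convert_to_tensor=True)
passage_emb = bi_encoder.encode(passage, convert_to_tensor=True)
print("bi-encoder cosine similarity:", util.cos_sim(query_emb, passage_emb).item())

# Cross-encoder: feed the pair jointly; the score is the model output itself.
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
print("cross-encoder score:", cross_encoder.predict([(query, passage)])[0])
```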

Bi-Encoders (see Computing Sentence Embeddings) are used whenever you need a sentence embedding in a vector space for efficient comparison. Applications are, for example, Information Retrieval / Semantic Search or Clustering. Cross-Encoders would be the wrong choice for these applications: clustering 10,000 sentences with Cross-Encoders would require computing similarity scores for about 50 million sentence combinations, which takes about 65 hours. With a Bi-Encoder, you compute the embedding for each sentence, which takes only 5 seconds. You can then perform the clustering.

sentence-transformers; cross-encoder vs bi-encoder
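
A minimal sketch of the bi-encoder clustering workflow described in that passage, assuming an illustrative model and scikit-learn's KMeans for the clustering step. For n = 10,000 sentences, pairwise cross-encoder scoring would need n·(n−1)/2 ≈ 50 million forward passes, while the bi-encoder needs only n:

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Stand-in corpus; in practice this would be the 10,000 sentences mentioned above.
sentences = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
    "A woman is playing violin.",
]

# One bi-encoder forward pass per sentence, then cluster the resulting vectors.
model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
embeddings = model.encode(sentences)

labels = KMeans(n_clusters=2, random_state=0).fit_predict(embeddings)
print(list(zip(labels, sentences)))
```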

YeonwooSung (Contributor, Author) commented:

Using Cross-Encoders as reranker in multistage vector search

In search, or semantic matching of sentences, we can see this tradeoff in Bi-Encoder models compared with Cross-Encoder models. Bi-Encoder models are fast, but less accurate, while Cross-Encoders are more accurate, but slow. Luckily, we can combine them in a search pipeline to benefit from both models!
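
A minimal retrieve-then-rerank sketch of that pipeline (corpus, model names, and top_k below are illustrative assumptions): the bi-encoder retrieves a small candidate set quickly, and the cross-encoder reranks only those candidates.

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

corpus = [
    "The cat sits on the mat.",
    "Bi-encoders map each sentence to a vector independently.",
    "Cross-encoders score a sentence pair jointly.",
    "Paris is the capital of France.",
]
query = "How does a cross-encoder compute similarity?"

# Stage 1: fast bi-encoder retrieval over the whole corpus.
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
corpus_emb = bi_encoder.encode(corpus, convert_to_tensor=True)
query_emb = bi_encoder.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_emb, corpus_emb, top_k=3)[0]

# Stage 2: accurate cross-encoder reranking of the top-k candidates only.
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pairs = [(query, corpus[hit["corpus_id"]]) for hit in hits]
scores = cross_encoder.predict(pairs)

for score, (_, passage) in sorted(zip(scores, pairs), reverse=True, key=lambda x: x[0]):
    print(f"{score:.3f}  {passage}")
```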
