A Bi-Encoder is the conventional semantic search approach: the query and target sentences are each encoded into sentence vectors, which are then compared with measures such as cosine similarity.
A Cross-Encoder takes the query and target sentence together as input and outputs the similarity score directly from the model (no sentence vectors are produced).
Cross-encoders usually achieve higher accuracy, but they scale poorly.
sentence-transformers supports both approaches.
Bi-Encoders (see Computing Sentence Embeddings) are used whenever you need a sentence embedding in a vector space for efficient comparison. Example applications are Information Retrieval / Semantic Search and Clustering. Cross-Encoders would be the wrong choice for these applications: clustering 10,000 sentences with Cross-Encoders would require computing similarity scores for about 50 million sentence combinations, which takes about 65 hours. With a Bi-Encoder, you compute the embedding for each sentence, which takes only 5 seconds. You can then perform the clustering.
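A quick back-of-the-envelope check of the pair count quoted above: a cross-encoder must score every unordered pair of sentences, while a bi-encoder only needs one forward pass per sentence.

```python
# Worked check of the scaling claim: clustering n sentences with a
# cross-encoder needs a score for every unordered sentence pair,
# while a bi-encoder needs just one embedding per sentence.
n = 10_000
cross_encoder_pairs = n * (n - 1) // 2   # unordered pairs: n choose 2
bi_encoder_encodes = n                   # one encode per sentence

print(cross_encoder_pairs)  # 49995000, i.e. ~50 million
print(bi_encoder_encodes)   # 10000
```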
In search, or semantic matching of sentences, we can see this tradeoff in Bi-Encoder models compared with Cross-Encoder models. Bi-Encoder models are fast, but less accurate, while Cross-Encoders are more accurate, but slow. Luckily, we can combine them in a search pipeline to benefit from both models!
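The pipeline described above can be sketched as retrieve-then-rerank. This is a minimal runnable sketch with toy stand-ins for the two models: `bi_encode` and `cross_score` are hypothetical functions invented here so the control flow runs on its own; in sentence-transformers the corresponding calls would be `SentenceTransformer.encode` and `CrossEncoder.predict`.

```python
import numpy as np

def bi_encode(texts):
    # Toy "embedding" (character-count features) standing in for a real
    # bi-encoder, which would return dense transformer sentence vectors.
    return np.array([[len(t), t.count(" "), t.count("e")] for t in texts],
                    dtype=float)

def cross_score(query, doc):
    # Toy pair score (word overlap) standing in for a real cross-encoder,
    # which reads (query, doc) jointly and outputs a relevance score.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def search(query, corpus, top_k=3, rerank_k=2):
    # Stage 1: cheap bi-encoder retrieval by cosine similarity.
    doc_emb = bi_encode(corpus)          # computed once, reusable per query
    q_emb = bi_encode([query])[0]
    sims = doc_emb @ q_emb / (
        np.linalg.norm(doc_emb, axis=1) * np.linalg.norm(q_emb) + 1e-9)
    candidates = np.argsort(-sims)[:top_k]
    # Stage 2: expensive cross-encoder rerank, only on the top_k candidates.
    reranked = sorted(candidates,
                      key=lambda i: cross_score(query, corpus[i]),
                      reverse=True)
    return [corpus[i] for i in reranked[:rerank_k]]

corpus = ["the cat sat on the mat",
          "stock markets fell today",
          "a cat chased the dog"]
print(search("where did the cat sit", corpus))
```

The key design point is that the bi-encoder embeddings for the corpus are computed once and cached, so the slow cross-encoder only ever scores `top_k` pairs per query instead of the whole corpus.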
sentence-transformers; cross-encoder vs bi-encoder