v2.6.0 - Embedding Quantization, GISTEmbedLoss
This release brings embedding quantization, a way to substantially speed up retrieval and other tasks, and a powerful new loss function: GISTEmbedLoss.
Install this version with
pip install sentence-transformers==2.6.0
Embedding Quantization
Embeddings can be challenging to scale, which leads to expensive solutions and high latencies. One approach to counter this problem is quantization: reducing the size of each individual value in the embedding. Experiments on quantization have shown that we can retain most of the performance while significantly speeding up computation and saving on memory, storage, and costs.
Specifically, binary quantization may retain 96% of the retrieval performance while speeding up retrieval by 25x and reducing memory and disk usage by 32x. Do not underestimate this approach! Read more about Embedding Quantization in our extensive blogpost.
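To make the 32x figure concrete, here is a minimal NumPy sketch of what binary quantization amounts to. It is an illustration, not the library's implementation: it assumes each float32 value is thresholded at 0 and the resulting bits are packed eight to a byte. The helper names `binarize` and `hamming_distance` and the random 1024-dimensional vectors are stand-ins chosen for the example.

```python
import numpy as np


def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Threshold each value at 0 and pack the resulting bits, 8 per byte."""
    bits = embeddings > 0
    return np.packbits(bits, axis=-1)


def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Count the differing bits between two packed binary embeddings."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())


rng = np.random.default_rng(0)
float_embeddings = rng.standard_normal((2, 1024)).astype(np.float32)
binary_embeddings = binarize(float_embeddings)

# 1024 float32 values take 4096 bytes; 1024 bits take 128 bytes.
print(float_embeddings.nbytes // binary_embeddings.nbytes)  # 32
```

Packed vectors can be compared with bitwise operations such as the Hamming distance above, which is far cheaper than a float32 dot product.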
Binary and Scalar Quantization
Two forms of quantization exist at this time: binary and scalar (int8). These quantize embedding values from float32 into binary and int8, respectively. For binary quantization, you can use the following snippet:
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings
# 1. Load an embedding model
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
# 2a. Encode some text using "binary" quantization
binary_embeddings = model.encode(
    ["I am driving to the lake.", "It is a beautiful day."],
    precision="binary",
)
# 2b. or, encode some text without quantization & apply quantization afterwards
embeddings = model.encode(["I am driving to the lake.", "It is a beautiful day."])
binary_embeddings = quantize_embeddings(embeddings, precision="binary")
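The snippet above covers binary quantization. For the scalar (int8) form, a rough NumPy sketch of the idea follows. It is an illustration under stated assumptions, not the code in sentence_transformers.quantization: it assumes per-dimension minimum and maximum values are taken from a calibration set and each dimension is mapped linearly onto the 256 int8 levels. The function `scalar_quantize` and the random data are hypothetical.

```python
import numpy as np


def scalar_quantize(embeddings: np.ndarray, calibration: np.ndarray) -> np.ndarray:
    """Map each dimension linearly from its calibration range onto int8."""
    lo = calibration.min(axis=0)
    hi = calibration.max(axis=0)
    # Guard against constant dimensions to avoid division by zero.
    step = np.where(hi > lo, (hi - lo) / 255.0, 1.0)
    levels = np.round((embeddings - lo) / step)
    # Values outside the calibration range are clipped to the end buckets.
    return (np.clip(levels, 0, 255) - 128).astype(np.int8)


rng = np.random.default_rng(0)
calibration_embeddings = rng.standard_normal((1000, 1024)).astype(np.float32)
embeddings = calibration_embeddings[:2]

int8_embeddings = scalar_quantize(embeddings, calibration_embeddings)
print(int8_embeddings.dtype, embeddings.nbytes // int8_embeddings.nbytes)  # int8 4
```

Each value shrinks from four bytes to one, so the saving is 4x rather than the 32x of binary quantization, in exchange for keeping more of the original information per dimension.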
GISTEmbedLoss
GISTEmbedLoss, as introduced in Solatorio (2024), is a guided variant of the more standard in-batch negatives (MultipleNegativesRankingLoss) loss. Both loss functions are provided with a list of (anchor, positive) pairs, but while MultipleNegativesRankingLoss uses anchor_i and positive_i as the positive pair and all positive_j with i != j as negative pairs, GISTEmbedLoss uses a second model to guide the in-batch negative sample selection.
This can be very useful, because it is plausible that anchor_i and positive_j are actually quite semantically similar. In this case, GISTEmbedLoss would not consider them a negative pair, while MultipleNegativesRankingLoss would. When finetuning MPNet-base on the AllNLI dataset, these are the Spearman correlations based on cosine similarity on the STS Benchmark dev set (higher is better):
The blue line is MultipleNegativesRankingLoss, whereas the grey line is GISTEmbedLoss with the small all-MiniLM-L6-v2 as the guide model. Note that all-MiniLM-L6-v2 by itself does not reach 88 Spearman correlation on this dataset, so this is really the effect of two models (mpnet-base and all-MiniLM-L6-v2) reaching a performance that they could not reach separately.
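The guided selection of in-batch negatives can be sketched with plain NumPy on a precomputed similarity matrix. This is a simplified illustration, not the library's implementation: it assumes that an in-batch pair (anchor_i, positive_j) is dropped as a negative whenever the guide model rates it as more similar than the true pair (anchor_i, positive_i). The function name `guided_negative_mask` and the toy numbers are made up for the example.

```python
import numpy as np


def guided_negative_mask(guide_sim: np.ndarray) -> np.ndarray:
    """Return a boolean matrix: True where (anchor_i, positive_j) is kept as a negative.

    guide_sim[i, j] is the guide model's similarity between anchor_i and positive_j.
    """
    true_pair_sim = np.diag(guide_sim).reshape(-1, 1)
    # Drop candidates the guide considers more similar than the true positive.
    keep = guide_sim <= true_pair_sim
    # The diagonal holds the positives themselves, never negatives.
    np.fill_diagonal(keep, False)
    return keep


# Toy guide similarities for a batch of 3 (anchor, positive) pairs.
guide_sim = np.array([
    [0.80, 0.90, 0.10],  # positive_1 looks closer to anchor_0 than positive_0 does
    [0.20, 0.70, 0.30],
    [0.10, 0.20, 0.60],
])
mask = guided_negative_mask(guide_sim)
print(mask[0])  # [False False  True]
```

With MultipleNegativesRankingLoss, every off-diagonal entry would count as a negative; here the pair (anchor_0, positive_1) is excluded because the guide scores it above the true pair.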
Soft save_to_hub Deprecation
Most codebases that allow for pushing models to the Hugging Face Hub adopt a push_to_hub method instead of a save_to_hub method, and now Sentence Transformers will follow that convention. The push_to_hub method will now be the recommended approach, although save_to_hub will continue to exist for the time being: it will simply call push_to_hub internally.
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-mpnet-base-v2")
...
# Train the model
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    evaluator=dev_evaluator,
    epochs=num_epochs,
    evaluation_steps=1000,
    warmup_steps=warmup_steps,
)
# Push the model to Hugging Face
model.push_to_hub("tomaarsen/mpnet-base-nli-stsb")
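The soft-deprecation pattern described in this section can be sketched as follows. The `Model` class is a hypothetical stand-in rather than the real SentenceTransformer, and emitting a `DeprecationWarning` is an assumption; the release notes only state that save_to_hub calls push_to_hub internally.

```python
import warnings


class Model:
    """Hypothetical stand-in used only to illustrate the delegation."""

    def push_to_hub(self, repo_id: str) -> str:
        # A real implementation would upload files; here we just echo the target.
        return f"pushed to {repo_id}"

    def save_to_hub(self, repo_id: str) -> str:
        # Old name kept for compatibility: warn, then delegate to the new method.
        warnings.warn(
            "save_to_hub is deprecated; use push_to_hub instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.push_to_hub(repo_id)


model = Model()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = model.save_to_hub("user/my-model")

print(result)  # pushed to user/my-model
```

Keeping the old method as a thin wrapper means existing scripts keep working while new code can move to push_to_hub at its own pace.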
All changes
- Add GISTEmbedLoss by @avsolatorio in #2535
- [feat] Add 'get_config_dict' method to GISTEmbedLoss for better model cards by @tomaarsen in #2543
- Enable saving modules as pytorch_model.bin by @CKeibel in #2542
- [deprecation] Deprecate save_to_hub in favor of push_to_hub; add safe_serialization support to push_to_hub by @tomaarsen in #2544
- Fix SentenceTransformer encode documentation return type default (numpy vectors) by @CKeibel in #2546
- [docs] Update return docstring of encode_multi_process by @tomaarsen in #2548
- [feat] Add binary & scalar embedding quantization support to Sentence Transformers by @tomaarsen in #2549
New Contributors
- @avsolatorio made their first contribution in #2535
- @CKeibel made their first contribution in #2542
Full Changelog: v2.5.1...v2.6.0