
Releases: UKPLab/sentence-transformers

v3.3.0 - Massive CPU speedup with OpenVINO int8 quantization; Training with Prompts for stronger models; NanoBEIR IR evaluation; PEFT compatibility; Transformers v4.46.0 compatibility

11 Nov 12:02

4x speedup for CPU with OpenVINO int8 static quantization, training with prompts for a free performance boost, convenient evaluation on NanoBEIR (a subset of the strong BEIR Information Retrieval benchmark), PEFT compatibility by easily adding/loading adapters, Transformers v4.46.0 compatibility, and Python 3.8 deprecation.

Install this version with:

# Training + Inference
pip install sentence-transformers[train]==3.3.0

# Inference only, use one of:
pip install sentence-transformers==3.3.0
pip install sentence-transformers[onnx-gpu]==3.3.0
pip install sentence-transformers[onnx]==3.3.0
pip install sentence-transformers[openvino]==3.3.0

OpenVINO int8 static quantization (#3025)

We introduce int8 static quantization using OpenVINO, a highly performant solution that outperforms all other current backends by a mile in inference speed, at a minimal loss in accuracy. Here are the updated benchmarks:

Quantizing directly to the Hugging Face Hub

from sentence_transformers import SentenceTransformer, export_static_quantized_openvino_model

# 1. Load a model with the OpenVINO backend
model = SentenceTransformer("all-MiniLM-L6-v2", backend="openvino")

# 2. Quantize the model to int8, push the model to https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
# as a pull request:
export_static_quantized_openvino_model(
    model,
    quantization_config=None,
    model_name_or_path="sentence-transformers/all-MiniLM-L6-v2",
    push_to_hub=True,
    create_pr=True,
)

You can immediately use the model, even before it's merged, by using the revision argument:

from sentence_transformers import SentenceTransformer

pull_request_nr = 2 # TODO: Update this to the number of your pull request
model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="openvino",
    model_kwargs={"file_name": "openvino_model_qint8_quantized.xml"},
    revision=f"refs/pr/{pull_request_nr}"
)

And once it's merged:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="openvino",
    model_kwargs={"file_name": "openvino/openvino_model_qint8_quantized.xml"},
)

Quantizing locally

You can also quantize a model and save it locally:

from sentence_transformers import SentenceTransformer, export_static_quantized_openvino_model
from optimum.intel import OVQuantizationConfig

model = SentenceTransformer("all-mpnet-base-v2", backend="openvino")
model.save_pretrained("path/to/all-mpnet-base-v2-local")
quantization_config = OVQuantizationConfig() # <- You can update settings here
export_static_quantized_openvino_model(model, quantization_config, "path/to/all-mpnet-base-v2-local")

And after quantizing, you can load it like so:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "path/to/all-mpnet-base-v2-local",
    backend="openvino",
    model_kwargs={"file_name": "openvino_model_qint8_quantized.xml"},
)

All original Sentence Transformer models already include these new openvino_model_qint8_quantized.xml files, so you can load them directly without exporting! I would recommend making pull requests for other models on Hugging Face that you'd like to see quantized.

Learn more about how to Speed up Inference in the documentation: https://sbert.net/docs/sentence_transformer/usage/efficiency.html

Training with Prompts (#2964)

Many modern embedding models are trained with “instructions” or “prompts” following the INSTRUCTOR paper. These prompts are strings, prefixed to each text to be embedded, allowing the model to distinguish between different types of text.

For example, the mixedbread-ai/mxbai-embed-large-v1 model was trained with "Represent this sentence for searching relevant passages: " as the prompt for all queries. This prompt is stored in the model configuration under the prompt name "query", so users can specify that prompt_name in model.encode:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
query_embedding = model.encode("What are Pandas?", prompt_name="query")
# or
# query_embedding = model.encode("What are Pandas?", prompt="Represent this sentence for searching relevant passages: ")
document_embeddings = model.encode([
    "Pandas is a software library written for the Python programming language for data manipulation and analysis.",
    "Pandas are a species of bear native to South Central China. They are also known as the giant panda or simply panda.",
    "Koala bears are not actually bears, they are marsupials native to Australia.",
])
similarity = model.similarity(query_embedding, document_embeddings)
print(similarity)
# => tensor([[0.7594, 0.7560, 0.4674]])

Various papers (INSTRUCTOR, BGE) show that including prompts or instructions both during training and inference results in stronger performance. As of this release, it's now possible to easily train with prompts in Sentence Transformers with just one extra training argument: prompts. There are 4 accepted formats for it:

  1. str: A single prompt to use for all columns in all datasets. For example:
    args = SentenceTransformerTrainingArguments(
        ...,
        prompts="text: ",
        ...,
    )
  2. Dict[str, str]: A dictionary mapping column names to prompts, applied to all datasets. For example:
    args = SentenceTransformerTrainingArguments(
        ...,
        prompts={
            "query": "query: ",
            "answer": "document: ",
        },
        ...,
    )
  3. Dict[str, str]: A dictionary mapping dataset names to prompts. This should only be used if your training/evaluation/test datasets are a DatasetDict or a dictionary of Dataset. For example:
    args = SentenceTransformerTrainingArguments(
        ...,
        prompts={
            "stsb": "Represent this text for semantic similarity search: ",
            "nq": "Represent this text for retrieval: ",
        },
        ...,
    )
  4. Dict[str, Dict[str, str]]: A dictionary mapping dataset names to dictionaries mapping column names to prompts. This should only be used if your training/evaluation/test datasets are a DatasetDict or a dictionary of Dataset. For example:
    args = SentenceTransformerTrainingArguments(
        ...,
        prompts={
            "stsb": {
                "sentence1": "sts: ",
                "sentence2": "sts: ",
            },
            "nq": {
                "query": "query: ",
                "document": "document: ",
            },
        },
        ...,
    )

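For instance, here is a minimal sketch of training with prompts on the natural-questions dataset (format 2 from the list above; the output directory is just a placeholder):

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

# 1. Load a model to finetune and a (query, answer) dataset
model = SentenceTransformer("microsoft/mpnet-base")
train_dataset = load_dataset("sentence-transformers/natural-questions", split="train")

# 2. Map the "query" and "answer" columns to prompts; they are prepended automatically during training
args = SentenceTransformerTrainingArguments(
    output_dir="models/mpnet-base-nq-prompts",
    prompts={
        "query": "query: ",
        "answer": "document: ",
    },
)

# 3. Train with an in-batch negatives loss
loss = MultipleNegativesRankingLoss(model)
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
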
I've trained models with and without prompts for 2 base models: mpnet-base and bert-base-uncased:

For both base models, the model with prompts consistently outperformed the baseline model. After training, the models with prompts resulted in a 0.66% and 0.90% relative improvement on NDCG@10 at no extra cost.

[Figures: mpnet-base and bert-base-uncased NanoBEIR test results]

NanoBEIR Evaluator integration (#2966)

This update introduced a new simple NanoBEIREvaluator, evaluating your model against NanoBEIR: a collection of subsets of the 13 BEIR datasets. BEIR corresponds to the retrieval tab of MTEB, and is commonly seen as a valuable indicator of general-purpose information retrieval performance.

With the NanoBEIREvaluator, you can easily evaluate your models on a much faster benchmark that should give similar insights into performance...
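
As a rough sketch (assuming the default settings, which evaluate on every NanoBEIR dataset), usage looks like this:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

model = SentenceTransformer("all-MiniLM-L6-v2")

# Downloads the small NanoBEIR datasets on first use and reports IR metrics
# (e.g. NDCG@10, MRR@10, MAP) per dataset plus aggregated means
evaluator = NanoBEIREvaluator()
results = evaluator(model)
print(results[evaluator.primary_metric])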

Read more

v3.2.1 - Patch CLIP loading, small ONNX fix, compatibility with other libraries

21 Oct 12:19

This patch release fixes some small bugs, such as issues with loading CLIP models and automatic model card generation, and ensures compatibility with third-party libraries.

Install this version with

# Training + Inference
pip install sentence-transformers[train]==3.2.1

# Inference only, use one of:
pip install sentence-transformers==3.2.1
pip install sentence-transformers[onnx-gpu]==3.2.1
pip install sentence-transformers[onnx]==3.2.1
pip install sentence-transformers[openvino]==3.2.1

Fix for loading non-Transformer models

In v3.2.0, a non-Transformer based model (e.g. CLIP) would not load correctly if the model was saved in the root of the model repository/directory. This has been resolved in #3007.
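
As a quick check, loading such a model again works as expected (a minimal sketch; nothing here is specific to the fix beyond the model type):

from sentence_transformers import SentenceTransformer

# CLIP is a non-Transformer-based Sentence Transformer model
model = SentenceTransformer("sentence-transformers/clip-ViT-B-32")
embeddings = model.encode(["A photo of a cat", "A photo of a dog"])
print(embeddings.shape)  # (2, 512)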

Throw error if StaticEmbedding-based model is finetuned with incompatible losses

The following losses are not compatible with StaticEmbedding-based models:

  • CachedGISTEmbedLoss
  • CachedMultipleNegativesRankingLoss
  • CachedMultipleNegativesSymmetricRankingLoss
  • DenoisingAutoEncoderLoss
  • GISTEmbedLoss

An error is now thrown when one of these is used with a StaticEmbedding-based model. I recommend using MultipleNegativesRankingLoss to finetune these models, e.g. as in https://huggingface.co/tomaarsen/static-bert-uncased-gooaq.
Note: to get good performance, you must use much higher learning rates than otherwise. In my experiments, 2e-1 worked well.
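
As a rough sketch of that recommendation (the tokenizer, dataset, and output directory below are only placeholders), the key differences from a regular run are the loss choice and the unusually high learning rate:

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.models import StaticEmbedding
from tokenizers import Tokenizer

# A StaticEmbedding-based model: token embeddings averaged into a sentence embedding
static_embedding = StaticEmbedding(Tokenizer.from_pretrained("google-bert/bert-base-uncased"), embedding_dim=1024)
model = SentenceTransformer(modules=[static_embedding])

train_dataset = load_dataset("sentence-transformers/gooaq", split="train[:100000]")

args = SentenceTransformerTrainingArguments(
    output_dir="models/static-bert-uncased-gooaq",
    learning_rate=2e-1,  # much higher than the usual ~2e-5
)
loss = MultipleNegativesRankingLoss(model)

trainer = SentenceTransformerTrainer(model=model, args=args, train_dataset=train_dataset, loss=loss)
trainer.train()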

Patch ONNX model when the model uses output_hidden_states

For example, this script used to fail, but passes now:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "distiluse-base-multilingual-cased",
    backend="onnx",
    model_kwargs={"provider": "CPUExecutionProvider"},
)

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
print(embeddings.shape)

All changes

  • Bump optimum version by @echarlaix in #2984
  • [docs] Update the training snippets for some losses that should use the v3 Trainer by @tomaarsen in #2987
  • [enh] Throw error if StaticEmbedding-based model is trained with incompatible loss by @tomaarsen in #2990
  • [fix] Fix semantic_search_usearch with 'binary' by @tomaarsen in #2989
  • [enh] Add support for large_string in model card create by @yaohwang in #2999
  • [model cards] Prevent crash on generating widgets if dataset column is empty by @tomaarsen in #2997
  • [fix] Added model2vec import compatible with current and newer version by @Pringled in #2992
  • Fix cache_dir issue with loading CLIPModel by @BoPeng in #3007
  • [warn] Throw a warning if compute_metrics is set, as it's not used by @tomaarsen in #3002
  • [fix] Prevent IndexError if output_hidden_states & ONNX by @tomaarsen in #3008

New Contributors

Full Changelog: v3.2.0...v3.2.1

v3.2.0 - ONNX and OpenVINO backends offering 2-3x speedup; Static Embeddings offering 50x-500x speedups at ~10-20% performance cost

10 Oct 17:59

This release introduces 2 new efficient computing backends for SentenceTransformer models: ONNX and OpenVINO + optimization & quantization, allowing for speedups up to 2x-3x; static embeddings via Model2Vec allowing for lightning-fast models (i.e., 50x-500x speedups) at a ~10%-20% performance cost; and various small improvements and fixes.

Install this version with

# Training + Inference
pip install sentence-transformers[train]==3.2.0

# Inference only, use one of:
pip install sentence-transformers==3.2.0
pip install sentence-transformers[onnx-gpu]==3.2.0
pip install sentence-transformers[onnx]==3.2.0
pip install sentence-transformers[openvino]==3.2.0

Faster ONNX and OpenVINO Backends for SentenceTransformer (#2712)

Introducing a new backend keyword argument to the SentenceTransformer initialization, allowing values of "torch" (default), "onnx", and "openvino".
These come with new installations:

pip install sentence-transformers[onnx-gpu]
# or ONNX for CPU only:
pip install sentence-transformers[onnx]
# or
pip install sentence-transformers[openvino]

It's as simple as:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)

If you specify a backend and your model repository or directory contains an ONNX/OpenVINO model file, it will automatically be used! And if your model repository or directory doesn't have one already, an ONNX/OpenVINO model will be automatically exported. Just remember to call model.push_to_hub or model.save_pretrained with the same model repository or directory to avoid having to re-export the model every time.
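
For instance, a minimal sketch of exporting once and persisting the result (the paths and repository names are placeholders):

from sentence_transformers import SentenceTransformer

# The first load with backend="onnx" exports onnx/model.onnx if the directory doesn't contain one yet
model = SentenceTransformer("path/to/my-local-model", backend="onnx")

# Persist the exported ONNX file so future loads skip the export step
model.save_pretrained("path/to/my-local-model")
# or, for a Hub repository you can write to:
# model.push_to_hub("my-username/my-model")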

All keyword arguments passed via model_kwargs will be passed on to ORTModel.from_pretrained or OVBaseModel.from_pretrained. The most useful arguments are:

  • provider: (Only if backend="onnx") ONNX Runtime provider to use for loading the model, e.g. "CPUExecutionProvider". See https://onnxruntime.ai/docs/execution-providers/ for possible providers. If not specified, the strongest available provider (e.g. "CUDAExecutionProvider") will be used.
  • file_name: The name of the ONNX file to load. If not specified, will default to "model.onnx" or otherwise "onnx/model.onnx" for ONNX, and to "openvino_model.xml" or otherwise "openvino/openvino_model.xml" for OpenVINO. This argument is useful for specifying optimized or quantized models.
  • export: A boolean flag specifying whether the model will be exported. If not provided, export will be set to True if the model repository or directory does not already contain an ONNX or OpenVINO model.

For example:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="onnx",
    model_kwargs={
        "file_name": "model_O3.onnx",
        "provider": "CPUExecutionProvider",
    },
)

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)

Benchmarks

We ran benchmarks for CPU and GPU, averaging findings across 4 models of various sizes, 3 datasets, and numerous batch sizes. These findings resulted in the following recommendations: for GPU, you can expect a 2x speedup with fp16 at no cost, and for CPU you can expect a ~2.5x speedup at a cost of 0.4% accuracy.

ONNX Optimization and Quantization

In addition to exporting default ONNX and OpenVINO models, we also introduce 2 helper methods for optimizing and quantizing ONNX models:

Optimization

export_optimized_onnx_model: This function uses Optimum to implement several optimizations in the ONNX model, ranging from basic optimizations to approximations and mixed precision. Read about the 4 default options here. This function accepts:

  • model: A SentenceTransformer model loaded with backend="onnx".
  • optimization_config: "O1", "O2", "O3", or "O4" from 🤗 Optimum or a custom OptimizationConfig instance.
  • model_name_or_path: The directory or model repository where the optimized model will be saved.
  • push_to_hub: Whether to push the exported model to the hub with model_name_or_path as the repository name. If False, the model will be saved in the directory specified with model_name_or_path.
  • create_pr: If push_to_hub, then this denotes whether a pull request is created rather than pushing the model directly to the repository. Very useful for optimizing models of repositories that you don't have write access to.
  • file_suffix: The suffix to add to the optimized model file name. Will use the optimization_config string or "optimized" if not set.

The usage is like this:

from sentence_transformers import SentenceTransformer, export_optimized_onnx_model

onnx_model = SentenceTransformer("BAAI/bge-large-en-v1.5", backend="onnx")
export_optimized_onnx_model(
    model=onnx_model,
    optimization_config="O4",
    model_name_or_path="BAAI/bge-large-en-v1.5",
    push_to_hub=True,
    create_pr=True,
)

After which you can load the model with:

from sentence_transformers import SentenceTransformer

pull_request_nr = 2 # TODO: Update this to the number of your pull request
model = SentenceTransformer(
   "BAAI/bge-large-en-v1.5",
   backend="onnx",
   model_kwargs={"file_name": "onnx/model_O4.onnx"},
   revision=f"refs/pr/{pull_request_nr}"
)

or when it gets merged:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
   "BAAI/bge-large-en-v1.5",
   backend="onnx",
   model_kwargs={"file_name": "onnx/model_O4.onnx"},
)

Quantization

export_dynamic_quantized_onnx_model: This function uses Optimum to quantize the ONNX model to int8, also allowing for hardware-specific optimizations. This results in impressive speedups for CPUs. In my findings, each of the default quantization configuration options gave approximately the same performance improvements. This function accepts:

  • model: A SentenceTransformer model loaded with backend="onnx".
  • quantization_config: "arm64", "avx2", "avx512", or "avx512_vnni" representing quantization configurations from AutoQuantizationConfig, or a QuantizationConfig instance.
  • model_name_or_path: The directory or model repository where the optimized model will be saved.
  • push_to_hub: Whether to push the exported model to the hub with model_name_or_path as the repository name. If False, the model will be saved in the directory specified with model_name_or_path.
  • create_pr: If push_to_hub, then this denotes whether a pull request is created rather than pushing the model directly to the repository. Very useful for quantizing models of repositories that you don't have write access to.
  • file_suffix: The suffix to add to the optimized model file name. Will use the quantization_config string or e.g. "int8_quantized" if not set.

The usage is like this:

from sentence_transformers import SentenceTransformer, export_dynamic_quantized_onnx_model

onnx_model = SentenceTransformer("BAAI/bge-large-en-v1.5", backend="onnx")
export_dynamic_quantized_onnx_model(
    model=onnx_model,
    quantization_config="avx512",
    model_name_or_path="BAAI/bge-large-en-v1.5",
    push_to_hub=True,
    create_pr=True,
)

After which you can load the model with:

from sentence_transformers import SentenceTransformer

pull_request_nr = 2 # TODO: Update this to the number of your pull request
model = SentenceTransformer(
   "BAAI/bge-large-en-v1.5",
   backend="onnx",
   model_kwargs={"file_name": "onnx/model_qint8_avx512.onnx"},
   revision=f"refs/pr/{pull_request_nr}"
)

or when it gets merged:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
   "BAAI/bge-large-en-v1.5",
   backend="onnx",
   model_kwargs={"file_name": "onnx/model_qint8_avx512.onnx"},
)

Lightning-Fast Static Embeddings via Model2Vec (#2961)

If ONNX or OpenVINO isn't fast enough for you yet, then perhaps you'll enjoy Static Embeddings. These embeddings are a bit akin to [GLoVe](https://nlp.stanford.edu/proje...

Read more

v3.1.1 - Patch hard negative mining & remove `numpy<2` restriction

19 Sep 14:18

This patch release fixes hard negatives mining for models that don't automatically normalize their embeddings and it lifts the numpy<2 restriction that was previously required.

Install this version with

# Full installation:
pip install sentence-transformers[train]==3.1.1

# Inference only:
pip install sentence-transformers==3.1.1

Hard Negatives Mining Patch (#2944)

The mine_hard_negatives utility introduced in the previous release would fail if use_faiss=True & the model does not automatically normalize its embeddings. This release patches that, allowing the utility to work with all Sentence Transformer models:

from sentence_transformers.util import mine_hard_negatives
from sentence_transformers import SentenceTransformer
from datasets import load_dataset

# Load a Sentence Transformer model
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1").bfloat16()

# Load a dataset to mine hard negatives from
dataset = load_dataset("sentence-transformers/natural-questions", split="train[:10000]")
print(dataset)
"""
Dataset({
    features: ['query', 'answer'],
    num_rows: 10000
})
"""

# Mine hard negatives
dataset = mine_hard_negatives(
    dataset=dataset,
    model=model,
    range_min=10,
    range_max=50,
    max_score=0.8,
    margin=0.1,
    num_negatives=5,
    sampling_strategy="random",
    batch_size=128,
    use_faiss=True,
)
'''
Batches: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 75/75 [00:21<00:00,  3.51it/s]
Batches: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 79/79 [00:03<00:00, 25.77it/s]
Querying FAISS index: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  3.98it/s]
Metric       Positive       Negative     Difference
Count          10,000         47,711
Mean           0.7600         0.5376         0.2299
Median         0.7673         0.5379         0.2274
Std            0.0658         0.0387         0.0629
Min            0.3858         0.3732         0.1044
25%            0.7219         0.5129         0.1833
50%            0.7673         0.5379         0.2274
75%            0.8058         0.5617         0.2724
Max            0.9341         0.7024         0.4780
Skipped 48770 potential negatives (9.56%) due to the margin of 0.1.
Could not find enough negatives for 2289 samples (4.58%). Consider adjusting the range_max, range_min, margin and max_score parameters if you'd like to find more valid negatives.
'''
print(dataset)
'''
Dataset({
    features: ['query', 'answer', 'negative'],
    num_rows: 47711
})
'''
print(dataset[0])
'''
{
    'query': 'where is the us navy base in japan located',
    'answer': 'United States Fleet Activities Yokosuka The United States Fleet Activities Yokosuka (横須賀海 軍施設, Yokosuka kaigunshisetsu) or Commander Fleet Activities Yokosuka (司令官艦隊活動横須賀, Shirei-kan kantai katsudō Yokosuka) is a United States Navy base in Yokosuka, Japan. Its mission is to maintain and operate base facilities for the logistic, recreational, administrative support and service of the U.S. Naval Forces Japan, Seventh Fleet and other operating forces assigned in the Western Pacific. CFAY is the largest strategically important U.S. naval installation in the western Pacific.[1] As of August 2013[update], it was commanded by Captain David Glenister.',
    'negative': "2011 Tōhoku earthquake and tsunami The earthquake took place at 14:46 JST (UTC 05:46) around 67\xa0km (42\xa0mi) from the nearest point on Japan's coastline, and initial estimates indicated the tsunami would have taken 10 to 30\xa0minutes to reach the areas first affected, and then areas farther north and south based on the geography of the coastline.[127][128] Just over an hour after the earthquake at 15:55 JST, a tsunami was observed flooding Sendai Airport, which is located near the coast of Miyagi Prefecture,[129][130] with waves sweeping away cars and planes and flooding various buildings as they traveled inland.[131][132] The impact of the tsunami in and around Sendai Airport was filmed by an NHK News helicopter, showing a number of vehicles on local roads trying to escape the approaching wave and being engulfed by it.[133] A 4-metre-high (13\xa0ft) tsunami hit Iwate Prefecture.[134] Wakabayashi Ward in Sendai was also particularly hard hit.[135] At least 101 designated tsunami evacuation sites were hit by the wave.[136]"
}
'''
dataset.push_to_hub("natural-questions-hard-negatives", "triplet")

Thanks to @omarnj-lab for pointing out the bug to me.

Numpy restriction lifted (#2937)

The v3.1.0 Sentence Transformers release required numpy<2 to prevent crashes on Windows. However, various third-party packages (e.g. scipy) have now been recompiled & released, allowing the Windows tests to pass again.

If you encounter the following error:

A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.0 as it may crash. To support both 1.x and 2.x versions of NumPy, modules must be compiled with NumPy 2.0. Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to downgrade to 'numpy<2' or try to upgrade the affected module. We expect that some modules will need time to support NumPy 2.

Then consider 1) upgrading the dependency from which the error occurred, or 2) downgrading numpy to below v2:

pip install -U "numpy<2"

Thanks to @kozlek for pointing this out to me and helping get it resolved.

All changes

  • [deps] Attempt to remove numpy restrictions by @tomaarsen in #2937
  • [metadata] Extend pyproject.toml metadata by @tomaarsen in #2943
  • [fix] Ensure that the embeddings from hard negative mining are normalized by @tomaarsen in #2944

Full Changelog: v3.1.0...v3.1.1

v3.1.0 - Hard Negatives Mining utility; new loss function for symmetric tasks; streaming datasets; custom modules

11 Sep 14:06

This release introduces a hard negatives mining utility to get better models out of your data, a new strong loss function for symmetric tasks, training with streaming datasets to avoid having to store datasets fully on disk, custom modules to allow for more creativity from model authors, and many bug fixes, small additions and documentation improvements.

Install this version with

# Full installation:
pip install sentence-transformers[train]==3.1.0

# Inference only:
pip install sentence-transformers==3.1.0

Warning

Due to incompatibilities with Windows, we have set numpy<2 in the Sentence Transformers requirements. If you're not on Windows, you can still install numpy>=2 and everything should work as expected.

Hard Negatives Mining utility (#2768, #2848)

Hard negatives are texts that are rather similar to some anchor text (e.g. a question), but are not the correct match. For example:

  • Anchor: "are red pandas actually pandas?"
  • Positive: "Red pandas, like giant pandas, are bamboo eaters native to Asia's high forests. Despite these similarities and their shared name, the two species are not closely related. Red pandas are much smaller than giant pandas and are the only living member of their taxonomic family."
  • Hard negative: "The giant panda (Ailuropoda melanoleuca; Chinese: 大熊猫; pinyin: dàxióngmāo), also known as the panda bear or simply the panda, is a bear native to south central China."

These negatives are more difficult for a model to distinguish from the correct answer, leading to a stronger training signal and a stronger overall model when used with one of the Loss Functions that accepts (anchor, positive, negative) triplets such as the one above.

This release introduces a utility function called mine_hard_negatives that allows you to mine for these hard negatives given an (anchor, positive) dataset (and optionally a corpus of negative candidate texts).

It boasts the following features to give you fine-grained control over the similarity of the mined negatives relative to the anchor:

  • CrossEncoder rescoring for higher quality negative selection.
  • Skip the top $n$ negative candidates as these might be true positives.
  • Consider only the top $n$ negative candidates.
  • Skip negative candidates that are within some margin of the true similarity between anchor and positive.
  • Skip negative candidates whose similarity is larger than some max_score.
  • Two sampling strategies: pick the top negative candidates that satisfy the requirements, or pick them randomly.
  • FAISS index for searching for negative candidates.
  • Option to return data as triplets only, or as 2 + num_negatives-tuples.

from sentence_transformers.util import mine_hard_negatives
from sentence_transformers import SentenceTransformer
from datasets import load_dataset

# Load a Sentence Transformer model
model = SentenceTransformer("all-MiniLM-L6-v2")

# Load a dataset to mine hard negatives from
dataset = load_dataset("sentence-transformers/natural-questions", split="train")
print(dataset)
"""
Dataset({
    features: ['query', 'answer'],
    num_rows: 100231
})
"""

# Mine hard negatives
dataset = mine_hard_negatives(
    dataset=dataset,
    model=model,
    range_min=10,
    range_max=50,
    max_score=0.8,
    margin=0.1,
    num_negatives=5,
    sampling_strategy="random",
    batch_size=128,
    use_faiss=True,
)
'''
Batches: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 588/588 [00:33<00:00, 17.37it/s]
Batches: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 784/784 [00:07<00:00, 101.55it/s]
Querying FAISS index: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:07<00:00,  1.06s/it]
Metric       Positive       Negative     Difference
Count         100,231        460,725        460,725
Mean           0.6866         0.4133         0.2917
Median         0.7010         0.4059         0.2873
Std            0.1125         0.0673         0.1006
Min            0.0303         0.1638         0.1029
25%            0.6221         0.3649         0.2112
50%            0.7010         0.4059         0.2873
75%            0.7667         0.4561         0.3647
Max            0.9584         0.7362         0.7073
Skipped 882722 potential negatives (17.27%) due to the margin of 0.1.
Skipped 27 potential negatives (0.00%) due to the maximum score of 0.8.
Could not find enough negatives for 40430 samples (8.07%). Consider adjusting the range_max, range_min, margin and max_score parameters if you'd like to find more valid negatives.
'''
print(dataset)
'''
Dataset({
    features: ['query', 'answer', 'negative'],
    num_rows: 460725
})
'''
print(dataset[0])
'''
{
    'query': 'the first person to use the word geography was',
    'answer': 'History of geography The history of geography includes many histories of geography which have differed over time and between different cultural and political groups. In more recent developments, geography has become a distinct academic discipline. \'Geography\' derives from the Greek γεωγραφία – geographia,[1] a literal translation of which would be "to describe or write about the Earth". The first person to use the word "geography" was Eratosthenes (276–194 BC). However, there is evidence for recognizable practices of geography, such as cartography (or map-making) prior to the use of the term geography.',
    'negative': 'Terminology of the British Isles The word "Great" means "larger", in comparison with Brittany in modern-day France. One historical term for the peninsula in France that largely corresponds to the modern French province is Lesser or Little Britain. That region was settled by many British immigrants during the period of Anglo-Saxon migration into Britain, and named "Little Britain" by them. The French term "Bretagne" now refers to the French "Little Britain", not to the British "Great Britain", which in French is called Grande-Bretagne. In classical times, the Graeco-Roman geographer Ptolemy in his Almagest also called the larger island megale Brettania (great Britain). At that time, it was in contrast to the smaller island of Ireland, which he called mikra Brettania (little Britain).[62] In his later work Geography, Ptolemy refers to Great Britain as Albion and to Ireland as Iwernia. These "new" names were likely to have been the native names for the islands at the time. The earlier names, in contrast, were likely to have been coined before direct contact with local peoples was made.[63]'
}
'''
dataset.push_to_hub("natural-questions-hard-negatives", "triplet")

This dataset can immediately be used in conjunction with MultipleNegativesRankingLoss, likely resulting in a stronger model than if you had just used the natural-questions dataset outright.
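
For example, a minimal sketch of plugging the mined triplets into training (the base model and trainer defaults are placeholders):

from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss

# `dataset` is the mined (query, answer, negative) dataset from the snippet above
model = SentenceTransformer("microsoft/mpnet-base")
loss = MultipleNegativesRankingLoss(model)

trainer = SentenceTransformerTrainer(model=model, train_dataset=dataset, loss=loss)
trainer.train()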

Here are some example datasets that I created using this new function:

Big thanks to @ChrisGeishauser and @ArthurCamara for assisting with this feature.

Add CachedMultipleNegativesSymmetricRankingLoss loss function (#2879)

Let's break this down:

  • MultipleNegativesRankingLoss (MNRL): Given (anchor, positive) text pairs or (anchor, positive, negative) text triplets, this loss trains for "Given an anchor (e.g. a query), which text out of a big lineup (all positives and negatives in the batch) is the true positive (e.g. the answer)?".
  • MultipleNegativesSymmetricRankingLoss (MNSRL): Adaptation of MNRL that adds a second loss term which means: "Given a positive (e.g. a summary), which text out of a big lineup (all anchors) is the true anchor (e.g. the full article)?". This is useful for symmetric tasks such as clustering, classification, or finding similar texts, and a bit less useful for asymmetric tasks such as question-answer retrieval.
  • CachedMultipleNegativesRankingLoss (CMNRL): Adaptation of MNRL such that the batch size can be increased to an arbitrary size at a flat 10-20% training speed cost. A higher batch size means a larger lineup for the model to find the true positive in, often resulting in a better training signal and model.

The v3.1 Sentence Transformers release now introduces a new loss: CachedMultipleNegativesSymmetricRankingLoss (CMNSRL), which combines both of the previous adaptations. The result is a loss adept at symmetric training tasks for which you can pick an arbitrarily large batch size. It is likely the strongest loss for Semantic Textual Similarity (STS) tasks in Sentence Transformers now.
Big thanks to @madhavthaker1 for working to include it.
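
A rough usage sketch (the batch sizes are illustrative; the dataset just needs (anchor, positive) columns):

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CachedMultipleNegativesSymmetricRankingLoss

model = SentenceTransformer("microsoft/mpnet-base")
train_dataset = load_dataset("sentence-transformers/all-nli", "pair", split="train")

# A large effective batch size, processed in small mini-batches to keep memory usage constant
loss = CachedMultipleNegativesSymmetricRankingLoss(model, mini_batch_size=32)
args = SentenceTransformerTrainingArguments(
    output_dir="models/mpnet-base-allnli-cmnsrl",
    per_device_train_batch_size=1024,
)

trainer = SentenceTransformerTrainer(model=model, args=args, train_dataset=train_dataset, loss=loss)
trainer.train()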

Str...

Read more

v3.0.1 - Patch introducing new Trainer features, model card improvements and evaluator fixes

07 Jun 13:01

This patch release introduces some improvements for the SentenceTransformerTrainer, as well as some updates for the automatic model card generation. It also patches some minor evaluator bugs and a bug with MatryoshkaLoss. Lastly, every single Sentence Transformer model can now be saved and loaded with the safer model.safetensors files.

Install this version with

# Full installation:
pip install sentence-transformers[train]==3.0.1

# Inference only:
pip install sentence-transformers==3.0.1

SentenceTransformerTrainer improvements

  • Implement gradient checkpointing for lower memory usage during training (#2717)
  • Implement support for push_to_hub=True Training Argument, also implement trainer.push_to_hub(...) (#2718)

Model Cards

This patch release improves on the automatically generated model cards in several ways:

  • Your training datasets are now automatically linked if they're on Hugging Face (#2711)
  • A new generated_from_trainer tag is now also added (#2710)
  • The automatically included widget examples are now improved, especially for question-answering. Previously, the widget could give examples of comparing two questions with each other (#2713)
  • If you save a model locally, then load it again and upload it, it would previously still show
...
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
...

This now gets replaced with your new model ID on Hugging Face (#2714)

  • The exact training dataset size is now included in the model metadata, rather than as a bucket of e.g. 1K<n<10K (#2728)

Evaluators fixes

  • The primary metric of evaluators in SequentialEvaluator would be ignored in the scores calculation (#2700)
  • Fix confusing print statement in TranslationEvaluator when using print_wrong_matches=True (#1894)
  • Fix bug that prevents you from customizing the primary_metric in InformationRetrievalEvaluator (#2701)
  • Allow passing a list of evaluators to the STTrainer rather than a SequentialEvaluator (#2716)

Losses fixes

  • Fix MatryoshkaLoss crash if the first dimension is not the biggest (#2719)

Security

  • Integrate safetensors with all modules, including Dense, LSTM, CNN, etc. to prevent needing pickled pytorch_model.bin anymore (#2722)

All changes

  • updating to evaluation_strategy by @higorsilvaa in #2686
  • fix loss link by @Samoed in #2690
  • Fix bug that restricts users from specifying custom primary_function in InformationRetrievalEvaluator by @hetulvp in #2701
  • Fix a bug in SequentialEvaluator to use primary_metric if defined in evaluator. by @hetulvp in #2700
  • [fix] Always override the originally saved version in the ST config by @tomaarsen in #2709
  • [model cards] Also include HF datasets in the model card metadata by @tomaarsen in #2711
  • Add "generated_from_trainer" tag to auto-generated model cards by @tomaarsen in #2710
  • Fix confusing print statement in TranslationEvaluator by @NathanS-Git in #1894
  • [model cards] Improve the widget example selection: not based on embeddings, better for QA by @tomaarsen in #2713
  • [model cards] Replace 'sentence_transformers_model_id' from reused model if possible by @tomaarsen in #2714
  • [feat] Allow passing a list of evaluators to the Trainer by @tomaarsen in #2716
  • [fix] Fix gradient checkpointing to allow for much lower memory usage by @tomaarsen in #2717
  • [fix] Implement create_model_card on the Trainer, allowing args.push_to_hub=True by @tomaarsen in #2718
  • [fix] Fix MatryoshkaLoss crash if the first dimension is not the biggest by @tomaarsen in #2719
  • Update models_en_sentence_embeddings.html by @saikartheekb in #2720
  • [typing] Improve typing for many functions & add py.typed to satisfy mypy by @tomaarsen in #2724
  • [fix] Fix edge case with evaluator being None by @tomaarsen in #2726
  • [simplify] Set can_return_loss=True globally, instead of via the data collator by @tomaarsen in #2727
  • [feat] Integrate safetensors with Dense, etc. modules too. by @tomaarsen in #2722
  • [model cards] Specify the exact dataset size as a tag, will be bucketized by HF by @tomaarsen in #2728

New Contributors

Full Changelog: v3.0.0...v3.0.1

v3.0.0 - Sentence Transformer Training Refactor; new similarity methods; hyperparameter optimization; 50+ datasets release

28 May 11:54

This release consists of a major refactor that overhauls the training approach (introducing multi-gpu training, bf16, loss logging, callbacks, and much more), adds convenient similarity and similarity_pairwise methods, adds extra keyword arguments, introduces Hyperparameter Optimization, and includes a massive reformatting and release of 50+ datasets for training embedding models. In total, this is the largest Sentence Transformers update since the project was first created.

Install this version with

# Full installation:
pip install sentence-transformers[train]==3.0.0

# Inference only:
pip install sentence-transformers==3.0.0

Sentence Transformer training refactor (#2449)

The v3.0 release centers around this huge modernization of the training approach for SentenceTransformer models. Whereas training before v3.0 used to be all about InputExample, DataLoader and model.fit, the new training approach relies on 5 new components. You can learn more about these components in our Training and Finetuning Embedding Models with Sentence Transformers v3 blogpost. Additionally, you can read the new Training Overview, check out the Training Examples, or read this summary:

  1. Dataset
    A training Dataset or DatasetDict. This class is much more suited for sharing & efficient modifications than lists/DataLoaders of InputExample instances. A Dataset can contain multiple text columns that will be fed in order to the corresponding loss function. So, if the loss expects (anchor, positive, negative) triplets, then your dataset should also have 3 columns. The names of these columns are irrelevant. If there is a "label" or "score" column, it is treated separately, and used as the labels during training.
    A DatasetDict can be used to train with multiple datasets at once, e.g.:
    DatasetDict({
        multi_nli: Dataset({
            features: ['premise', 'hypothesis', 'label'],
            num_rows: 392702
        })
        snli: Dataset({
            features: ['snli_premise', 'hypothesis', 'label'],
            num_rows: 549367
        })
        stsb: Dataset({
            features: ['sentence1', 'sentence2', 'label'],
            num_rows: 5749
        })
    })
    When a DatasetDict is used, the loss parameter to the SentenceTransformerTrainer must also be a dictionary with these dataset keys, e.g.:
    {
        'multi_nli': SoftmaxLoss(...),
        'snli': SoftmaxLoss(...),
        'stsb': CosineSimilarityLoss(...),
    }
  2. Loss Function
    A loss function, or a dictionary of loss functions like described above. These loss functions do not require changes compared to before this PR.
  3. Training Arguments
    A SentenceTransformerTrainingArguments instance, subclass of a TrainingArguments instance. This powerful class controls the specific details of the training.
  4. Evaluator
    An optional SentenceEvaluator instance. Unlike before, models can now be evaluated on an evaluation dataset with some loss function, with a SentenceEvaluator instance, or with both.
  5. Trainer
    The new SentenceTransformerTrainer instance based on the transformers Trainer. This instance is provided with a SentenceTransformer model, a SentenceTransformerTrainingArguments class, a SentenceEvaluator, a training and evaluation Dataset/DatasetDict and a loss function/dict of loss functions. Most of these parameters are optional. Once provided, all you have to do is call trainer.train().

Some of the major features that are now implemented include:

  • MultiGPU Training (Data Parallelism (DP) and Distributed Data Parallelism (DDP))
  • bf16 training support
  • Loss logging
  • Evaluation datasets + evaluation loss
  • Improved callback support (built-in via Weights and Biases, TensorBoard, CodeCarbon, etc., as well as custom callbacks)
  • Gradient checkpointing
  • Gradient accumulation
  • Improved model card generation
  • Warmup ratio
  • Pushing to the Hugging Face Hub on every model checkpoint
  • Resuming from a training checkpoint
  • Hyperparameter Optimization

This script is a minimal example (no evaluator, no training arguments) of training mpnet-base on a part of the all-nli dataset using MultipleNegativesRankingLoss:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss

# 1. Load a model to finetune
model = SentenceTransformer("microsoft/mpnet-base")

# 2. Load a dataset to finetune on
dataset = load_dataset("sentence-transformers/all-nli", "triplet")
train_dataset = dataset["train"].select(range(10_000))
eval_dataset = dataset["dev"].select(range(1_000))

# 3. Define a loss function
loss = MultipleNegativesRankingLoss(model)

# 4. Create a trainer & train
trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()

# 5. Save the trained model
model.save_pretrained("models/mpnet-base-all-nli")

Additionally, trained models now automatically produce extensive model cards. Each of the following models was trained using some script from the Training Examples, and the model cards were not edited manually whatsoever:

Prior to the Sentence Transformer v3 release, all models would be trained using the SentenceTransformer.fit method. Rather than deprecating this method, starting from v3.0, this method will use the SentenceTransformerTrainer behind the scenes. This means that your old training code should still work, and should even be upgraded with the new features such as multi-gpu training, loss logging, etc. That said, the new training approach is much more powerful, so it is recommended to write new training scripts using the new approach.

Many of the old training scripts were updated to use the new Trainer-based approach, but not all have been updated yet. We accept help via Pull Requests to assist in updating the scripts.

Similarity Score (#2615, #2490)

Sentence Transformers v3.0 introduces two new useful methods, similarity and similarity_pairwise, and one property, similarity_fn_name.

These can be used to calculate the similarity between embeddings, and to specify which similarity function should be used, for example:

>>> from sentence_transformers import SentenceTransformer
>>> model = SentenceTransformer("all-mpnet-base-v2")
>>> sentences = [
...     "The weather is so nice!",
...     "It's so sunny outside.",
...     "He's driving to the movie theater.",
...     "She's going to the cinema.",
... ]
>>> embeddings = model.encode(sentences, normalize_embeddings=True)
>>> model.similarity(embeddings, embeddings)
tensor([[1.0000, 0.7235, 0.0290, 0.1309],
        [0.7235, 1.0000, 0.0613, 0.1129],
        [0.0290, 0.0613, 1.0000, 0.5027],
        [0.1309, 0.1129, 0.5027, 1.0000]])
>>> model.similarity_fn_name
"cosine"
>>> model.similarity_fn_name = "euclidean"
>>> model.similarity(embedd...
Read more

v2.7.0 - CachedGISTEmbedLoss, easy Matryoshka inference & evaluation, CrossEncoder, Intel Gaudi2

17 Apr 13:16

This release introduces a promising new loss function, easier inference for Matryoshka models, new functionality for CrossEncoders, and inference on Intel Gaudi2, along with much more.

Install this version with

pip install sentence-transformers==2.7.0

New loss function: CachedGISTEmbedLoss (#2592)

For a number of years, MultipleNegativesRankingLoss (also known as SimCSE, InfoNCE, in-batch negatives loss) has been the state of the art in embedding model training. Notably, this loss function performs better with a larger batch size.

Recently, various improvements have been introduced:

  1. CachedMultipleNegativesRankingLoss was introduced, which allows you to pick much higher batch sizes (e.g. 65536) with constant memory.
  2. GISTEmbedLoss takes a guide model to guide the in-batch negative sample selection. This prevents false negatives, resulting in a stronger training signal.

Now, @JacksonCakes has combined these two approaches to produce the best of both worlds: CachedGISTEmbedLoss. This loss function allows for high batch sizes with constant memory usage, while also using a guide model to assist with the in-batch negative sample selection.

As can be seen in our Loss Overview, this loss should be used with (anchor, positive) pairs or (anchor, positive, negative) triplets, much like MultipleNegativesRankingLoss, CachedMultipleNegativesRankingLoss, and GISTEmbedLoss. In short, any example using those loss functions can be updated to use CachedGISTEmbedLoss! Feel free to experiment, e.g. with this training script.
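
A small sketch of setting it up (model names and data are just examples; in this release the loss is still used with the fit-based training flow):

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("microsoft/mpnet-base")
guide = SentenceTransformer("all-MiniLM-L6-v2")

train_examples = [
    InputExample(texts=["What is the capital of France?", "Paris is the capital of France."]),
    InputExample(texts=["How many people live in Berlin?", "Berlin has roughly 3.7 million inhabitants."]),
]
# In practice you would use a large dataset and a large batch size; that is the point of the caching
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# The guide model filters out in-batch negatives that are likely false negatives,
# while the caching keeps memory usage constant even at very large batch sizes
train_loss = losses.CachedGISTEmbedLoss(model, guide, mini_batch_size=32)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)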

Automatic Matryoshka model truncation (#2573)

Sentence Transformers v2.4.0 introduced Matryoshka models: models whose embeddings are still useful after truncation. Since then, many useful Matryoshka models have been trained.

As of this release, the truncation for these Matryoshka embedding models can be done automatically via a new truncate_dim constructor argument:

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

matryoshka_dim = 64
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True, truncate_dim=matryoshka_dim)

embeddings = model.encode(
    [
        "search_query: What is TSNE?",
        "search_document: t-distributed stochastic neighbor embedding (t-SNE) is a statistical method for visualizing high-dimensional data by giving each datapoint a location in a two or three-dimensional map.",
        "search_document: Amelia Mary Earhart was an American aviation pioneer and writer.",
    ]
)
print(embeddings.shape)
# => [3, 64]

similarities = cos_sim(embeddings[0], embeddings[1:])
# => tensor([[0.7839, 0.4933]])

Extra information:

Model truncation in all evaluators (#2582)

Alongside easier inference with Matryoshka models, evaluating them is now also much easier. You can also pass truncate_dim to any Evaluator. This way you can easily check the performance of any Sentence Transformer model at various truncated dimensions (even if the model was not trained with MatryoshkaLoss!).

from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator
from sentence_transformers import SentenceTransformer
import datasets

model = SentenceTransformer("tomaarsen/mpnet-base-nli-matryoshka")

stsb = datasets.load_dataset("mteb/stsbenchmark-sts", split="test")

for dim in [768, 512, 256, 128, 64, 32, 16, 8, 4]:
    evaluator = EmbeddingSimilarityEvaluator(
        stsb["sentence1"],
        stsb["sentence2"],
        [score / 5 for score in stsb["score"]],
        name=f"sts-test-{dim}",
        truncate_dim=dim,
    )
    print(f"dim={dim:<3}: {evaluator(model) * 100:.2f} Spearman Correlation")
dim=768: 86.81 Spearman Correlation
dim=512: 86.76 Spearman Correlation
dim=256: 86.66 Spearman Correlation
dim=128: 86.20 Spearman Correlation
dim=64 : 85.40 Spearman Correlation
dim=32 : 82.42 Spearman Correlation
dim=16 : 79.31 Spearman Correlation
dim=8  : 72.82 Spearman Correlation
dim=4  : 63.44 Spearman Correlation

Here are some example training scripts that use this new truncate_dim option to assist with training Matryoshka models:

CrossEncoder improvements

This release improves the support for CrossEncoder reranker models.

push_to_hub (#2524)

You can now push trained CrossEncoder models to the 🤗 Hugging Face Hub!

from sentence_transformers import CrossEncoder

...

model = CrossEncoder("distilroberta-base")

# Train the model
model.fit(
    train_dataloader=train_dataloader,
    evaluator=evaluator,
    epochs=num_epochs,
    warmup_steps=warmup_steps,
)

model.push_to_hub("tomaarsen/distilroberta-base-stsb-cross-encoder")

trust_remote_code for custom models (#2595)

You can now load custom models from the Hugging Face Hub, i.e. models that have custom modelling code that require trust_remote_code to load.

from sentence_transformers import CrossEncoder

# Note: this model does not require `trust_remote_code=True` - there are currently no models that require it yet.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2", trust_remote_code=True)

# We want to compute the similarity between the query sentence
query = "A man is eating pasta."

# With all sentences in the corpus
corpus = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
    "A man is riding a horse.",
    "A woman is playing violin.",
    "Two men pushed carts through the woods.",
    "A man is riding a white horse on an enclosed ground.",
    "A monkey is playing drums.",
    "A cheetah is running behind its prey.",
]

# We rank all sentences in the corpus for the query
ranks = model.rank(query, corpus)

# Print the scores
print("Query:", query)
for rank in ranks:
    print(f"{rank['score']:.2f}\t{corpus[rank['corpus_id']]}")

Inference on Intel Gaudi2 (#2557)

From this release onwards, you will be able to perform inference on Intel Gaudi2 accelerators. No modifications are needed, as the library will automatically detect the hpu device and configure the model accordingly. Thanks to Intel Habana for the support here.
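
As a rough sketch (assuming a Gaudi2 machine with the Habana/HPU software stack installed), a regular script runs unchanged:

from sentence_transformers import SentenceTransformer

# On a Gaudi2 machine, the 'hpu' device is detected and used automatically
model = SentenceTransformer("all-MiniLM-L6-v2")
print(model.device)  # expected to show the hpu device

embeddings = model.encode(["This sentence is embedded on the Gaudi2 accelerator"])
print(embeddings.shape)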

All changes

  • [docs] Add simple Makefile for building docs by @tomaarsen in #2566
  • [examples] Add Matryoshka evaluation plot by @kddubey in #2564
  • Adding push_to_hub to CrossEncoder by @imvladikon in #2524
  • Fix semantic_search_usearch() for single query by @karmi in #2572
  • [requirements] Set minimum transformers version to 4.34.0 for is_nltk_available by @tomaarsen in #2574
  • [docs] Update link: retrieve_rerank_simple_wikipedia.py -> .ipynb by @tomaarsen in #2580
  • Document dev reqs, add ruff pre-commit by @kddubey in #2576
  • Enable Sentence Transformer Inference with Intel Gaudi2 GPU Supported ( 'hpu' ) by @ZhengHongming888 in #2557
  • [feat] Add truncation support by @kddubey in #2573
  • [examples] Add model upload for training_nli_v3 with GISTEmbedLoss by @tomaarsen in #2584
  • Add truncation support in evaluators by @kddubey in #2582
  • Add ST annotation to evaluators by @kddubey in #2586
  • [fix] Matryoshka training always patch original forward, and check matryoshka_dims by @kddubey in #2593
  • corrected comment from kmeans to...
Read more

v2.6.1 - Fix Quantized Semantic Search rescoring

26 Mar 09:01

This is a patch release to fix a bug in semantic_search_faiss and semantic_search_usearch that caused the scores to not correspond to the returned corpus indices. Additionally, you can now evaluate embedding models after quantizing their embeddings.

Precision support in EmbeddingSimilarityEvaluator

You can now pass precision to the EmbeddingSimilarityEvaluator to evaluate the performance after quantization:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator, SimilarityFunction
import datasets

model = SentenceTransformer("all-mpnet-base-v2")

stsb = datasets.load_dataset("mteb/stsbenchmark-sts", split="test")

print("Spearman correlation based on Cosine Similarity on the STS Benchmark test set:")
for precision in ["float32", "uint8", "int8", "ubinary", "binary"]:
    evaluator = EmbeddingSimilarityEvaluator(
        stsb["sentence1"],
        stsb["sentence2"],
        [score / 5 for score in stsb["score"]],
        main_similarity=SimilarityFunction.COSINE,
        name="sts-test",
        precision=precision,
    )
    print(precision, evaluator(model))
Spearman correlation based on Cosine Similarity on the STS Benchmark test set:
float32 0.8342190421330611
uint8 0.8260094846238505
int8 0.8312754408857808
ubinary 0.8244338431442343
binary 0.8244338431442343

All changes

  • Add 'precision' support to the EmbeddingSimilarityEvaluator by @tomaarsen in #2559
  • [hotfix] Quantization patch; fix semantic_search_faiss/semantic_search_usearch rescoring by @tomaarsen in #2558
  • Fix a typo in a docstring in CosineSimilarityLoss.py by @bryant1410 in #2553

Full Changelog: v2.6.0...v2.6.1

v2.6.0 - Embedding Quantization, GISTEmbedLoss

22 Mar 15:29

This release brings embedding quantization: a way to heavily speed up retrieval & other tasks, and a new powerful loss function: GISTEmbedLoss.

Install this version with

pip install sentence-transformers==2.6.0

Embedding Quantization

Embeddings may be challenging to scale up, which leads to expensive solutions and high latencies. However, there is a new approach to counter this problem; it entails reducing the size of each of the individual values in the embedding: Quantization. Experiments on quantization have shown that we can maintain a large amount of performance while significantly speeding up computation and saving on memory, storage, and costs.

To be specific, using binary quantization may result in retaining 96% of the retrieval performance, while speeding up retrieval by 25x and saving on memory & disk space by 32x. Do not underestimate this approach! Read more about Embedding Quantization in our extensive blogpost.

Binary and Scalar Quantization

Two forms of quantization exist at this time: binary and scalar (int8). These quantize embedding values from float32 into binary and int8, respectively. For Binary quantization, you can use the following snippet:

from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

# 1. Load an embedding model
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")

# 2a. Encode some text using "binary" quantization
binary_embeddings = model.encode(
    ["I am driving to the lake.", "It is a beautiful day."],
    precision="binary",
)

# 2b. or, encode some text without quantization & apply quantization afterwards
embeddings = model.encode(["I am driving to the lake.", "It is a beautiful day."])
binary_embeddings = quantize_embeddings(embeddings, precision="binary")

References:

GISTEmbedLoss

GISTEmbedLoss, as introduced in Solatorio (2024), is a guided variant of the more standard in-batch negatives (MultipleNegativesRankingLoss) loss. Both loss functions are provided with a list of (anchor, positive) pairs, but while MultipleNegativesRankingLoss uses anchor_i and positive_i as positive pair and all positive_j with i != j as negative pairs, GISTEmbedLoss uses a second model to guide the in-batch negative sample selection.

This can be very useful, because it is plausible that anchor_i and positive_j are actually quite semantically similar. In this case, GISTEmbedLoss would not consider them a negative pair, while MultipleNegativesRankingLoss would. When finetuning MPNet-base on the AllNLI dataset, these are the Spearman correlation based on cosine similarity using the STS Benchmark dev set (higher is better):

The blue line is MultipleNegativesRankingLoss, whereas the grey line is GISTEmbedLoss with the small all-MiniLM-L6-v2 as the guide model. Note that all-MiniLM-L6-v2 by itself does not reach 88 Spearman correlation on this dataset, so this is really the effect of two models (mpnet-base and all-MiniLM-L6-v2) reaching a performance that they could not reach separately.
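
A minimal sketch of swapping in the guided loss (model names and data are just examples, using the fit-based training flow of this release):

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("microsoft/mpnet-base")
guide = SentenceTransformer("all-MiniLM-L6-v2")

train_examples = [
    InputExample(texts=["A man is eating food.", "A man is eating a piece of bread."]),
    InputExample(texts=["The girl is carrying a baby.", "A woman is holding her child."]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=64)

# Drop-in replacement for MultipleNegativesRankingLoss: the guide model vetoes
# in-batch negatives that look semantically too close to the anchor or positive
train_loss = losses.GISTEmbedLoss(model=model, guide=guide)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)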

Soft save_to_hub Deprecation

Most codebases that allow for pushing models to the Hugging Face Hub adopt a push_to_hub method instead of a save_to_hub method, and now Sentence Transformers will follow that convention. The push_to_hub method will now be the recommended approach, although save_to_hub will continue to exist for the time being: it will simply call push_to_hub internally.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")

...

# Train the model
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    evaluator=dev_evaluator,
    epochs=num_epochs,
    evaluation_steps=1000,
    warmup_steps=warmup_steps,
)

# Push the model to Hugging Face
model.push_to_hub("tomaarsen/mpnet-base-nli-stsb")

All changes

  • Add GISTEmbedLoss by @avsolatorio in #2535
  • [feat] Add 'get_config_dict' method to GISTEmbedLoss for better model cards by @tomaarsen in #2543
  • Enable saving modules as pytorch_model.bin by @CKeibel in #2542
  • [deprecation] Deprecate save_to_hub in favor of push_to_hub; add safe_serialization support to push_to_hub by @tomaarsen in #2544
  • Fix SentenceTransformer encode documentation return type default (numpy vectors) by @CKeibel in #2546
  • [docs] Update return docstring of encode_multi_process by @tomaarsen in #2548
  • [feat] Add binary & scalar embedding quantization support to Sentence Transformers by @tomaarsen in #2549

New Contributors

Full Changelog: v2.5.1...v2.6.0