Merge v3.0 pre-release into master, prepare for full v3.0 release (#2683)

* [`v3`] Training refactor - MultiGPU, loss logging, bf16, etc. (#2449) (minimal training sketch after this list)
* See #1638: Adds huggingface trainer for sentence transformers
* Fix type of tokenizer
* Get the trainer using the feature collation
* Update the docstring to reflect changes
* Initial draft for refactoring training using the Transformers Trainer
* Separate 'fit' functionality (new and old) into a mixin
* Resolve test issues
* Reformat
* Update the imports
* Add TODO regarding custom label columns
* Remove dead code
* Don't provide the trainer to the eval sampler
* Introduce datasets as a dependency
* Introduce "accelerate" as a dependency
* Avoid use_amp on CPU tests
* Specify that SentenceTransformer is a class, not a module
* Avoid circular import
* Remove | used as an "or" operator in typing
* Use test evaluator after training, as intended
* Use tokenize function instead of tokenizer; add EvaluatorCallback which calls the evaluator on every epoch (for BC); stop saving "do_lower_case" from Transformer
* Reformat
* Revert Transformer tokenizer changes
* Add support for the tokenizer to return more than just input_ids & attention_masks
  Required for LSTM
* Use the test evaluators after training the examples
* Use pure torch for BoW tokenization
* Use dev evaluator for BiLSTM - test fails
* Add Trainer support for BoW-based models
* Pass epoch to evaluator in every-epoch callback
  For fit backwards compatibility
* Run formatting
* Use steps_per_epoch to set max_steps if possible
* Ignore extracting dataloader arguments for now
* Remove dead code
* Allow both "label" and "score" columns for labels
* Reformatting
* Improve errors if datasets don't match the loss dictionary well
* Made tests more consistent; list instead of set
* Simplify trainer with DatasetDict
* Implement a proportional sampler in addition to round robin
* Add CLIP finetuning support to the Trainer
* Start updating evaluators to return dictionaries
* Reformat
* Hackishly insert the DataParallel model into the loss function
* Allow for fsdp=["full_shard", "auto_wrap"] with fsdp_config={"transformer_layer_cls_to_wrap": "BertLayer"}
* Re-add support for DataParallel
* Use 'ParallelMode.NOT_PARALLEL'
* Prevent crash with DDP & an evaluation set
* When training with multiple datasets, add "dataset_name" column (multi-dataset sketch after this list)
  Rather than relying on some Batch Sampler hacking (which fails with some distributed training approaches)
* Update type hints: make loss & evaluator optional
  Co-authored-by: Wang Bo <[email protected]>
* Set correct superclasses for samplers
* Override 'accelerator.even_batches' as it's incompatible with multi-dataset
* Throw exception if "return_loss" or "dataset_name" columns are used
* Set min. version for accelerate
* Heavily extend model card generation
* Remove some dead code
* Fix evaluator type hints
* Ensure that 'model_card_template.md' is included in the built package
* Rephrase comments slightly
* Heavily refactor samplers; add no-duplicates/group-by-label samplers
* Ensure that data_loader.dataset exists in FitMixin
* Adopt 8 as the default batch size
* Fix logging error in example
* Remove the deprecated correct_bias
* Simplify with walrus operator
* Fix some bugs in set_widget_examples with short datasets
* Improve docstring slightly
* Add edge case handling in case training data has an unrecognized format
* Fix extracting dataset metadata
* Remove moot TYPE_CHECKING
* Also set base model when loading a ST model
* Add test_dataloader, add prefetch_factor to dataloaders
* Resolve predict_example fix; fix newlines in text
* Fix bug in compute_dataset_metrics examples
* Add call to action in ValueError
* Reuse original model card if no training is done
* Also collect nested losses (e.g. MatryoshkaLoss) and include losses in tags
* Remove "generated" tag; keep "loss:" prefix on tags
* Remove unused arguments
* Add support for "best model step" in model card
* Make hyperparameters code-formatted
* Fix load_best_model for Transformers models, prevent for non-Transformers
* Store base_model_revision in model_card_data
* Prevent crash when loading a local model
* Allow for bfloat16 inference

---------
Co-authored-by: Matthew Franglen <[email protected]>
Co-authored-by: Wang Bo <[email protected]>

* [`v3`] Add `similarity` and `similarity_pairwise` methods to Sentence Transformers (#2615) (sketch after this list)
* Add similarity function to model configuration
* Add more tests
* Replace util.cos_sim with model.similarity in some examples
* Reintroduce evaluation.SimilarityFunction
* Remove last references of score function in ST class
* Add similarity_fn_name to model card
* Add save_pretrained alias for save
* Introduce DOT alias for DOT_PRODUCT
* [`v3`] Fix various model card errors (#2616)
* Prevent model card save failure
* Print exceptions in more detail when they occur
* Fix edge case if dataset language is None
* [`v3`] Fix trainer `compute_loss` when evaluating/predicting if the `loss` updated the inputs in-place (#2617)
* Recompute the features if return_output
* Add SimilarityFunction to __init__, increment dev version
* Never return None in infer_datasets (#2620)
* Implement resume_from_checkpoint (#2621)
* [`v3`] Update example scripts to the new v3 training format (#2622)
* Update example scripts to the new v3 training format
* Add distillation training examples
* Add Matryoshka training examples
* Add NLI training examples
* Add STS training scripts
* Fix accidentally overriding eval set
* Update paraphrases multi-dataset training script
* Convert regular dicts to DatasetDict on Trainer init
* Update Quora duplicate training scripts
* Update "other" training scripts
* Update multilingual conversion script
* Add example scripts to Evaluators
* Add example to ST class itself
* Update docs formatting slightly
* Fix model card snippet
* Add short docstring for similarity_fn_name property
* Remove "return_outputs" as it's not strictly necessary; avoids OOM & speeds up training (#2633)
* Fix crash from inferring the dataset_id from a local dataset (#2636)
  See #2635
* Fix multilingual conversion script; extend MSELoss to multi-column (#2641)
  Also remove the now-unnecessary make_multilingual_sys.py
* Update evaluation scripts to use HF Datasets (#2642)
* Increment the version in setup.py (as well)
* Fix resume_from_checkpoint by also updating the loss (#2648)
  I'm not very sure if updating the potentially wrapped model like this will also work; it seems a bit risky, but it's equally risky not to do it.
* Fix an issue with in-place variable overriding preventing backwards passes on MSELoss (#2647)
  Only when there are multiple columns
* Simplify load_from_checkpoint using load_state_dict (#2650)
  Overriding the model has several downsides, e.g. regarding model card generation
* Don't override the labels variable to avoid an in-place operation (#2651)
* Resolve "one of the variables needed for gradient computation has been modified by an inplace operation." (#2654)
* [`v3`] Add hyperparameter optimization support by letting `loss` be a Callable that accepts a `model` (#2655) (sketch after this list)
* Add HPO support by letting the 'loss' be a function
* Only add "dataset_name" column if required by the loss function
* Add tag hinting at the number of training samples (#2660)
* [`v3`] For the Cached losses, ignore gradients if grad is disabled (e.g. eval) (#2668)
* For the Cached losses, ignore gradients if grad is disabled (e.g. eval)
* Warn that Matryoshka/AdaptiveLayer losses are not compatible with Cached losses
* [`docs`] Rewrite the https://sbert.net documentation for v3.0 (#2632)
* Start restructuring/rewriting the docs
* Update Pretrained Models section for ST
* Update & add many docstrings
* Completely overhaul "Training Overview" docs page for ST
* Update dataset overview
* Remove kwargs from paraphrase_mining signature
* Add "aka sbert"
* Remove Hugging Face docs page
* Update ST Usages
* Fix some links
* Use the training examples corresponding to that model type
* Add hyperparameter optimization example script + docs
* Add distributed training docs
* Complete rewrite for the Sentence Transformer docs portion
* Update the CE part of the docs
* Specify if __name__ == "__main__" & dataloader_drop_last with DDP
* Update the entire project to Google-style docstrings
* Remove contact page
* Update README with updated links, etc.
* Update the loss examples
* Fix formatting
* Add remove_columns/select_columns tip to dataset overview
* [`v3`] Chore - include import sorting in ruff (#2672)
* Include import sorting in ruff
* Remove deprecated ignore-init-module-imports
* Remove --select I from ruff.toml again after CI issues
* [`v3`] Prevent warning with 'model.fit' with transformers >= 4.41.0 due to evaluation_strategy (#2673)
* Prevent warning with 'model.fit' with transformers >= 4.41.0 due to evaluation_strategy
* Reformat
* [`v3`] Add various useful Sphinx packages (copy code, link to code, nicer tabs) (#2674)
* No longer hide toctrees in API Reference
* Add linkcode support
  It's not perfect, as it'll always link to 'master', but it'll do pretty nicely for the most part.
* Add copy button to all code blocks
* Add nicer tabs
* Reformatted
* [`v3`] Make the "primary_metric" for evaluators a bit more robust (#2675)
* Make the "primary_metric" for evaluators a bit more robust
* Also remove some other TODOs that are not very important or are already done
* Set `broadcast_buffers = False` when training with DDP (#2663)
* [`v3`] Warn about using DP instead of DDP + set dataloader_drop_last with DDP (#2677) (multi-GPU arguments sketch after this list)
* Warn about using DP instead of DDP + set dataloader_drop_last with DDP
* Prevent duplicate warnings
* Remove note, done automatically now
* Avoid inequality comparison to True
* [`v3`] Add warning that Evaluators only run on 1 GPU during multi-GPU training (#2678)
* Add warning that Evaluators only run on 1 GPU during multi-GPU training
* Also add a note in the distributed training docs
* [`v3`] Move training dependencies into a "train" extra (#2676)
* Move training dependencies into a "train" extra
* Install the train extra with the CI tests
* Simplify dev install: also include train deps there
* Implement is_..._available in ST instead; add is_training_available
* Update references to the API ref (#2679)
* [`v3`] Add "dataset_size:" to the tag denoting the number of training samples (#2680)
* Prepend "dataset_size:" instead
  I can always change the look of this later on the HF side
* Fix formatting of Python modules
* Docs: pairwise_cosine_similarity -> pairwise_similarity
* Link to the yet-to-be-released release notes instead
* Update phrasing on local_files_only docstring
* Link directly to the 2DMSE preprint
* Add missing subset in quora-duplicates
* Add missing docstring arguments for Cached... losses
* Update training overview docs based on the blogpost reviews

---------
Co-authored-by: Matthew Franglen <[email protected]>
Co-authored-by: Wang Bo <[email protected]>
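The training refactor in #2449 and the updated example scripts in #2622 move Sentence Transformers onto a Hugging Face style trainer. A minimal sketch of that flow, where the base checkpoint (`microsoft/mpnet-base`), the `sentence-transformers/all-nli` pair split, and the output paths are illustrative choices rather than required ones:

```python
# Minimal v3-style training sketch (illustrative model, dataset, and paths).
from datasets import load_dataset

from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("microsoft/mpnet-base")
train_dataset = load_dataset("sentence-transformers/all-nli", "pair", split="train[:10000]")
loss = MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="output/mpnet-base-all-nli",  # illustrative output path
    num_train_epochs=1,
    per_device_train_batch_size=32,
    bf16=True,          # bf16 training is one of the #2449 additions; needs supported hardware
    logging_steps=100,  # loss logging, also part of the refactor
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
model.save_pretrained("output/mpnet-base-all-nli/final")
```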
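Multi-dataset training (the proportional/round-robin samplers and the "dataset_name" column mentioned above) pairs a dictionary of datasets with a dictionary of losses keyed the same way. A sketch under that assumption, with the dataset choices and loss pairing purely illustrative:

```python
# Multi-dataset sketch: the keys of `train_dataset` and `loss` must match.
from datasets import load_dataset

from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CoSENTLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("microsoft/mpnet-base")

train_dataset = {
    "all-nli": load_dataset("sentence-transformers/all-nli", "pair", split="train[:5000]"),
    "stsb": load_dataset("sentence-transformers/stsb", split="train"),
}
loss = {
    "all-nli": MultipleNegativesRankingLoss(model),  # (anchor, positive) pairs
    "stsb": CoSENTLoss(model),                       # (sentence1, sentence2) + score
}

trainer = SentenceTransformerTrainer(
    model=model,
    args=SentenceTransformerTrainingArguments(output_dir="output/multi-dataset-run"),
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```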
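The `similarity` and `similarity_pairwise` methods from #2615 expose the model's configured similarity function directly on the model. A short sketch; the checkpoint and sentences are illustrative:

```python
# Sketch of the #2615 similarity API.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode([
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
])

# Full similarity matrix between two sets of embeddings.
scores = model.similarity(embeddings, embeddings)  # shape (3, 3)

# Element-wise similarity between aligned pairs of embeddings.
pairwise = model.similarity_pairwise(embeddings[:2], embeddings[1:])  # shape (2,)

# The similarity function is stored in the model configuration and the model card.
print(model.similarity_fn_name)  # e.g. "cosine"
```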
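Several of the multi-GPU items above boil down to training arguments. A sketch collecting them, where the FSDP values are the ones quoted in the #2449 notes, `dataloader_drop_last` follows the DDP recommendation from #2677, and the output directory, batch size, and launch command are illustrative:

```python
# Multi-GPU related training arguments; launch with a distributed launcher,
# e.g. `torchrun --nproc_per_node=4 train_script.py`.
from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output/ddp-run",
    per_device_train_batch_size=32,
    dataloader_drop_last=True,  # recommended with DDP so every rank sees equally sized batches
    # For FSDP instead of plain DDP (values quoted from the changelog above):
    # fsdp=["full_shard", "auto_wrap"],
    # fsdp_config={"transformer_layer_cls_to_wrap": "BertLayer"},
)
# Note from #2678: evaluators still run on a single GPU during multi-GPU training.
```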
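For the hyperparameter optimization support from #2655, `loss` may be a callable that receives the freshly initialized model, so every trial gets its own loss instance. A hedged sketch assuming Optuna is installed as the search backend; `model_init`, `loss_init`, the search space, and the dataset slices are all illustrative, not part of a fixed API:

```python
# HPO sketch: `loss` as a callable that accepts the model (per #2655).
from datasets import load_dataset

from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

train_dataset = load_dataset("sentence-transformers/all-nli", "pair", split="train[:2000]")
eval_dataset = load_dataset("sentence-transformers/all-nli", "pair", split="dev[:500]")

def model_init() -> SentenceTransformer:
    # A fresh model is created for every trial.
    return SentenceTransformer("microsoft/mpnet-base")

def loss_init(model: SentenceTransformer) -> MultipleNegativesRankingLoss:
    # Called with the trial's model, so the loss is rebuilt per trial as well.
    return MultipleNegativesRankingLoss(model)

trainer = SentenceTransformerTrainer(
    model=None,  # created per trial via model_init
    model_init=model_init,
    args=SentenceTransformerTrainingArguments(output_dir="output/hpo-run"),
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss_init,
)

best_run = trainer.hyperparameter_search(
    hp_space=lambda trial: {
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
    },
    n_trials=10,
    direction="minimize",  # the default objective minimizes the evaluation loss
)
print(best_run)
```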