High-quality sentence embeddings are fundamental to many natural language processing (NLP) tasks, such as semantic textual similarity (STS) and retrieval-augmented generation (RAG).
Nevertheless, most existing methods produce fixed-length embeddings from full-layer language models, which lack the flexibility to scale to the diverse resource constraints of different applications.
To address this gap, we propose a novel scalable sentence embedding model, Espresso Sentence Embeddings (ESE).
To enable Espresso Sentence Embeddings (ESE), please specify `--apply_ese 1` and configure appropriate ESE hyperparameters via `--ese_kl_temperature float` and `--ese_compression_size integer`.
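Conceptually, `--ese_compression_size` sets the reduced embedding width and `--ese_kl_temperature` scales a distribution-alignment term between compressed and full embeddings. The snippet below is a conceptual sketch, not the library's exact loss: it illustrates one plausible temperature-scaled KL objective that aligns the similarity distribution of the truncated (leading-dimension) embeddings with that of the full embeddings. The function name `ese_kl_sketch` is hypothetical.

```python
# Conceptual sketch only; see angle_emb's source for the actual ESE objective.
import torch
import torch.nn.functional as F

def ese_kl_sketch(full_emb: torch.Tensor,
                  compression_size: int = 128,
                  temperature: float = 1.0) -> torch.Tensor:
    # full_emb: (batch, dim) pooled sentence embeddings
    compressed = full_emb[:, :compression_size]  # keep the leading dimensions
    full_sim = F.normalize(full_emb, dim=-1) @ F.normalize(full_emb, dim=-1).T
    comp_sim = F.normalize(compressed, dim=-1) @ F.normalize(compressed, dim=-1).T
    # KL divergence between the two temperature-scaled similarity distributions
    log_p = F.log_softmax(comp_sim / temperature, dim=-1)
    q = F.softmax(full_sim / temperature, dim=-1)
    return F.kl_div(log_p, q, reduction='batchmean')
```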
Here is a training example:
```bash
WANDB_MODE=disabled CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=1234 -m angle_emb.angle_trainer \
--model_name_or_path WhereIsAI/UAE-Large-V1 \
--train_name_or_path SeanLee97/nli_for_simcse --save_dir ckpts/UAE-Large-Espresso \
--ibn_w 10.0 --cosine_w 0. --angle_w 1.0 --angle_tau 20.0 --learning_rate 1e-6 --maxlen 75 \
--workers 16 \
--pooling_strategy cls \
--epochs 1 \
--batch_size 128 \
--logging_steps 100 \
--warmup_steps 200 \
--save_steps 1000 \
--fp16 1 \
--gradient_accumulation_steps 4 \
--apply_ese 1 \
--ese_compression_size 128 \
--ese_kl_temperature 1.0
```
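After training, the checkpoint can be loaded for inference like any other AnglE model. The sketch below assumes the `ckpts/UAE-Large-Espresso` path from the command above (adjust to your own `--save_dir`); truncating to the leading dimensions and re-normalizing reflects how ESE's scalable embeddings are intended to be consumed, though the exact serving setup is up to you.

```python
# Minimal inference sketch, assuming the checkpoint path used above.
import numpy as np
from angle_emb import AnglE

angle = AnglE.from_pretrained('ckpts/UAE-Large-Espresso', pooling_strategy='cls')
vecs = angle.encode(['The cat sits outside.', 'A feline rests outdoors.'],
                    to_numpy=True)

# ESE embeddings can be truncated to the leading dimensions
# (here 128, matching --ese_compression_size) and re-normalized.
small = vecs[:, :128]
small = small / np.linalg.norm(small, axis=1, keepdims=True)
print(float(small[0] @ small[1]))  # cosine similarity at the compressed size
```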
If you use ESE, please cite:

```bibtex
@article{li2024ese,
  title={ESE: Espresso Sentence Embeddings},
  author={Li, Xianming and Li, Zongxi and Li, Jing and Xie, Haoran and Li, Qing},
  journal={arXiv preprint arXiv:2402.14776},
  year={2024}
}
```