In this project, we provide an easy-to-use toolkit for both word and sentence embedding evaluations.
For more details, see our ACL 2022 paper: Just Rank: Rethinking Evaluation with Word and Sentence Similarities.
- Mar. 22, 2022
  - Added more example scripts showing how to test each supported model.
- Mar. 21, 2022
  - You can now follow the template to test your own embedding model.
  - Added support for a series of sentence embedding models, including InferSent, SimCSE, Sentence-BERT, BERT-Whitening, and BERT-Flow.
  - Updated the sentence embedding evaluation part.
- Mar. 20, 2022
  - Updated the word embedding evaluation part.
Section | Description |
---|---|
Evaluation Tasks | Evaluation tasks |
Environment Setup | Environments |
Models and Quick Start | Models and quick start |
Benchmarking - Word | Leaderboard |
Benchmarking - Sentence | Leaderboard |
References | References |
Acknowledgements | Acknowledgements |
The following are the supported evaluation tasks:
- Word Embedding Evaluation
- Sentence Embedding Evaluation
Tested with the following dependencies:
- python==3.8.12
- pytorch==1.11.0
- transformers==4.11.3
- scikit-learn==0.23.2
Please see the following script for details on how to set up the environment:

```bash
bash environment.sh
```
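If you prefer to set things up by hand, a manual installation matching the tested versions above would look roughly like the sketch below (an assumption based on the dependency list; environment.sh remains the authoritative reference):

```bash
# Manual setup matching the tested versions above (a sketch;
# see environment.sh for the authoritative steps).
conda create -n evalrank python=3.8.12
conda activate evalrank
pip install torch==1.11.0 transformers==4.11.3 scikit-learn==0.23.2
```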
We support a list of word and sentence embedding models for quick evaluation and benchmarking.
- Word Embedding Models
  - Any word embedding file that follows this format can be evaluated (an example of the format is shown below).
  - One post-processing method is integrated.
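As an illustration of the expected input, files of this kind typically use the standard GloVe-style plain-text format, one token per line followed by its space-separated vector values (this is an assumption based on common practice; the format description linked in the repo is authoritative):

```text
the 0.418 0.24968 -0.41242 0.1217 ...
of 0.70853 0.57088 -0.4716 0.18048 ...
```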
- Word-level EvalRank and Similarity
  - To test your own model, simply change the word embedding path:

```bash
bash word_evaluate.sh
# To evaluate your own word embedding model, update file: word_evaluate.sh
#     WORD_EMB_PATH='PATH/TO/WORD/EMBEDDING'
```
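To give a feel for what the EvalRank numbers mean, here is a simplified sketch of the cosine-based ranking metric (a toy re-implementation for illustration, not the toolkit's actual code): each positive word pair is scored by ranking the positive word against a background vocabulary, and MRR / Hits@k summarize those ranks.

```python
import numpy as np

def eval_rank_sketch(emb, pos_pairs, background, ks=(1, 3)):
    """emb: word -> L2-normalized np.ndarray (so dot product = cosine).
    pos_pairs: list of (query, positive) word pairs.
    background: vocabulary the positive word must outrank."""
    cands = np.stack([emb[w] for w in background])
    ranks = []
    for query, positive in pos_pairs:
        q = emb[query]
        pos_sim = float(q @ emb[positive])
        # Rank = 1 + number of background words scoring strictly higher.
        ranks.append(1 + int((cands @ q > pos_sim).sum()))
    ranks = np.asarray(ranks)
    mrr = 100.0 * (1.0 / ranks).mean()
    hits = {k: 100.0 * (ranks <= k).mean() for k in ks}
    return mrr, hits
```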
- Sentence Embedding Models
  - Bag-of-words (averaging word embeddings; see the sketch after this list)
  - Bag-of-words with post-processing
  - InferSent
  - BERT
  - BERT-Whitening
  - BERT-Flow
  - Sentence-BERT
  - SimCSE
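For the bag-of-words baseline above, a minimal encoder simply averages the word vectors of in-vocabulary tokens (a sketch assuming `word_vecs` is a dict mapping tokens to numpy arrays):

```python
import numpy as np

def bow_sentence_embedding(sentence, word_vecs, dim=300):
    """Average the vectors of in-vocabulary tokens; zeros if none match."""
    tokens = sentence.lower().split()
    vecs = [word_vecs[t] for t in tokens if t in word_vecs]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
```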
- Sentence-level EvalRank and Similarity
  - You can also easily test your own sentence embedding model using the provided template:

```bash
bash sentence_evaluate.sh
# To evaluate your own sentence embedding model, modify the following two files:
# update file: sentence_evaluate.sh
#     SENT_EMB_MODEL='customize'
# update file: ./src/models/sent_emb/_customize.py
```
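The exact interface your custom model must expose is defined by the `_customize.py` template itself; purely as an illustration (the class and method names here are assumptions, not the template's actual API), a custom encoder usually boils down to a batch encode function, e.g. mean-pooled BERT:

```python
# Hypothetical sketch; the authoritative interface is whatever
# ./src/models/sent_emb/_customize.py defines.
import torch
from transformers import AutoModel, AutoTokenizer

class CustomSentenceEncoder:
    def __init__(self, model_name="bert-base-uncased"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name).eval()

    @torch.no_grad()
    def encode(self, sentences):
        """Return one mean-pooled vector per input sentence."""
        batch = self.tokenizer(sentences, padding=True, truncation=True,
                               return_tensors="pt")
        hidden = self.model(**batch).last_hidden_state   # (B, T, H)
        mask = batch["attention_mask"].unsqueeze(-1)     # (B, T, 1)
        return ((hidden * mask).sum(1) / mask.sum(1)).numpy()
```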
For better classification performance, edit the following part in src/s_evaluation.py:

```python
params_senteval = {'task_path': './data/', 'usepytorch': True, 'kfold': 5}
params_senteval['classifier'] = {'nhid': 0, 'optim': 'rmsprop', 'batch_size': 128,
                                 'tenacity': 3, 'epoch_size': 2}
```

to

```python
params_senteval.update({'task_path': PATH_TO_DATA, 'usepytorch': True, 'kfold': 10})
params_senteval['classifier'] = {'nhid': 50, 'optim': 'adam', 'batch_size': 64,
                                 'tenacity': 5, 'epoch_size': 4}
```

The second configuration adds a hidden layer to the classifier, uses more cross-validation folds, and trains longer, trading evaluation speed for accuracy.
For the complete set of model performances, refer to the bash scripts and log files in scripts/. Simply run the corresponding script to reproduce the results.
Word Embedding (cos) | EvalRank (MRR) | Hits@1 | Hits@3 |
---|---|---|---|
toy_emb.txt | 3.18 | 1.18 | 3.54 |
glove.840B.300d.txt | 13.15 | 4.66 | 15.72 |
GoogleNews-vectors-negative300.txt | 12.88 | 4.57 | 14.35 |
crawl-300d-2M.vec | 17.22 | 5.77 | 19.99 |
dict2vec-300d.vec | 12.71 | 4.04 | 13.04 |
- More benchmarking results can be found on these pages: word_evalrank, word_similarity.
- Additional results are available in the scripts and their corresponding logs.
Sentence Embedding (cos) | EvalRank (MRR) | Hits@1 | Hits@3 |
---|---|---|---|
toy_emb.txt | 41.15 | 28.79 | 49.65 |
glove.840B.300d.txt | 61.00 | 44.94 | 74.66 |
InferSentv1 | 60.72 | 41.92 | 77.21 |
InferSentv2 | 63.89 | 45.59 | 80.47 |
BERT(first-last-avg) | 68.01 | 51.70 | 81.91 |
BERT-whitening | 66.58 | 46.54 | 84.22 |
Sentence-BERT | 64.12 | 47.07 | 79.05 |
SimCSE | 69.50 | 52.34 | 84.43 |
If you find our package useful, please cite our paper.
- Just Rank: Rethinking Evaluation with Word and Sentence Similarities
- ACL 2022 Main Conference
@inproceedings{wang-etal-2022-just,
title = "Just Rank: Rethinking Evaluation with Word and Sentence Similarities",
author = "Wang, Bin and
Kuo, C.-C. Jay and
Li, Haizhou",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.acl-long.419",
pages = "6060--6077"
}
@article{evalrank_2022,
title={Just Rank: Rethinking Evaluation with Word and Sentence Similarities},
author={Wang, Bin and Kuo, C.-C. Jay and Li, Haizhou},
journal={arXiv preprint arXiv:2203.02679},
year={2022}
}
- We borrow a portion of the sentence embedding evaluation code from SentEval. Please consider citing their work if you find that part useful.
Contact Info: [email protected].