In this project, we provide an easy-to-use toolkit for both word and sentence embedding evaluations.
For more details, see our ACL 2022 paper: Just Rank: Rethinking Evaluation with Word and Sentence Similarities.
- Mar. 22, 2022
  - Added more example scripts showing how to test each supported model.
- Mar. 21, 2022
  - You can now follow the template to test your own embedding model.
  - Added support for a series of sentence embedding models, including InferSent, SimCSE, Sentence-BERT, BERT-Whitening, and BERT-Flow.
  - Updated the sentence embedding evaluation part.
- Mar. 20, 2022
  - Updated the word embedding evaluation part.
Section | Description |
---|---|
Evaluation Tasks | Evaluation tasks |
Environment Setup | Environments |
Models and Quick Start | Models and quick start |
Benchmarking - Word | Leaderboard |
Benchmarking - Sentence | Leaderboard |
References | References |
Acknowledgements | Acknowledgements |
The following are the supported evaluation tasks:
- Word Embedding Evaluation
- Sentence Embedding Evaluation
Tested with the following dependencies:
- python==3.8.12
- pytorch==1.11.0
- transformers==4.11.3
- scikit-learn==0.23.2
Please see the following script for details on how to set up the environment:

```bash
bash environment.sh
```
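If you prefer to set things up by hand, a manual installation matching the tested versions above would look roughly like the sketch below (an assumption based on the dependency list; environment.sh remains the authoritative reference):

```bash
# Manual setup matching the tested versions above (a sketch;
# see environment.sh for the authoritative steps).
conda create -n evalrank python=3.8.12
conda activate evalrank
pip install torch==1.11.0 transformers==4.11.3 scikit-learn==0.23.2
```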
We support a list of word and sentence embedding models for quick evaluation and benchmarking.
- Word Embedding Models
  - Any word embedding file that follows this format can be evaluated (an example of the format is shown below).
  - One post-processing method is integrated.
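As an illustration of the expected input, files of this kind typically use the standard GloVe-style plain-text format, one token per line followed by its space-separated vector values (this is an assumption based on common practice; the format description linked in the repo is authoritative):

```text
the 0.418 0.24968 -0.41242 0.1217 ...
of 0.70853 0.57088 -0.4716 0.18048 ...
```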
- Word-level EvalRank and Similarity
  - To test your own model, simply change the word embedding path:

```bash
bash word_evaluate.sh
# To evaluate your own word embedding model, update file: word_evaluate.sh
#     WORD_EMB_PATH='PATH/TO/WORD/EMBEDDING'
```
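To give a feel for what the EvalRank numbers mean, here is a simplified sketch of the cosine-based ranking metric (a toy re-implementation for illustration, not the toolkit's actual code): each positive word pair is scored by ranking the positive word against a background vocabulary, and MRR / Hits@k summarize those ranks.

```python
import numpy as np

def eval_rank_sketch(emb, pos_pairs, background, ks=(1, 3)):
    """emb: word -> L2-normalized np.ndarray (so dot product = cosine).
    pos_pairs: list of (query, positive) word pairs.
    background: vocabulary the positive word must outrank."""
    cands = np.stack([emb[w] for w in background])
    ranks = []
    for query, positive in pos_pairs:
        q = emb[query]
        pos_sim = float(q @ emb[positive])
        # Rank = 1 + number of background words scoring strictly higher.
        ranks.append(1 + int((cands @ q > pos_sim).sum()))
    ranks = np.asarray(ranks)
    mrr = 100.0 * (1.0 / ranks).mean()
    hits = {k: 100.0 * (ranks <= k).mean() for k in ks}
    return mrr, hits
```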
- Sentence Embedding Models
  - Bag-of-words (averaging word embeddings; see the sketch after this list)
  - Bag-of-words with post-processing
  - InferSent
  - BERT
  - BERT-Whitening
  - BERT-Flow
  - Sentence-BERT
  - SimCSE
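For the bag-of-words baseline above, a minimal encoder simply averages the word vectors of in-vocabulary tokens (a sketch assuming `word_vecs` is a dict mapping tokens to numpy arrays):

```python
import numpy as np

def bow_sentence_embedding(sentence, word_vecs, dim=300):
    """Average the vectors of in-vocabulary tokens; zeros if none match."""
    tokens = sentence.lower().split()
    vecs = [word_vecs[t] for t in tokens if t in word_vecs]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
```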
- Sentence-level EvalRank and Similarity
  - You can also easily test your own sentence embedding model using the provided template:

```bash
bash sentence_evaluate.sh
# To evaluate your own sentence embedding model, modify the following two files:
# update file: sentence_evaluate.sh
#     SENT_EMB_MODEL='customize'
# update file: ./src/models/sent_emb/_customize.py
```
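The exact interface your custom model must expose is defined by the `_customize.py` template itself; purely as an illustration (the class and method names here are assumptions, not the template's actual API), a custom encoder usually boils down to a batch encode function, e.g. mean-pooled BERT:

```python
# Hypothetical sketch; the authoritative interface is whatever
# ./src/models/sent_emb/_customize.py defines.
import torch
from transformers import AutoModel, AutoTokenizer

class CustomSentenceEncoder:
    def __init__(self, model_name="bert-base-uncased"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name).eval()

    @torch.no_grad()
    def encode(self, sentences):
        """Return one mean-pooled vector per input sentence."""
        batch = self.tokenizer(sentences, padding=True, truncation=True,
                               return_tensors="pt")
        hidden = self.model(**batch).last_hidden_state   # (B, T, H)
        mask = batch["attention_mask"].unsqueeze(-1)     # (B, T, 1)
        return ((hidden * mask).sum(1) / mask.sum(1)).numpy()
```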
For better classification performance, edit the following part in src/s_evaluation.py:

```python
params_senteval = {'task_path': './data/', 'usepytorch': True, 'kfold': 5}
params_senteval['classifier'] = {'nhid': 0, 'optim': 'rmsprop', 'batch_size': 128,
                                 'tenacity': 3, 'epoch_size': 2}
```

to

```python
params_senteval.update({'task_path': PATH_TO_DATA, 'usepytorch': True, 'kfold': 10})
params_senteval['classifier'] = {'nhid': 50, 'optim': 'adam', 'batch_size': 64,
                                 'tenacity': 5, 'epoch_size': 4}
```

The second configuration adds a hidden layer to the classifier, uses more cross-validation folds, and trains longer, trading evaluation speed for accuracy.
For the complete set of model performances, refer to the bash scripts and log files in scripts/. Simply run the corresponding script to reproduce the results.
Word Embedding (cos) | EvalRank (MRR) | Hits@1 | Hits@3 |
---|---|---|---|
toy_emb.txt | 3.18 | 1.18 | 3.54 |
glove.840B.300d.txt | 13.15 | 4.66 | 15.72 |
GoogleNews-vectors-negative300.txt | 12.88 | 4.57 | 14.35 |
crawl-300d-2M.vec | 17.22 | 5.77 | 19.99 |
dict2vec-300d.vec | 12.71 | 4.04 | 13.04 |
- More benchmarking results can be found on these pages: word_evalrank, word_similarity.
- Additional results are available in the scripts and their corresponding logs.
Sentence Embedding (cos) | EvalRank (MRR) | Hits@1 | Hits@3 |
---|---|---|---|
toy_emb.txt | 41.15 | 28.79 | 49.65 |
glove.840B.300d.txt | 61.00 | 44.94 | 74.66 |
InferSentv1 | 60.72 | 41.92 | 77.21 |
InferSentv2 | 63.89 | 45.59 | 80.47 |
BERT(first-last-avg) | 68.01 | 51.70 | 81.91 |
BERT-whitening | 66.58 | 46.54 | 84.22 |
Sentence-BERT | 64.12 | 47.07 | 79.05 |
SimCSE | 69.50 | 52.34 | 84.43 |
If you find our package useful, please cite our paper.
- Just Rank: Rethinking Evaluation with Word and Sentence Similarities
- ACL 2022 Main Conference
@inproceedings{wang-etal-2022-just,
title = "Just Rank: Rethinking Evaluation with Word and Sentence Similarities",
author = "Wang, Bin and
Kuo, C.-C. Jay and
Li, Haizhou",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.acl-long.419",
pages = "6060--6077"
}
@article{evalrank_2022,
title={Just Rank: Rethinking Evaluation with Word and Sentence Similarities},
author={Wang, Bin and Kuo, C.-C. Jay and Li, Haizhou},
journal={arXiv preprint arXiv:2203.02679},
year={2022}
}
- We borrow a portion of the sentence embedding evaluation code from SentEval. Please consider citing their work if you find that part useful.
Contact Info: [email protected].