Implementation of ConGen: Unsupervised Control and Generalization Distillation For Sentence Representation (Findings of EMNLP 2022).

ConGen

Citation

@inproceedings{limkonchotiwat-etal-2022-congen,
    title = "{ConGen}: Unsupervised Control and Generalization Distillation For Sentence Representation",
    author = "Limkonchotiwat, Peerat  and
      Ponwitayarat, Wuttikorn  and
      Lowphansirikul, Lalita and
      Udomcharoenchaikit, Can  and
      Chuangsuwanich, Ekapol  and
      Nutanong, Sarana",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    year = "2022",
    publisher = "Association for Computational Linguistics",
}

Announcement (2023)

  • We have a new version of ConGen: SCT (published in TACL 2023).
  • SCT outperforms ConGen in distillation settings.
  • SCT also lets a small model learn sentence embeddings without a teacher model!

Installation

git clone https://github.com/KornWtp/ConGen.git
cd ConGen
pip install -e .

Our models (Small to Large)

Usage

Training data

We use the training data from BSL's paper: monolingual version and multilingual version.

Development data

We use the STS-B development set from Sentence Transformers.

Parameters

The full model parameters:

| Models | Teacher Temp | Student Temp | Queue Size | Learning Rate |
|---|---|---|---|---|
| BERT-Tiny | 0.05 | 0.05 | 16384 | 5e-4 |
| BERT-Mini | 0.05 | 0.07 | 16384 | 3e-4 |
| Tiny-BERT-L4 | 0.05 | 0.05 | 65536 | 1e-4 |
| MiniLM-L3 | 0.05 | 0.07 | 16384 | 5e-4 |
| MiniLM-L6 | 0.05 | 0.07 | 65536 | 3e-4 |
| BERT-Small | 0.05 | 0.07 | 65536 | 3e-4 |
| MiniLM-L12 | 0.05 | 0.07 | 16384 | 5e-5 |
| Tiny-BERT-L6 | 0.05 | 0.07 | 65536 | 5e-5 |
| BERT-base | 0.05 | 0.07 | 65536 | 5e-5 |
| RoBERTa-base | 0.1 | 0.1 | 1024 | 5e-5 |
| Multilingual-DistilBERT | 0.05 | 0.07 | 65536 | 3e-4 |
| Multilingual-MiniLM-L12 | 0.05 | 0.07 | 65536 | 3e-4 |
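
To give a feel for what the teacher temperature, student temperature, and queue size control, here is a minimal sketch of a queue-based distillation loss. It assumes the student is trained to match the teacher's similarity distribution over a queue of stored embeddings via cross-entropy; the function names (`log_softmax`, `distillation_loss`) and the toy vectors are hypothetical and are not taken from this repository. Please refer to the paper and the training code for the actual objective.

```python
import math

def log_softmax(scores, temperature):
    # Divide by the temperature, then normalise in log space for stability.
    scaled = [s / temperature for s in scores]
    top = max(scaled)
    log_total = top + math.log(sum(math.exp(x - top) for x in scaled))
    return [x - log_total for x in scaled]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def distillation_loss(teacher_vec, student_vec, queue,
                      teacher_temp=0.05, student_temp=0.07):
    # Similarity of each encoder's output to every embedding in the queue.
    teacher_logp = log_softmax([dot(teacher_vec, q) for q in queue], teacher_temp)
    student_logp = log_softmax([dot(student_vec, q) for q in queue], student_temp)
    # Cross-entropy: teacher distribution as soft targets for the student.
    return -sum(math.exp(t) * s for t, s in zip(teacher_logp, student_logp))

# Toy queue of three unit vectors (a real queue would hold e.g. 65536 entries).
queue = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
teacher = (1.0, 0.0)

loss_aligned = distillation_loss(teacher, (1.0, 0.0), queue)
loss_misaligned = distillation_loss(teacher, (0.0, 1.0), queue)
print(loss_aligned < loss_misaligned)
```

In this sketch a lower temperature sharpens the corresponding distribution, and a larger queue provides more reference points for comparing the two encoders.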

Train your own model

Please set the model's parameters before training.

bash train_congen.sh

Hyperparameter search ranges for finetuning:

learning_rate_all=(3e-4 5e-4 1e-4 3e-5 5e-5 1e-5)
queue_sizes=(262144 131072 65536 16384 1024)
teacher_temps=(0.01 0.03 0.05 0.07 0.09 0.1)
student_temps=(0.01 0.03 0.05 0.07 0.09 0.1)
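
As a rough illustration of the size of this search space, the sketch below enumerates every combination of the four lists in Python. It assumes an exhaustive grid search, which this README does not confirm; the variable and key names are illustrative only.

```python
from itertools import product

learning_rates = [3e-4, 5e-4, 1e-4, 3e-5, 5e-5, 1e-5]
queue_sizes = [262144, 131072, 65536, 16384, 1024]
teacher_temps = [0.01, 0.03, 0.05, 0.07, 0.09, 0.1]
student_temps = [0.01, 0.03, 0.05, 0.07, 0.09, 0.1]

# One dict per candidate configuration (hypothetical key names).
grid = [
    {"learning_rate": lr, "queue_size": qs, "teacher_temp": tt, "student_temp": st}
    for lr, qs, tt, st in product(learning_rates, queue_sizes,
                                  teacher_temps, student_temps)
]

print(len(grid))  # 6 * 5 * 6 * 6 = 1080 combinations
```

A full sweep would therefore take 1080 training runs per student model, so restricting some of the ranges first may be more practical.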

Evaluation

Our evaluation code for sentence embeddings is based on a modified version of SentEval and SimCSE.

Before evaluation, please download the evaluation datasets by running:

cd SentEval/data/downstream/
bash download_dataset.sh

Evaluation - Notebook

Please see https://github.com/KornWtp/ConGen/tree/main/notebook

Evaluation - Python

Then return to the root directory. You can evaluate any Sentence Transformers model using the SimCSE evaluation code, for example:

python evaluation.py \
    --model_name_or_path "your-model-path" \
    --task_set sts \
    --mode test
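
For readers unfamiliar with how STS scores are typically computed, the following is a minimal, dependency-free sketch: score each sentence pair by the cosine similarity of its two embeddings, then report the Spearman correlation against the gold similarity scores. It is not the repository's evaluation code; the helper names and the toy embeddings are made up for illustration.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def ranks(values):
    # Average ranks (1-based), so tied values share the same rank.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    # Spearman correlation is the Pearson correlation of the ranks.
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Toy embeddings for three sentence pairs and their gold similarity scores.
pairs = [((1.0, 0.0), (1.0, 0.0)), ((1.0, 0.0), (1.0, 1.0)), ((1.0, 0.0), (0.0, 1.0))]
gold = [5.0, 2.5, 0.0]

predicted = [cosine(u, v) for u, v in pairs]
score = spearman(predicted, gold)
print(round(score * 100, 2))
```

The result tables report this kind of correlation multiplied by 100.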

Main results - STS

In our paper, we report scores averaged over three models, as follows:

Semantic Textual Similarity (STS) average scores:

| Methods | BERT-Tiny | BERT-Mini | Tiny-BERT-L4 | MiniLM-L3 | MiniLM-L6 | BERT-Small | MiniLM-L12 | Tiny-BERT-L6 | BERT-Base | RoBERTa-Base |
|---|---|---|---|---|---|---|---|---|---|---|
| #Param (M) | 4 | 11 | 14 | 17 | 22 | 29 | 33 | 67 | 109 | 125 |
| Finetuning-based | | | | | | | | | | |
| Teacher SimCSE-Unsup-RoBERTa-large: 78.90 | | | | | | | | | | |
| Sup-SimCSE | 72.35 | 76.52 | 78.19 | 76.49 | 78.86 | 78.59 | 80.48 | 81.23 | 81.57 | 82.52 |
| Unsup-SimCSE | 64.47 | 65.94 | 67.91 | 55.10 | 59.15 | 69.13 | 67.90 | 73.67 | 76.25 | 77.10 |
| Distillation-based | | | | | | | | | | |
| L2 | 73.32 | 76.07 | 77.03 | 76.66 | 77.51 | 77.30 | 78.79 | 78.95 | 78.97 | 79.00 |
| Making | 70.76 | 74.42 | 76.39 | 75.34 | 74.74 | 76.92 | 76.91 | 78.67 | 78.07 | 79.06 |
| SKD | 68.83 | 72.02 | 73.05 | 72.66 | 73.59 | 75.06 | 74.58 | 77.62 | 78.05 | 77.44 |
| CKD | 76.19 | 76.59 | 77.48 | 77.14 | 77.90 | 76.97 | 77.92 | 78.29 | 78.54 | 78.34 |
| Our proposed method | | | | | | | | | | |
| ConGen | 76.85 | 78.09 | 78.54 | 78.22 | 79.10 | 78.91 | 79.68 | 79.73 | 80.06 | 79.78 |

Full results

| Models | STS-12 | STS-13 | STS-14 | STS-15 | STS-16 | STS-B | SICK-R | Avg. |
|---|---|---|---|---|---|---|---|---|
| BERT-Tiny | 72.18 | 81.12 | 75.45 | 83.22 | 77.89 | 79.03 | 69.05 | 76.85 |
| BERT-Mini | 74.17 | 82.69 | 76.58 | 84.30 | 78.23 | 80.84 | 69.82 | 78.09 |
| Tiny-BERT-L4 | 74.30 | 83.07 | 77.37 | 84.70 | 79.06 | 80.99 | 70.26 | 78.54 |
| MiniLM-L3 | 74.00 | 82.93 | 76.58 | 84.35 | 78.57 | 81.00 | 70.09 | 78.22 |
| MiniLM-L6 | 75.06 | 83.86 | 77.29 | 85.01 | 79.67 | 81.92 | 70.89 | 79.10 |
| BERT-Small | 74.50 | 83.58 | 77.29 | 84.83 | 79.72 | 81.93 | 70.55 | 78.91 |
| MiniLM-L12 | 75.25 | 84.61 | 78.27 | 85.51 | 80.52 | 82.32 | 71.32 | 79.68 |
| Tiny-BERT-L6 | 75.53 | 84.76 | 78.33 | 85.72 | 80.42 | 82.25 | 71.12 | 79.73 |
| BERT-base | 75.58 | 85.13 | 78.54 | 85.75 | 81.12 | 82.81 | 71.47 | 80.06 |
| RoBERTa-base | 75.32 | 84.56 | 77.26 | 85.33 | 81.34 | 82.67 | 72.00 | 79.78 |
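
The Avg. column appears to be the unweighted mean of the seven task scores in each row. The short check below recomputes it for two rows; the numbers are copied from the table, and the dictionary layout is only for illustration.

```python
# Task scores in column order: STS-12 to STS-16, STS-B, SICK-R.
rows = {
    "BERT-Tiny": ([72.18, 81.12, 75.45, 83.22, 77.89, 79.03, 69.05], 76.85),
    "BERT-Mini": ([74.17, 82.69, 76.58, 84.30, 78.23, 80.84, 69.82], 78.09),
}

recomputed = {name: sum(scores) / len(scores) for name, (scores, _) in rows.items()}

for name, (_, reported) in rows.items():
    # Allow for rounding of the reported value to two decimals.
    print(name, abs(recomputed[name] - reported) < 0.006)
```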

We have Thai sentence embedding models from ConGen!!

Hyper-Parameters

| Parameters | Models | Teacher Temp | Student Temp | Queue Size | Learning Rate |
|---|---|---|---|---|---|
| <30M | ConGen-WangchanBERT-Tiny | 0.01 | 0.01 | 65536 | 3e-4 |
| <30M | ConGen-WangchanBERT-Small | 0.05 | 0.09 | 65536 | 5e-4 |
| >100M | ConGen-simcse-model-roberta-base-thai | 0.05 | 0.03 | 65536 | 3e-4 |
| >100M | ConGen-paraphrase-multilingual-mpnet-base-v2 | 0.05 | 0.05 | 262144 | 1e-4 |

Thai semantic textual similarity benchmark

| Parameters | Models | Spearman's Correlation (*100) |
|---|---|---|
| <30M | ConGen-WangchanBERT-Tiny | 66.43 |
| <30M | ConGen-WangchanBERT-Small | 70.65 |
| >100M | ConGen-simcse-model-roberta-base-thai | 66.21 |
| >100M | ConGen-paraphrase-multilingual-mpnet-base-v2 | 76.56 |

Thai transfer benchmark

Wisesight

| Parameters | Models | Acc (*100) | F1 (*100, weighted) |
|---|---|---|---|
| <30M | ConGen-WangchanBERT-Tiny | 61.55 | 62.19 |
| <30M | ConGen-WangchanBERT-Small | 64.77 | 65.30 |
| >100M | ConGen-simcse-model-roberta-base-thai | 65.07 | 65.28 |
| >100M | ConGen-paraphrase-multilingual-mpnet-base-v2 | 67.84 | 68.31 |

Wongnai

| Parameters | Models | Acc (*100) | F1 (*100, weighted) |
|---|---|---|---|
| <30M | ConGen-WangchanBERT-Tiny | 42.67 | 44.78 |
| <30M | ConGen-WangchanBERT-Small | 43.38 | 45.99 |
| >100M | ConGen-simcse-model-roberta-base-thai | 41.32 | 41.57 |
| >100M | ConGen-paraphrase-multilingual-mpnet-base-v2 | 47.22 | 48.63 |

Generated Review

| Parameters | Models | Acc (*100) | F1 (*100, weighted) |
|---|---|---|---|
| <30M | ConGen-WangchanBERT-Tiny | 54.26 | 52.69 |
| <30M | ConGen-WangchanBERT-Small | 58.22 | 57.03 |
| >100M | ConGen-simcse-model-roberta-base-thai | 49.81 | 47.94 |
| >100M | ConGen-paraphrase-multilingual-mpnet-base-v2 | 58.00 | 56.80 |
