Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task Learning

The code for the EMNLP 2023 main conference paper "Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task Learning".

Inspired by functional specialization in the human brain, which helps it handle multiple tasks efficiently, we propose an interpretation method that quantifies the degree of functional specialization in multi-head attention (Important Attention-head Pruning, IAP) and a simple multi-task training method that increases this specialization (Important Attention-head Training, IAT).

How to run?

Set up a virtual environment

conda create -n mha python=3.9
conda activate mha
pip install -r requirements.txt

Important Attention-head Pruning (IAP)

  1. (Optional) Fine-tune a multi-task learning model
cd script
bash multitasks_train.sh
  2. Quantify the degree of functional specialization
cd script
# Set up your private parameters, e.g., the path of the multi-task model and the tasks
vim important_attention_prune.sh
bash important_attention_prune.sh
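
For intuition, the snippet below sketches one common way to score attention-head importance for a single task: the gradient of the task loss with respect to a per-head mask, in the spirit of Michel et al. (2019), "Are Sixteen Heads Really Better than One?". This is an illustration only, not the exact procedure run by important_attention_prune.sh; the checkpoint name and the toy batch are placeholders for your multi-task model and one task's data.

```python
import torch
from transformers import AutoTokenizer, BertForSequenceClassification

# Placeholder checkpoint: substitute your fine-tuned multi-task model.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

n_layers = model.config.num_hidden_layers
n_heads = model.config.num_attention_heads

# One mask scalar per head; the gradient of the loss w.r.t. each entry
# measures how much that head matters for this task.
head_mask = torch.ones(n_layers, n_heads, requires_grad=True)

# Toy batch standing in for one task's data.
enc = tokenizer(["first example", "second example"],
                return_tensors="pt", padding=True)
labels = torch.tensor([0, 1])

loss = model(**enc, labels=labels, head_mask=head_mask).loss
(grad,) = torch.autograd.grad(loss, head_mask)
importance = (head_mask * grad).abs()        # (n_layers, n_heads) scores
importance = importance / importance.sum()   # normalize for cross-task comparison
```

Repeating this per task yields one importance distribution per task; heads whose importance concentrates on a single task are the functionally specialized ones that IAP probes by pruning.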

Important Attention-head Training (IAT)

cd script
# Set up your private parameters, e.g., the path of the source model and the tasks
vim important_attention_train.sh
bash important_attention_train.sh
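
The IAT objective itself is implemented in important_attention_train.sh. Purely as an illustration of the idea of routing tasks to heads, the sketch below builds a task-specific head mask from IAP-style importance scores and would apply it during multi-task fine-tuning. This masking scheme is our assumption for illustration, not the paper's actual training method; the function name and keep_ratio parameter are hypothetical.

```python
import torch

def task_head_mask(importance, task_id, keep_ratio=0.5):
    """importance: (num_tasks, n_layers, n_heads) IAP-style scores.
    Illustrative only: keep the heads where this task's importance
    exceeds the average importance of the other tasks, and always
    retain the task's own top-k heads to avoid masking too much."""
    own = importance[task_id]
    others = (importance.sum(dim=0) - own) / (importance.size(0) - 1)
    mask = (own >= others).float()
    k = int(mask.numel() * keep_ratio)
    top = torch.topk(own.flatten(), k).indices
    flat = mask.flatten()
    flat[top] = 1.0
    return flat.view_as(own)

imp = torch.rand(3, 12, 12)          # 3 tasks, BERT-base-sized head grid
mask0 = task_head_mask(imp, task_id=0)
# During training, each task's batch would then be forwarded with its mask:
# loss = model(**batch, labels=labels, head_mask=mask0).loss
```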

AG-Pair Dataset

The AG-Pair dataset is built from the original AG's News dataset, which contains 120k training samples from four topics. Given a pair of news articles as input, the model has to predict whether they belong to the same topic (Same) or not (Different).

To generate this dataset, the samples in AG's News are iterated over in random order, and each sample is paired with equal probability with a sample from the same topic or from one of the other three topics, so the two classes both contain 60k training pairs. Moreover, each article in AG's News occurs exactly twice in AG-Pair, which keeps word frequencies unchanged. A sketch of the construction follows.
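
The sketch below assumes the dataset is given as (text, topic) tuples. One pass pairs every article exactly once, so two passes give each article exactly two occurrences and roughly balanced classes; the fallback order when a topic pool runs dry near the end of a pass is our assumption, not necessarily what the authors did.

```python
import random
from collections import defaultdict

def build_pairs_one_pass(samples, rng):
    """samples: list of (text, topic). Pairs every article exactly once,
    drawing the partner from the same topic or a different topic with
    probability 1/2 each. Two passes give the full AG-Pair construction."""
    pools = defaultdict(list)                  # topic -> shuffled indices
    for idx, (_, topic) in enumerate(samples):
        pools[topic].append(idx)
    for lst in pools.values():
        rng.shuffle(lst)
    paired = [False] * len(samples)
    order = list(range(len(samples)))
    rng.shuffle(order)
    pairs = []
    for i in order:
        if paired[i]:
            continue
        paired[i] = True
        topic_i = samples[i][1]
        if rng.random() < 0.5:                 # same-topic partner
            targets = [topic_i]
        else:                                  # partner from another topic
            targets = [rng.choice([t for t in pools if t != topic_i])]
        targets += [t for t in pools if t not in targets]  # fallback order
        j = None
        for t in targets:
            while pools[t] and paired[pools[t][-1]]:
                pools[t].pop()                 # drop already-paired indices
            if pools[t]:
                j = pools[t].pop()
                break
        if j is None:                          # odd leftover article
            break
        paired[j] = True
        label = "Same" if samples[j][1] == topic_i else "Different"
        pairs.append((samples[i][0], samples[j][0], label))
    return pairs

rng = random.Random(42)
toy = [(f"article {k}", k % 4) for k in range(16)]
ag_pair = build_pairs_one_pass(toy, rng) + build_pairs_one_pass(toy, rng)
```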

Contact

[email protected]

How to cite our paper?

@inproceedings{li-etal-2023-FunctionalSpecialization,
  author    = {Chong Li and
               Shaonan Wang and
               Yunhao Zhang and
               Jiajun Zhang and
               Chengqing Zong},
  title     = {Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task Learning},
  booktitle = {Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},
  year      = {2023},
  address   = {Singapore},
  publisher = {Association for Computational Linguistics},
}
