# On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons

This is the official implementation of "On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons" (accepted at NAACL 2024). The paper is available in the [ACL Anthology](https://aclanthology.org/2024.naacl-long.384) and on arXiv.

This code is a modification of Self-Conditioning Pre-Trained Language Models.
# Ground Truth Texts
`assets/Language/sense/`

# Model-Generated Texts
`outputs/`
The requirements are listed in `frozen_requirements.txt`. The code has been tested with Python 3.8.
Run the following for installation:

```bash
cd <path_to_this_project>
conda create -n lang_neuron python=3.8
conda activate lang_neuron
pip install -U pip wheel
pip install -r frozen_requirements.txt
python -c "import nltk; nltk.download('punkt')"
```
Models are fetched from the HuggingFace Hub via the Transformers library (a minimal loading sketch follows the list). Supported models:
- xglm
- bloom
- llama-2
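For reference, a minimal sketch of loading one of the supported models with `transformers`. The repository's scripts handle this internally; the model ID `facebook/xglm-564M` corresponds to the `xglm-564M` variant used in the example commands below.

```python
# Minimal loading sketch for one of the supported model families.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/xglm-564M"  # matches xglm-564M in the commands below
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```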
Run the following script to collect each unit's responses when the specified texts are fed into the model:

```bash
bash main_prod_env.sh "xglm-564M compute_responses Language de 2000 on_p50 expertise_limited_2000_both"
```

The responses are saved under `path_to_save_responses/{model_name}/sense/[concept]/responses`.
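Conceptually, a unit's response can be recorded with a forward hook. The sketch below is illustrative only: hooking the `fc2` MLP projection of XGLM blocks and averaging over tokens are assumptions, not the repository's exact collection code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/xglm-564M")
model = AutoModelForCausalLM.from_pretrained("facebook/xglm-564M")

responses = {}  # module name -> list of per-sentence unit responses

def make_hook(name):
    def hook(module, inputs, output):
        # Average over the token dimension so each unit yields one
        # scalar response per sentence (an illustrative choice).
        responses.setdefault(name, []).append(output.mean(dim=1).detach())
    return hook

handles = [
    module.register_forward_hook(make_hook(name))
    for name, module in model.named_modules()
    if name.endswith("fc2")  # MLP output projection in each XGLM block
]

with torch.no_grad():
    model(**tokenizer("Ein Beispielsatz.", return_tensors="pt"))

for h in handles:
    h.remove()
```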
The expertise of a unit is defined as the Average Precision (AP) achieved when its responses are treated as prediction scores for the sentences. Run the following script to compute it:

```bash
bash main_prod_env.sh "xglm-564M compute_expertise Language de 2000 on_p50 expertise_limited_2000_both"
```

The expertise results are saved as a CSV file in `path_to_save_responses/{model_name}/sense/[concept]/expertise`.
The column `ap` contains the expertise measured for each model unit, and the column `on_p50` contains the median response of each unit to the positive sentences.
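As a reference for this definition, a minimal sketch of both columns, assuming `responses` is an array of shape `(num_sentences, num_units)` and `labels` marks the positive (target-language) sentences:

```python
import numpy as np
from sklearn.metrics import average_precision_score

def unit_expertise(responses: np.ndarray, labels: np.ndarray) -> dict:
    """responses: (num_sentences, num_units); labels: 1 for target-language sentences."""
    # AP per unit, treating each unit's responses as prediction scores.
    ap = np.array([
        average_precision_score(labels, responses[:, u])
        for u in range(responses.shape[1])
    ])
    # Median response of each unit to the positive sentences.
    on_p50 = np.median(responses[labels == 1], axis=0)
    return {"ap": ap, "on_p50": on_p50}
```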
Run the following script to limit the expertise to only the Top-N and Bottom-N neurons:

```bash
bash main_prod_env.sh "xglm-564M limit_expertise Language de 2000 on_p50 expertise_limited_2000_both"
```
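The selection itself amounts to ranking units by AP and keeping both tails. A minimal sketch with `pandas` (the CSV file name is hypothetical; the repository's own selection logic may differ in detail):

```python
import pandas as pd

def limit_expertise(expertise: pd.DataFrame, n: int = 2000) -> pd.DataFrame:
    """Keep only the Top-N and Bottom-N units ranked by AP."""
    ranked = expertise.sort_values("ap", ascending=False)
    return pd.concat([ranked.head(n), ranked.tail(n)])

# e.g. limited = limit_expertise(pd.read_csv("expertise.csv"), n=2000)
```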
In this step, the expertise computed above is used to generate sentences starting from a null prompt:

```bash
bash main_prod_env.sh "xglm-564M generate_activated Language de 2000 on_p50 expertise_limited_2000_both"
```
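Controlling generation here presumably means intervening on the selected language-specific neurons during decoding. A hedged sketch of one such intervention with a forward hook; the layer name, unit indices, and fixed values are placeholders (in practice they would come from the expertise CSV, e.g. each unit's `on_p50`), and approximating the null prompt with a lone EOS token is an assumption:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/xglm-564M")
model = AutoModelForCausalLM.from_pretrained("facebook/xglm-564M")

def make_intervention_hook(unit_indices, fixed_values):
    def hook(module, inputs, output):
        # Force the selected units to their fixed ("on") values.
        output[..., unit_indices] = fixed_values
        return output
    return hook

# Placeholder layer, unit indices, and values for illustration only.
layer = dict(model.named_modules())["model.layers.3.fc2"]
handle = layer.register_forward_hook(
    make_intervention_hook([17, 42], torch.tensor([1.3, 0.8]))
)

input_ids = torch.tensor([[tokenizer.eos_token_id]])  # null-prompt stand-in
out = model.generate(input_ids, do_sample=True, max_new_tokens=30)
print(tokenizer.decode(out[0], skip_special_tokens=True))
handle.remove()
```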
In this step, the expertise computed above is used to generate sentences with the prompt "Translate an English sentence into a target language.\nEnglish: {source_text}\nTarget Language:", with source sentences drawn from FLORES-200:

```bash
bash main_prod_env.sh "xglm-564M generate_activated_condition Language de 2000 on_p50 expertise_limited_2000_both flores200 2"
```
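For clarity, the same prompt expressed as a plain Python template (`source_text` is a placeholder for an English source sentence):

```python
# The translation prompt from this step; source_text stands in for an
# English sentence from the dataset.
source_text = "How are you?"
prompt = (
    "Translate an English sentence into a target language.\n"
    f"English: {source_text}\n"
    "Target Language:"
)
```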
# Citation

```bibtex
@inproceedings{kojima-etal-2024-multilingual,
    title = "On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons",
    author = "Kojima, Takeshi and
      Okimura, Itsuki and
      Iwasawa, Yusuke and
      Yanaka, Hitomi and
      Matsuo, Yutaka",
    booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics",
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.naacl-long.384",
    pages = "6912--6964",
}
```