On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons

This is the official implementation of On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons (Accepted at NAACL 2024).

The paper is available at NAACL 2024 and arXiv.

Notice

This code is a modification of Self-Conditioning Pre-Trained Language Models.

Data Path

# Ground Truth Texts
assets/Language/sense/
# Model-Generated Texts
outputs/

Installation

The requirements are listed in frozen_requirements.txt.
The code has been tested using Python 3.8.
Run the following for installation:

Create a virtual environment

cd <path_to_this_project>
conda create -n lang_neuron python=3.8
conda activate lang_neuron
pip install -U pip wheel

Install selfcond (recommended for reproducibility)

pip install -r frozen_requirements.txt
python -c "import nltk; nltk.download('punkt')"

1. Finding Language-Specific Neurons

Models are fetched from the Hugging Face Transformers repository. Supported models:

  • xglm
  • bloom
  • llama-2

1.1 Collect responses from a model

Run the following script to collect the model's responses to the specified input texts.

bash main_prod_env.sh "xglm-564M compute_responses Language de 2000 on_p50 expertise_limited_2000_both"

The responses will be saved inside path_to_save_responses/{model_name}/sense/[concept]/responses.

1.2 Compute expertise

The expertise is defined as the Average Precision (AP) achieved by a unit when its responses are considered prediction scores for the sentences.

bash main_prod_env.sh "xglm-564M compute_expertise Language de 2000 on_p50 expertise_limited_2000_both"

The expertise results are saved as a CSV file in path_to_save_responses/{model_name}/sense/[concept]/expertise. Column ap contains the expertise measured for each model unit and column on_p50 contains the median response of each unit to the positive sentences.
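
The Average Precision definition above can be illustrated with a small, self-contained sketch. It is not the repository's implementation: the function name `average_precision`, the toy labels, and the toy responses are all assumptions made for illustration, and ties are broken by input order here, which may differ from the library the project actually uses.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AP of `scores` treated as prediction scores for binary `labels`.

    labels[i] is 1 for a positive sentence (target concept) and 0 otherwise.
    scores[i] is one unit's response to sentence i.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive sentence is required")

    # Rank sentences from the highest response to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            # Precision at the rank of each positive sentence.
            precision_sum += hits / rank
    return precision_sum / n_pos


# Toy data (hypothetical): two positive and two negative sentences.
labels = [1, 1, 0, 0]
responses = [0.9, 0.2, 0.4, 0.1]
ap = average_precision(labels, responses)
print(f"AP = {ap:.4f}")
```

A unit whose responses rank every positive sentence above every negative one reaches an AP of 1.0, so a high value in the ap column indicates a unit that separates the target concept well.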

1.3 Limit expertise (only Top-N and Bottom-N neurons)

Run the following script to limit the expertise to only the Top-N and Bottom-N neurons.

bash main_prod_env.sh "xglm-564M limit_expertise Language de 2000 on_p50 expertise_limited_2000_both"
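
As a rough sketch of what Top-N and Bottom-N selection could look like, the snippet below ranks units by the ap column and keeps both ends of the ranking. Only the ap and on_p50 column names come from this README; the `unit` column, the inline CSV values, the function name `limit_expertise`, and the choice of ranking by ap are assumptions for illustration.

```python
import csv
import io

# Hypothetical expertise table. Only `ap` and `on_p50` are documented columns;
# the `unit` identifier column and every value below are invented.
CSV_TEXT = """unit,ap,on_p50
layer0:0,0.91,1.20
layer0:1,0.50,0.10
layer1:0,0.08,-0.70
layer1:1,0.77,0.95
layer2:0,0.15,-0.30
"""


def limit_expertise(rows, n):
    """Return (top, bottom): the n rows with highest and lowest `ap`."""
    if 2 * n > len(rows):
        raise ValueError("n is too large: top and bottom sets would overlap")
    ranked = sorted(rows, key=lambda r: float(r["ap"]), reverse=True)
    return ranked[:n], ranked[-n:]


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
top, bottom = limit_expertise(rows, n=2)
top_units = [r["unit"] for r in top]
bottom_units = [r["unit"] for r in bottom]
print("top:", top_units)
print("bottom:", bottom_units)
```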

2. Controlling Language-Specific Neurons

2.1 Unconditional text generation

In this step, the expertise computed above is used to generate sentences starting from a null prompt.

bash main_prod_env.sh "xglm-564M generate_activated Language de 2000 on_p50 expertise_limited_2000_both"
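
This README does not spell out how the expertise steers generation. One plausible reading, stated here as an assumption, is that the output of each selected unit is overwritten with a fixed value such as its on_p50 median during inference. The toy sketch below shows that idea on a plain list of activations; it involves no real model, and `apply_intervention`, the indices, and the values are all hypothetical.

```python
from typing import Dict, List


def apply_intervention(
    activations: List[float], fixed_values: Dict[int, float]
) -> List[float]:
    """Return a copy of `activations` with the selected units overwritten.

    fixed_values maps a unit index to the value forced onto that unit
    (assumed here to be the unit's median response, i.e. `on_p50`).
    """
    out = list(activations)
    for unit_index, value in fixed_values.items():
        out[unit_index] = value
    return out


# Hypothetical layer output with four units.
activations = [0.1, -0.4, 0.3, 0.0]
# Hypothetical selected units and their forced values.
fixed_values = {1: 1.2, 3: -0.7}

steered = apply_intervention(activations, fixed_values)
print(steered)
```

In a real model, the same overwrite would be applied inside the forward pass at every generation step, which is what the repository's scripts are expected to handle.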

2.2 Conditional text generation (machine translation task)

In this step, the expertise computed above is used to generate sentences from the prompt "Translate an English sentence into a target language.\nEnglish: {source_text}\nTarget Language:".

bash main_prod_env.sh "xglm-564M generate_activated_condition Language de 2000 on_p50 expertise_limited_2000_both flores200 2"
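
To make the prompt format concrete, here is a minimal sketch that fills the template quoted above. It assumes that `\n` in the template denotes a newline character. The helper name `build_prompt` and the example sentence are illustrative and not taken from the repository.

```python
# Template copied from the README; "\n" is assumed to mean a newline.
PROMPT_TEMPLATE = (
    "Translate an English sentence into a target language.\n"
    "English: {source_text}\n"
    "Target Language:"
)


def build_prompt(source_text: str) -> str:
    """Fill the translation prompt with one English source sentence."""
    return PROMPT_TEMPLATE.format(source_text=source_text)


prompt = build_prompt("The weather is nice today.")
print(prompt)
```

The prompt never names the target language, so the language of the continuation is left to the model and to the intervention on the selected neurons.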

Citation

@inproceedings{kojima-etal-2024-multilingual,
    title = "On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons",
    author = "Kojima, Takeshi  and
      Okimura, Itsuki  and
      Iwasawa, Yusuke  and
      Yanaka, Hitomi  and
      Matsuo, Yutaka",
    booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics",
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.naacl-long.384",
    pages = "6912--6964",
}