This is the official repository containing the code for our paper *The Importance of Prompt Tuning for Automated Neuron Explanations* (NeurIPS 2023 ATTRIB). arXiv, Project website. We build heavily on OpenAI's automated-interpretability and analyze how the specific prompt used to generate new explanations affects the results.
We find that simpler, more intuitive prompts, such as our Summary prompt, can improve both the computational efficiency and the quality of the generated explanations. In particular, the simple Summary prompt shown below can outperform the original prompt while requiring 2.4 times fewer input tokens per neuron.
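As a rough illustration, one way to sanity-check the token-efficiency comparison is to count input tokens with `tiktoken`. The prompt strings below are hypothetical placeholders; the actual prompts and token counts are produced in `Experiments/simulate_score.ipynb`.

```python
# Minimal sketch of comparing input-token counts between prompt styles.
# The prompt strings are placeholders, not the real prompts from the repo.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by recent OpenAI models

original_prompt = "<full few-shot explanation prompt for one neuron>"
summary_prompt = "<shorter Summary prompt for the same neuron>"

orig_tokens = len(enc.encode(original_prompt))
summary_tokens = len(enc.encode(summary_prompt))
print(f"original: {orig_tokens} tokens, summary: {summary_tokens} tokens")
```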
- Setup: first `cd neuron-explainer`, then install the required packages by running `pip install -e .`
- Change line 4 of `neuron_explainer/utils.py` to correspond to your OpenAI API key.
- Run `Experiments/save_descriptions.ipynb` to generate explanations for 5 sample neurons using our different prompts.
- Compare the results with the neuron activations shown in NeuronViewer.
- To explain a different set of neurons, pass a different CSV to `neurons_to_evaluate` that, like `inputs/test_neurons.csv`, has `layer` and `neuron` columns (see the example after this list).
- All neuron descriptions generated for the paper are available in `Experiments/results`.
- The Simulate and Score experiments can be reproduced by running `Experiments/simulate_score.ipynb`. Note: the simulator model used originally, `gpt-3.5-turbo-instruct`, is no longer supported by the API in the required format and has been replaced with `text-davinci-003` in the code. This change will likely decrease simulation quality and increase API costs. The scoring idea is sketched after this list.
- To evaluate explanation quality, calculate similarity to a baseline explanation (AdaCS) by running `Experiments/ada_cs.ipynb` (see the sketch after this list).
- To explain Neuron Puzzles and calculate AdaCS similarity to their ground-truth explanations, run `Experiments/puzzles.ipynb`.
- Finally, our comparison of the number of API tokens per explained neuron can be reproduced in `Experiments/simulate_score.ipynb`.
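A minimal sketch of creating your own neuron CSV with the required `layer` and `neuron` columns; the indices and the output filename below are made-up examples, not neurons from the paper:

```python
# Build a CSV of neurons to explain, mirroring inputs/test_neurons.csv.
# The layer/neuron indices here are arbitrary examples.
import pandas as pd

neurons = pd.DataFrame({"layer": [5, 9, 12], "neuron": [131, 892, 45]})
neurons.to_csv("inputs/my_neurons.csv", index=False)  # pass this path to neurons_to_evaluate
```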
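For intuition on Simulate and Score: following OpenAI's automated-interpretability setup, a simulator model predicts the neuron's per-token activations from the explanation alone, and the explanation is scored by how well those predictions match the true activations. The sketch below assumes correlation scoring with made-up activation values; the real pipeline is in `Experiments/simulate_score.ipynb`.

```python
# Toy correlation scoring: compare real activations with simulator predictions.
# The values are made up; the real pipeline queries the simulator model.
import numpy as np

true_acts = np.array([0.0, 2.5, 0.1, 4.0, 0.0])  # recorded activations on a text
simulated = np.array([0.2, 2.0, 0.0, 3.5, 0.1])  # simulator's predicted activations

score = np.corrcoef(true_acts, simulated)[0, 1]  # Pearson correlation
print(f"explanation score: {score:.3f}")
```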
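A minimal sketch of the AdaCS metric, assuming it denotes cosine similarity between `text-embedding-ada-002` embeddings of two explanations; the actual implementation is in `Experiments/ada_cs.ipynb`:

```python
# Cosine similarity between Ada embeddings of two explanations (AdaCS sketch).
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ada_cs(explanation: str, baseline: str) -> float:
    resp = client.embeddings.create(
        model="text-embedding-ada-002", input=[explanation, baseline]
    )
    a = np.array(resp.data[0].embedding)
    b = np.array(resp.data[1].embedding)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(ada_cs("words related to law", "legal terminology"))
```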
`Experiments/save_neuron_info.ipynb` and `Experiments/get_interpretable_neurons.ipynb` are used to collect NeuronViewer explanations and to select which neurons to explain.
If you find this code useful, please cite:
@misc{lee2023importance,
  title={The Importance of Prompt Tuning for Automated Neuron Explanations},
  author={Justin Lee and Tuomas Oikarinen and Arjun Chatha and Keng-Chi Chang and Yilan Chen and Tsui-Wei Weng},
  year={2023},
  eprint={2310.06200},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}