This repository is a landing page for tools related to prompt steering in language models. Our primary focus is to measure and analyze how controllable models are through prompting.
- `prompt-steerability`: A benchmark suite for measuring the extent to which models can be prompted to change behavior.
To install, set up a virtual environment using Python 3.10, navigate to `prompt_steerability/`, and run:

```bash
pip install -e .
```
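For reference, a minimal setup sketch using Python's built-in `venv` (any environment manager works; the exact Python binary name may differ on your system):

```bash
# create and activate a Python 3.10 virtual environment
python3.10 -m venv .venv
source .venv/bin/activate

# install the package in editable mode from the repo subdirectory
cd prompt_steerability/
pip install -e .
```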
Model access is enabled via HuggingFace and vLLM. For a simple guide, please refer to the quickstart, or follow these steps:
- Request access to models through HuggingFace:
  - visit the model page on HuggingFace
  - click the "Access" button and fill out the form if the model is gated
  - once approved, get your HuggingFace token from your account settings
  - log in using:

    ```bash
    huggingface-cli login
    ```
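    If you prefer a non-interactive login (e.g., on a remote GPU machine), the CLI also accepts a token flag; the environment variable name below is just an example:

    ```bash
    huggingface-cli login --token "$HF_TOKEN"
    ```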
- Install vLLM with GPU support in your virtual environment:

  ```bash
  pip install vllm[all]
  ```
- Start a vLLM server (note model access uses the OpenAI API format):

  ```bash
  python -m vllm.entrypoints.openai.api_server \
      --model <your-model-name> \
      --host 0.0.0.0 \
      --port 8000 \
      --device cuda
  ```
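  Once the server is up, you can sanity-check that the OpenAI-compatible endpoint is reachable (host and port here match the example command above):

  ```bash
  curl http://localhost:8000/v1/models
  ```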
Ensure that `<your-model-name>` appears as one of the `model-id`s under `model-config` in `config.yaml` (and matches the model's name on HuggingFace). For a complete list of parameter options (including options for serving across multiple GPUs), please see the vLLM documentation.
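For instance, vLLM supports tensor parallelism for sharding a model across multiple GPUs; the snippet below is illustrative, so check the vLLM documentation for the flags supported by your version:

```bash
# serve a model across 2 GPUs via tensor parallelism
python -m vllm.entrypoints.openai.api_server \
    --model <your-model-name> \
    --host 0.0.0.0 \
    --port 8000 \
    --tensor-parallel-size 2
```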
Configuration details are specified in `prompt_steerability/config.yaml`. Key parameters are listed below; an illustrative sketch of the file follows the list:
- `num-benchmark-trials`: the number of (outer) benchmark trials
- `persona-dimensions`: the persona dimensions you want to benchmark the models over
- `models`: the models you wish to benchmark (ensure these are hosted at the `base-url` locations specified in `model-config`)
- `steering` config:
  - `method`: the steering method (`principles`)
  - `params`:
    - `num-steering-trials`: the number of steering trials
    - `steering-budgets`: dictates the number of steering statements to include
- `profiling`:
  - `method`: the profiling method (`query-principles`)
  - `params`:
    - `num-questions`: how many profiling questions to ask per trial
    - `representation`: how to represent profiles
    - `inference`: how to parse outputs (`log-probs` vs `output-parsing`)
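The sketch below shows one way these keys could fit together; the values are placeholders and the exact nesting may differ, so treat the shipped `config.yaml` as the source of truth:

```yaml
# illustrative sketch only -- placeholder values; see config.yaml for the real structure
num-benchmark-trials: 5
persona-dimensions:
  - agreeableness                 # hypothetical persona dimension
models:
  - <your-model-name>             # must match a model-id in model-config (and the HuggingFace name)
model-config:
  <your-model-name>:
    base-url: http://localhost:8000/v1   # where the vLLM server is hosted
steering:
  method: principles
  params:
    num-steering-trials: 10
    steering-budgets: [0, 1, 3, 5]
profiling:
  method: query-principles
  params:
    num-questions: 20
    representation: <profile-representation>   # how to represent profiles
    inference: log-probs                        # or output-parsing
```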
After hosting the desired models, navigate to `persona/benchmark/` and run `run_persona_steerability.py` to run the benchmark.
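For example, assuming the vLLM server from the previous step is still running and the script needs no extra arguments:

```bash
cd persona/benchmark/
python run_persona_steerability.py
```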
```bibtex
@article{miehling2024evaluating,
  title={Evaluating the Prompt Steerability of Large Language Models},
  author={Miehling, Erik and Desmond, Michael and Ramamurthy, Karthikeyan Natesan and Daly, Elizabeth M. and Dognin, Pierre and Rios, Jesus and Bouneffouf, Djallel and Liu, Miao},
  journal={arXiv preprint arXiv:2411.12405},
  year={2024}
}
```