Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?

This repository contains code and figures for our paper Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? by Rylan Schaeffer, Hailey Schoelkopf, Brando Miranda, Gabriel Mukobi, Varun Madan, Adam Ibrahim, Herbie Bradley, Stella Biderman and Sanmi Koyejo.

Installation | Data | Code | Contributing | Citing | Contact

Installation

(Optional) Update conda:

conda update -n base -c defaults conda -y

Create and activate the conda environment:

conda env create --file environment.yml -y && conda activate elusive

Update pip:

pip install --upgrade pip

Install some additional packages:

pip install bitsandbytes sentencepiece

(Optional) To run evaluations, initialize the EleutherAI lm-evaluation-harness submodule:

git submodule update --init --recursive

Change into the directory and install lm-evaluation-harness:

cd submodules/lm-evaluation-harness && pip install -e . && cd ../..

Log in to Weights & Biases (wandb):

wandb login

Data

Data will be released once the paper is accepted and published. For early access, please contact the authors (see Contact below).

Code

This project's code has three broad stages:

1. Collecting Language Model Scores on NLP Benchmarks: We run language model families on standard NLP benchmarks and collate the per-sample results.
2. Computing Compute-Score Correlations: For each 4-tuple of (language model family, NLP benchmark, correlation metric, performance score), we compute, for every benchmark sample, the correlation between the models' scores on that sample and their pretraining compute across the model family. This is done using scripts/compute_correlations_between_sample_scores_and_compute.py and W&B sweeps.
3. Analyzing Compute-Score Correlations: We analyze the resulting correlations and generate the paper's figures using the Python scripts in the notebooks directory.

Contributing

Contributions are welcome! Please format your code with black.

Citing

To cite this work, please use:

@misc{schaeffer2024predictingdownstreamcapabilitiesfrontier,
      title={Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?}, 
      author={Rylan Schaeffer and Hailey Schoelkopf and Brando Miranda and Gabriel Mukobi and Varun Madan and Adam Ibrahim and Herbie Bradley and Stella Biderman and Sanmi Koyejo},
      year={2024},
      eprint={2406.04391},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2406.04391}, 
}

Note: We created a new clean repository for the review process; thus, this repo's commit history is not representative of each individual's contributions.

Contact

Questions? Comments? Interested in collaborating? Open an issue or email [email protected] and [email protected].
