
lm-evaluation-harness broken on master #3292

Open
pcmoritz opened this issue Mar 9, 2024 · 3 comments
pcmoritz commented Mar 9, 2024

Since #3065, the eval suite https://github.com/EleutherAI/lm-evaluation-harness is broken.

Repro (this should be run on 2 A100s or H100s to make sure the Mixtral model fits into GPU memory):

# First install vllm from master via https://docs.vllm.ai/en/latest/getting_started/installation.html#build-from-source

# Then clone and install https://github.com/EleutherAI/lm-evaluation-harness
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .

# Now run the evaluation harness
lm_eval --model vllm --model_args pretrained=mistralai/Mixtral-8x7B-Instruct-v0.1,tensor_parallel_size=2 --tasks mmlu --num_fewshot 5

This fails with

  File "/home/ray/anaconda3/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
  File "/home/ray/default/lm-evaluation-harness/lm_eval/__main__.py", line 318, in cli_evaluate
    results = evaluator.simple_evaluate(
  File "/home/ray/default/lm-evaluation-harness/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
  File "/home/ray/default/lm-evaluation-harness/lm_eval/evaluator.py", line 230, in simple_evaluate
    results = evaluate(
  File "/home/ray/default/lm-evaluation-harness/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
  File "/home/ray/default/lm-evaluation-harness/lm_eval/evaluator.py", line 368, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
  File "/home/ray/default/lm-evaluation-harness/lm_eval/api/model.py", line 321, in loglikelihood
    return self._loglikelihood_tokens(new_reqs)
  File "/home/ray/default/lm-evaluation-harness/lm_eval/models/vllm_causallms.py", line 379, in _loglikelihood_tokens
    answer = self._parse_logprobs(
  File "/home/ray/default/lm-evaluation-harness/lm_eval/models/vllm_causallms.py", line 416, in _parse_logprobs
    continuation_logprobs = sum(
TypeError: unsupported operand type(s) for +: 'int' and 'Logprob'
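
For context, here is a minimal sketch of the failure mode. It appears that since #3065 the per-token logprob dicts returned by vLLM hold Logprob objects rather than raw floats, so Python's sum(), which starts from the int 0, raises the TypeError above. The Logprob class below is a simplified stand-in for illustration, not vLLM's actual definition:

from dataclasses import dataclass

@dataclass
class Logprob:
    # simplified stand-in for the per-token object vLLM now returns
    logprob: float

token_logprobs = {101: Logprob(-0.5), 102: Logprob(-1.2)}
tokens = [101, 102]

# Pre-#1549 lm-eval summed the dict values directly; sum() starts from
# the int 0, so 0 + Logprob raises the TypeError shown above:
# continuation_logprobs = sum(token_logprobs[t] for t in tokens)

# The #1549 fix unwraps the .logprob attribute before summing:
continuation_logprobs = sum(token_logprobs[t].logprob for t in tokens)
print(continuation_logprobs)  # -1.7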

The API breakage is fixed in EleutherAI/lm-evaluation-harness#1549, but even with the fix the harness is extremely slow (about 40x slower than before), so it is not really feasible to run:

Running loglikelihood requests:   0%|                  [...]               | 32/56168 [22:52<668:47:47, 42.89s/it]

Being able to run the evaluation harness in a timely manner is crucial so we can ensure model performance doesn't degrade.


baberabb commented Mar 9, 2024

I think this is because, without a batch size specified, the harness defaults to a batch size of 1. This should be fixed if you use --batch_size auto, which lets the harness take advantage of vLLM's continuous batching.
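
For reference, the repro command from above with auto batching enabled (same model args, only --batch_size added):

lm_eval --model vllm --model_args pretrained=mistralai/Mixtral-8x7B-Instruct-v0.1,tensor_parallel_size=2 --tasks mmlu --num_fewshot 5 --batch_size auto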


Sshubam commented Oct 3, 2024

@pcmoritz did you solve this? I'm facing a similar issue.


github-actions bot commented Jan 2, 2025

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions bot added the stale label Jan 2, 2025