
lm-evaluation-harness broken on master #3292

Open
pcmoritz opened this issue Mar 9, 2024 · 3 comments
pcmoritz commented Mar 9, 2024

Since #3065, the eval suite https://github.com/EleutherAI/lm-evaluation-harness is broken.

Repro (this should be run on 2 A100s or H100s to make sure the Mixtral model fits into GPU memory):

# First install vllm from master via https://docs.vllm.ai/en/latest/getting_started/installation.html#build-from-source

# Then clone and install https://github.com/EleutherAI/lm-evaluation-harness
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .

# Now run the evaluation harness
lm_eval --model vllm --model_args pretrained=mistralai/Mixtral-8x7B-Instruct-v0.1,tensor_parallel_size=2 --tasks mmlu --num_fewshot 5

This fails with

  File "/home/ray/anaconda3/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
  File "/home/ray/default/lm-evaluation-harness/lm_eval/__main__.py", line 318, in cli_evaluate
    results = evaluator.simple_evaluate(
  File "/home/ray/default/lm-evaluation-harness/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
  File "/home/ray/default/lm-evaluation-harness/lm_eval/evaluator.py", line 230, in simple_evaluate
    results = evaluate(
  File "/home/ray/default/lm-evaluation-harness/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
  File "/home/ray/default/lm-evaluation-harness/lm_eval/evaluator.py", line 368, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
  File "/home/ray/default/lm-evaluation-harness/lm_eval/api/model.py", line 321, in loglikelihood
    return self._loglikelihood_tokens(new_reqs)
  File "/home/ray/default/lm-evaluation-harness/lm_eval/models/vllm_causallms.py", line 379, in _loglikelihood_tokens
    answer = self._parse_logprobs(
  File "/home/ray/default/lm-evaluation-harness/lm_eval/models/vllm_causallms.py", line 416, in _parse_logprobs
    continuation_logprobs = sum(
TypeError: unsupported operand type(s) for +: 'int' and 'Logprob'
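
For context, here is a minimal sketch of the failure mode. It appears that since #3065 the per-token logprob dicts returned by vLLM hold Logprob objects rather than raw floats, so Python's sum(), which starts from the int 0, raises the TypeError above. The Logprob class below is a simplified stand-in for illustration, not vLLM's actual definition:

from dataclasses import dataclass

@dataclass
class Logprob:
    # simplified stand-in for the per-token object vLLM now returns
    logprob: float

token_logprobs = {101: Logprob(-0.5), 102: Logprob(-1.2)}
tokens = [101, 102]

# Pre-#1549 lm-eval summed the dict values directly; sum() starts from
# the int 0, so 0 + Logprob raises the TypeError shown above:
# continuation_logprobs = sum(token_logprobs[t] for t in tokens)

# The #1549 fix unwraps the .logprob attribute before summing:
continuation_logprobs = sum(token_logprobs[t].logprob for t in tokens)
print(continuation_logprobs)  # -1.7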

The API breakage is fixed in EleutherAI/lm-evaluation-harness#1549, but even with the fix the harness is extremely slow (about 40x slower than before), so it is not really feasible to run:

Running loglikelihood requests:   0%|                  [...]               | 32/56168 [22:52<668:47:47, 42.89s/it]

Being able to run the evaluation harness in a timely manner is crucial so we can ensure model performance doesn't degrade.


baberabb commented Mar 9, 2024

I think this is because, without a batch size specified, the harness defaults to a batch size of 1. This should be fixed if you use --batch_size auto, which lets the harness take advantage of vLLM's continuous batching.
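
For reference, the repro command from above with auto batching enabled (same model args, only --batch_size added):

lm_eval --model vllm --model_args pretrained=mistralai/Mixtral-8x7B-Instruct-v0.1,tensor_parallel_size=2 --tasks mmlu --num_fewshot 5 --batch_size auto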


Sshubam commented Oct 3, 2024

@pcmoritz did you solve this? I'm facing a similar issue.


github-actions bot commented Jan 2, 2025

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions bot added the stale label Jan 2, 2025