[Bug]: Pixtral + guided_json fails with Internal Server Error #8429

Closed
1 task done
pseudotensor opened this issue Sep 12, 2024 · 8 comments · Fixed by #8521
Labels
bug Something isn't working

Comments

@pseudotensor

pseudotensor commented Sep 12, 2024

Your current environment

docker pull vllm/vllm-openai:latest
docker stop pixtral ; docker remove pixtral
docker run -d --restart=always \
    --runtime=nvidia \
    --gpus '"device=MIG-2ea01c20-8e9b-54a7-a91b-f308cd216a95"' \
    --shm-size=10.24gb \
    -p 5001:5001 \
        -e NCCL_IGNORE_DISABLED_P2P=1 \
    -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
    -e VLLM_NCCL_SO_PATH=/usr/local/lib/python3.10/dist-packages/nvidia/nccl/lib/libnccl.so.2 \
    -v /etc/passwd:/etc/passwd:ro \
    -v /etc/group:/etc/group:ro \
    -u `id -u`:`id -g` \
    -v "${HOME}"/.cache:$HOME/.cache/ \
    -v "${HOME}"/.cache/huggingface:$HOME/.cache/huggingface \
    -v "${HOME}"/.cache/huggingface/hub:$HOME/.cache/huggingface/hub \
    -v "${HOME}"/.config:$HOME/.config/   -v "${HOME}"/.triton:$HOME/.triton/  \
    --network host \
    --name pixtral \
    vllm/vllm-openai:latest \
        --port=5001 \
        --host=0.0.0.0 \
        --model=mistralai/Pixtral-12B-2409 \
        --seed 1234 \
        --tensor-parallel-size=1 \
        --gpu-memory-utilization 0.98 \
        --enforce-eager \
        --tokenizer_mode mistral \
        --limit_mm_per_prompt 'image=8' \
        --max-model-len=32768 \
        --max-num-batched-tokens=32768 \
        --max-log-len=100 \
        --download-dir=$HOME/.cache/huggingface/hub &>> logs.vllm_server.pixtral.txt

Sending an image query seems to work fine, but as soon as I try any simple guided_json or guided_choice request, it always fails.

from openai import OpenAI

base_url = 'http://IP:80/v1'  # replace IP
api_key = "EMPTY"

client_args = dict(base_url=base_url, api_key=api_key)
openai_client = OpenAI(**client_args)


prompt = """<all_documents>
<doc>
<name>roses.pdf</name>
<page>1</page>
<text>
I like red roses, and red elephants.
</text>
</doc>
</all_documents>


<response_format_instructions>

Ensure you follow this JSON schema, and ensure to use the same key names as the schema:
```json
{"color": {"type": "string"}}
```


</response_format_instructions>



What do I like?"""

guided_json = {
    "type": "object",
    "properties": {
        "color": {"type": "string"}
    }
}

messages = [{'role': 'user', 'content': prompt}]
stream = False
client_kwargs = dict(model='mistralai/Pixtral-12B-2409',
                     max_tokens=2048, stream=stream, messages=messages,
                     response_format=dict(type='json_object'),
                     extra_body=dict(guided_json=guided_json))
client = openai_client.chat.completions

responses = client.create(**client_kwargs)
text = responses.choices[0].message.content
print(text)

gives:

Traceback (most recent call last):
  File "/home/jon/h2ogpt/check_openai_json1.py", line 51, in <module>
    responses = client.create(**client_kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/openai/_utils/_utils.py", line 274, in wrapper
    return func(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 668, in create
    return self._post(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/openai/_base_client.py", line 936, in request
    return self._request(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/openai/_base_client.py", line 1025, in _request
    return self._retry_request(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/openai/_base_client.py", line 1074, in _retry_request
    return self._request(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/openai/_base_client.py", line 1025, in _request
    return self._retry_request(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/openai/_base_client.py", line 1074, in _retry_request
    return self._request(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/openai/_base_client.py", line 1040, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500

and vllm shows:

INFO:     172.16.0.101:3146 - "GET /v1/models HTTP/1.1" 200 OK
INFO:     172.16.0.101:3146 - "GET /v1/models HTTP/1.1" 200 OK
INFO:     172.16.0.101:3146 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error

Model Input Dumps

No response

🐛 Describe the bug

See above.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

pseudotensor added the bug label Sep 12, 2024
@DarkLight1337
Member

Can you show the server-side stack trace?

@pseudotensor
Author

I did. It only shows that "Internal Server Error"; as I mentioned, there's literally nothing else in the server log except that. No stack trace, etc.

@DarkLight1337
Member

DarkLight1337 commented Sep 13, 2024

To better debug the issue, can you use guided decoding in offline inference via the LLM.chat method? That should show the full stack trace.
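
For reference, a minimal sketch (not from the thread) of what that offline repro could look like, assuming a vLLM version where guided decoding can be passed to SamplingParams via GuidedDecodingParams; the exact offline guided-decoding API may differ between releases:

# Hedged sketch: run the same guided_json request offline so the full Python
# stack trace is visible instead of a bare 500 from the API server.
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

llm = LLM(model="mistralai/Pixtral-12B-2409",
          tokenizer_mode="mistral",
          limit_mm_per_prompt={"image": 8},
          max_model_len=32768)

guided_json = {"type": "object",
               "properties": {"color": {"type": "string"}}}

params = SamplingParams(
    max_tokens=2048,
    guided_decoding=GuidedDecodingParams(json=guided_json),
)

# llm.chat returns a list of RequestOutput objects.
outputs = llm.chat([{"role": "user", "content": "What do I like?"}],
                   sampling_params=params)
print(outputs[0].outputs[0].text)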

@stikkireddy

I have the same issue in the environment I have access to.

The following just hangs :(
The API server throws an internal server error.

from vllm import LLM
llm = LLM(
  model="/root/models/mistralai/Pixtral-12B-2409",
  tokenizer_mode="mistral",
  served_model_name="mistralai/Pixtral-12B-2409",
  max_model_len=5*4096,
  guided_decoding_backend="outlines",
  limit_mm_per_prompt={"image": 5},
  tensor_parallel_size=4,
)

@ywang96
Member

ywang96 commented Sep 14, 2024

I don't think guided_decoding/outlines officially supports the mistral tokenizer (we still need to double-check this), and I don't think it's really vLLM's responsibility to make sure they work with each other if they don't. However, if they are indeed incompatible, then we should disable guided_decoding when the mistral tokenizer is present.

Perhaps @patrickvonplaten you might have some thoughts on this?

@patrickvonplaten
Contributor

For now, can we raise a NotImplementedError with an error message that asks for a contribution if people are interested in this feature?

@ywang96
Member

ywang96 commented Sep 16, 2024

For now, can we raise a NotImplementedError with an error message that asks for a contribution if people are interested in this feature?

Yea, I think that's a good idea and something rather straightforward to do!
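
As an illustration only (this is not the actual change that landed in #8521), such a guard could look roughly like the following, assuming the Mistral tokenizer wrapper is importable from vllm.transformers_utils.tokenizers; the helper name is hypothetical:

# Hypothetical guard, purely illustrative -- not the code from #8521.
# Idea: fail fast with a clear message when guided decoding is requested
# together with the Mistral tokenizer, instead of returning a bare 500.
from vllm.transformers_utils.tokenizers import MistralTokenizer

def ensure_guided_decoding_supported(tokenizer, guided_params) -> None:
    if guided_params is not None and isinstance(tokenizer, MistralTokenizer):
        raise NotImplementedError(
            "Guided decoding is currently not supported with the Mistral "
            "tokenizer. Contributions to add support are welcome!")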

@gcalmettes
Contributor

gcalmettes commented Sep 27, 2024

The latest code of lm-format-enforcer should now be compatible with the MistralTokenizer. There is no release yet, but installing the library from main should do the trick:

pip install git+https://github.com/noamgat/lm-format-enforcer.git --force-reinstall
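
To confirm which lm-format-enforcer ended up installed after the from-main install, a quick check like the following can help (this assumes the distribution name lm-format-enforcer):

# Hedged check: print the declared version of the currently installed
# lm-format-enforcer; for a git install this is whatever version is declared
# on main, which may match or exceed the latest release.
import importlib.metadata
print(importlib.metadata.version("lm-format-enforcer"))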

@stikkireddy your code should run now if you switch the guided_decoding_backend to lm-format-enforcer:

from vllm import LLM
llm = LLM(
  model="/root/models/mistralai/Pixtral-12B-2409",
  tokenizer_mode="mistral",
  served_model_name="mistralai/Pixtral-12B-2409",
  max_model_len=5*4096,
  guided_decoding_backend="lm-format-enforcer",
  limit_mm_per_prompt={"image": 5},
  tensor_parallel_size=4,
)
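
For the OpenAI-compatible server setup from the original report, the equivalent would be starting vLLM with --guided-decoding-backend lm-format-enforcer, or, assuming the server honors a per-request guided_decoding_backend field in extra_body, overriding it per call. A hedged, self-contained sketch of the latter (prompt and schema simplified from the repro above):

# Hedged sketch: ask for the lm-format-enforcer backend on this one call.
# The per-request override is an assumption; setting
# --guided-decoding-backend lm-format-enforcer at server startup is the
# safer option if the field is not recognized.
from openai import OpenAI

openai_client = OpenAI(base_url='http://IP:80/v1', api_key="EMPTY")  # replace IP

guided_json = {"type": "object", "properties": {"color": {"type": "string"}}}
messages = [{'role': 'user', 'content': 'What color do I like? Answer as JSON.'}]

responses = openai_client.chat.completions.create(
    model='mistralai/Pixtral-12B-2409',
    max_tokens=2048,
    messages=messages,
    extra_body=dict(guided_json=guided_json,
                    guided_decoding_backend='lm-format-enforcer'),
)
print(responses.choices[0].message.content)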
