[Bug]: Pixtral + guided_json fails with Internal Server Error #8429

Closed
1 task done
pseudotensor opened this issue Sep 12, 2024 · 8 comments · Fixed by #8521
Labels
bug Something isn't working

Comments

@pseudotensor

pseudotensor commented Sep 12, 2024

Your current environment

docker pull vllm/vllm-openai:latest
docker stop pixtral ; docker remove pixtral
docker run -d --restart=always \
    --runtime=nvidia \
    --gpus '"device=MIG-2ea01c20-8e9b-54a7-a91b-f308cd216a95"' \
    --shm-size=10.24gb \
    -p 5001:5001 \
        -e NCCL_IGNORE_DISABLED_P2P=1 \
    -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
    -e VLLM_NCCL_SO_PATH=/usr/local/lib/python3.10/dist-packages/nvidia/nccl/lib/libnccl.so.2 \
    -v /etc/passwd:/etc/passwd:ro \
    -v /etc/group:/etc/group:ro \
    -u `id -u`:`id -g` \
    -v "${HOME}"/.cache:$HOME/.cache/ \
    -v "${HOME}"/.cache/huggingface:$HOME/.cache/huggingface \
    -v "${HOME}"/.cache/huggingface/hub:$HOME/.cache/huggingface/hub \
    -v "${HOME}"/.config:$HOME/.config/   -v "${HOME}"/.triton:$HOME/.triton/  \
    --network host \
    --name pixtral \
    vllm/vllm-openai:latest \
        --port=5001 \
        --host=0.0.0.0 \
        --model=mistralai/Pixtral-12B-2409 \
        --seed 1234 \
        --tensor-parallel-size=1 \
        --gpu-memory-utilization 0.98 \
        --enforce-eager \
        --tokenizer_mode mistral \
        --limit_mm_per_prompt 'image=8' \
        --max-model-len=32768 \
        --max-num-batched-tokens=32768 \
        --max-log-len=100 \
        --download-dir=$HOME/.cache/huggingface/hub &>> logs.vllm_server.pixtral.txt

Sending an image query seems to work fine, but as soon as I try any simple guided_json or guided_choice request, it always fails.

from openai import OpenAI

base_url = 'http://IP:80/v1'  # replace IP
api_key = "EMPTY"

client_args = dict(base_url=base_url, api_key=api_key)
openai_client = OpenAI(**client_args)


prompt = """<all_documents>
<doc>
<name>roses.pdf</name>
<page>1</page>
<text>
I like red roses, and red elephants.
</text>
</doc>
</all_documents>


<response_format_instructions>

Ensure you follow this JSON schema, and ensure to use the same key names as the schema:
```json
{"color": {"type": "string"}}
```


</response_format_instructions>



What do I like?"""

guided_json = {
    "type": "object",
    "properties": {
        "color": {"type": "string"}
    }
}

messages = [{'role': 'user', 'content': prompt}]
stream = False
client_kwargs = dict(model='mistralai/Pixtral-12B-2409',
                     max_tokens=2048, stream=stream, messages=messages,
                     response_format=dict(type='json_object'),
                     extra_body=dict(guided_json=guided_json))
client = openai_client.chat.completions

responses = client.create(**client_kwargs)
text = responses.choices[0].message.content
print(text)

gives:

Traceback (most recent call last):
  File "/home/jon/h2ogpt/check_openai_json1.py", line 51, in <module>
    responses = client.create(**client_kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/openai/_utils/_utils.py", line 274, in wrapper
    return func(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 668, in create
    return self._post(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/openai/_base_client.py", line 936, in request
    return self._request(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/openai/_base_client.py", line 1025, in _request
    return self._retry_request(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/openai/_base_client.py", line 1074, in _retry_request
    return self._request(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/openai/_base_client.py", line 1025, in _request
    return self._retry_request(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/openai/_base_client.py", line 1074, in _retry_request
    return self._request(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/openai/_base_client.py", line 1040, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500

and vllm shows:

INFO:     172.16.0.101:3146 - "GET /v1/models HTTP/1.1" 200 OK
INFO:     172.16.0.101:3146 - "GET /v1/models HTTP/1.1" 200 OK
INFO:     172.16.0.101:3146 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error

Model Input Dumps

No response

🐛 Describe the bug

See above.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

pseudotensor added the bug label Sep 12, 2024
@DarkLight1337
Member

Can you show the server-side stack trace?

@pseudotensor
Author

I did. It only shows that "Internal Server Error"; as I mentioned, there's literally nothing else in the server log except that. No stack trace, etc.

@DarkLight1337
Member

DarkLight1337 commented Sep 13, 2024

To better debug the issue, can you use guided decoding in offline inference via the LLM.chat method? That should show the full stack trace.
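
For reference, a minimal sketch (not from the thread) of what that offline repro could look like, assuming a vLLM version where guided decoding can be passed to SamplingParams via GuidedDecodingParams; the exact offline guided-decoding API may differ between releases:

# Hedged sketch: run the same guided_json request offline so the full Python
# stack trace is visible instead of a bare 500 from the API server.
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

llm = LLM(model="mistralai/Pixtral-12B-2409",
          tokenizer_mode="mistral",
          limit_mm_per_prompt={"image": 8},
          max_model_len=32768)

guided_json = {"type": "object",
               "properties": {"color": {"type": "string"}}}

params = SamplingParams(
    max_tokens=2048,
    guided_decoding=GuidedDecodingParams(json=guided_json),
)

# llm.chat returns a list of RequestOutput objects.
outputs = llm.chat([{"role": "user", "content": "What do I like?"}],
                   sampling_params=params)
print(outputs[0].outputs[0].text)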

@stikkireddy

I have the same issue in the environment I have access to.

The following just hangs :(
The API server throws an internal server error.

from vllm import LLM
llm = LLM(
  model="/root/models/mistralai/Pixtral-12B-2409",
  tokenizer_mode="mistral",
  served_model_name="mistralai/Pixtral-12B-2409",
  max_model_len=5*4096,
  guided_decoding_backend="outlines",
  limit_mm_per_prompt={"image": 5},
  tensor_parallel_size=4,
)

@ywang96
Member

ywang96 commented Sep 14, 2024

I don't think guided_decoding/outlines officially supports the mistral tokenizer (we still need to double-check this), and I don't think it's really vLLM's responsibility to make sure they work with each other if they don't. However, if they are indeed incompatible, then we should disable guided_decoding when the mistral tokenizer is present.

Perhaps @patrickvonplaten you might have some thoughts on this?

@patrickvonplaten
Contributor

For now, can we raise a NotImplementedError with an error message that asks for a contribution if people are interested in this feature?

@ywang96
Member

ywang96 commented Sep 16, 2024

For now, can we raise a NotImplementedError with an error message that asks for a contribution if people are interested in this feature?

Yea, I think that's a good idea and something rather straightforward to do!
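
As an illustration only (this is not the actual change that landed in #8521), such a guard could look roughly like the following, assuming the Mistral tokenizer wrapper is importable from vllm.transformers_utils.tokenizers; the helper name is hypothetical:

# Hypothetical guard, purely illustrative -- not the code from #8521.
# Idea: fail fast with a clear message when guided decoding is requested
# together with the Mistral tokenizer, instead of returning a bare 500.
from vllm.transformers_utils.tokenizers import MistralTokenizer

def ensure_guided_decoding_supported(tokenizer, guided_params) -> None:
    if guided_params is not None and isinstance(tokenizer, MistralTokenizer):
        raise NotImplementedError(
            "Guided decoding is currently not supported with the Mistral "
            "tokenizer. Contributions to add support are welcome!")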

@gcalmettes
Contributor

gcalmettes commented Sep 27, 2024

The latest code of lm-format-enforcer should now be compatible with the MistralTokenizer. There is no release yet, but installing the library from main should do the trick:

pip install git+https://github.com/noamgat/lm-format-enforcer.git --force-reinstall
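
To confirm which lm-format-enforcer ended up installed after the from-main install, a quick check like the following can help (this assumes the distribution name lm-format-enforcer):

# Hedged check: print the declared version of the currently installed
# lm-format-enforcer; for a git install this is whatever version is declared
# on main, which may match or exceed the latest release.
import importlib.metadata
print(importlib.metadata.version("lm-format-enforcer"))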

@stikkireddy your code should run now if you switch the guided_decoding_backend to lm-format-enforcer:

from vllm import LLM
llm = LLM(
  model="/root/models/mistralai/Pixtral-12B-2409",
  tokenizer_mode="mistral",
  served_model_name="mistralai/Pixtral-12B-2409",
  max_model_len=5*4096,
  guided_decoding_backend="lm-format-enforcer",
  limit_mm_per_prompt={"image": 5},
  tensor_parallel_size=4,
)
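
For the OpenAI-compatible server setup from the original report, the equivalent would be starting vLLM with --guided-decoding-backend lm-format-enforcer, or, assuming the server honors a per-request guided_decoding_backend field in extra_body, overriding it per call. A hedged, self-contained sketch of the latter (prompt and schema simplified from the repro above):

# Hedged sketch: ask for the lm-format-enforcer backend on this one call.
# The per-request override is an assumption; setting
# --guided-decoding-backend lm-format-enforcer at server startup is the
# safer option if the field is not recognized.
from openai import OpenAI

openai_client = OpenAI(base_url='http://IP:80/v1', api_key="EMPTY")  # replace IP

guided_json = {"type": "object", "properties": {"color": {"type": "string"}}}
messages = [{'role': 'user', 'content': 'What color do I like? Answer as JSON.'}]

responses = openai_client.chat.completions.create(
    model='mistralai/Pixtral-12B-2409',
    max_tokens=2048,
    messages=messages,
    extra_body=dict(guided_json=guided_json,
                    guided_decoding_backend='lm-format-enforcer'),
)
print(responses.choices[0].message.content)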
