the openai-compatible model's answer is blank with streaming mode #2144

Closed
2 tasks done
geosmart opened this issue Jan 23, 2024 · 10 comments · Fixed by #2190
Labels
💪 enhancement New feature or request

@geosmart
Contributor

Self Checks

Dify version

0.4.9

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

I added an OpenAI-compatible model (an RWKV Docker service); the API itself works fine, but in Dify the answer is not shown.

When I test blocking mode, the answer is shown.

curl --location --request POST 'http://my-domain/v1/chat-messages' \
--header 'Authorization: Bearer app-xxx' \
--header 'Content-Type: application/json' \
--data-raw '{
    "inputs": {},
    "query": "who are you",
    "response_mode": "blocking",
    "conversation_id": "",
    "user": "dify-api"
}'
{"event": "message", "task_id": "d0ecb553-fa53-46e7-a76a-83f962795fa0", "id": "79d7e38e-9877-46bd-aad8-b2aa039ceb04", "message_id": "79d7e38e-9877-46bd-aad8-b2aa039ceb04", "mode": "chat", "answer": "\u6211\u662f\u4e00\u540d\u5168\u6808\u5f00\u53d1\u4eba\u5458\uff0c\u64c5\u957fpython,kotlin,markdown,\u719f\u6089flink,spark\u8ba1\u7b97\u7cfb\u7edf\uff0c\u5bf9mysql,elasticsearch\u7b49\u5b58\u50a8\u7cfb\u7edf\u6709\u5b9e\u6218\u7ecf\u9a8c\u3002\n", "metadata": {"usage": {"prompt_tokens": 72, "prompt_unit_price": "0", "prompt_price_unit": "0", "prompt_price": "0.0000000", "completion_tokens": 48, "completion_unit_price": "0", "completion_price_unit": "0", "completion_price": "0.0000000", "total_tokens": 120, "total_price": "0.0000000", "currency": "USD", "latency": 2.3984292540117167}}, "created_at": 1706019858, "conversation_id": "997fb182-88d7-46c9-a4d7-672437a6fd5a"}

[screenshot: blocking mode response]

When I test streaming mode, the answer is blank.

curl --location --request POST 'http://my-domain/v1/chat-messages' \
--header 'Authorization: Bearer app-xxx' \
--header 'Content-Type: application/json' \
--data-raw '{
    "inputs": {},
    "query": "who are you",
    "response_mode": "streaming",
    "conversation_id": "",
    "user": "dify-api"
}'
event: ping

data: {"event": "message", "id": "f3f16af0-a327-43e8-9071-d143575f57d1", "task_id": "3e070c26-8d89-446e-bd9e-45200136eaa0", "message_id": "f3f16af0-a327-43e8-9071-d143575f57d1", "answer": "", "created_at": 1706019951, "conversation_id": "35e7467a-dbef-47c4-8336-d4330bc2b82a"}

data: {"event": "message_end", "task_id": "3e070c26-8d89-446e-bd9e-45200136eaa0", "id": "f3f16af0-a327-43e8-9071-d143575f57d1", "message_id": "f3f16af0-a327-43e8-9071-d143575f57d1", "conversation_id": "35e7467a-dbef-47c4-8336-d4330bc2b82a", "metadata": {"usage": {"prompt_tokens": 106, "prompt_unit_price": "0", "prompt_price_unit": "0", "prompt_price": "0.0000000", "completion_tokens": 0, "completion_unit_price": "0", "completion_price_unit": "0", "completion_price": "0.0000000", "total_tokens": 106, "total_price": "0.0000000", "currency": "USD", "latency": 14.101711608003825}}}

[screenshot: blank answer in streaming mode]

✔️ Expected Behavior

Streaming mode should work and show the answer.

❌ Actual Behavior

With streaming mode, the answer is blank.
The answer is also blank on the preview page.

@geosmart geosmart added the 🐞 bug Something isn't working label Jan 23, 2024

dosubot bot commented Jan 23, 2024

🤖

Thank you for providing detailed information about the issue you're experiencing. It seems like the problem might be related to how the streaming mode is implemented in the dify codebase.

In the _handle_chat_generate_stream_response function, it generates LLMResultChunk objects for each message in the response. If the usage attribute of the message is not present, it still yields an LLMResultChunk object, but without setting the usage attribute. This could potentially result in an empty or blank response if the usage attribute is not present in the message.

Here is the relevant code:

def _handle_chat_generate_stream_response(self, model: str, prompt_messages: list[PromptMessage], 
                                              credentials: dict, response: Generator[MinimaxMessage, None, None]) \
        -> Generator[LLMResultChunk, None, None]:
        for message in response:
            if message.usage:
                usage = self._calc_response_usage(
                    model=model, credentials=credentials, 
                    prompt_tokens=message.usage['prompt_tokens'], 
                    completion_tokens=message.usage['completion_tokens']
                )
                yield LLMResultChunk(
                    model=model,
                    prompt_messages=prompt_messages,
                    delta=LLMResultChunkDelta(
                        index=0,
                        message=AssistantPromptMessage(
                            content=message.content,
                            tool_calls=[]
                        ),
                        usage=usage,
                        finish_reason=message.stop_reason if message.stop_reason else None,
                    ),
                )
            else:
                yield LLMResultChunk(
                    model=model,
                    prompt_messages=prompt_messages,
                    delta=LLMResultChunkDelta(
                        index=0,
                        message=AssistantPromptMessage(
                            content=message.content,
                            tool_calls=[]
                        ),
                        finish_reason=message.stop_reason if message.stop_reason else None,
                    ),
                )

You can find this code in the api/core/model_runtime/model_providers/minimax/llm/llm.py file.

If the usage attribute is not present in the message, the LLMResultChunk object is still yielded, but without setting the usage attribute. This could potentially result in an empty or blank response.

To resolve this issue, you might need to ensure that the usage attribute is always present in the message. If it's not possible to ensure this, you might need to modify the _handle_chat_generate_stream_response function to handle cases where the usage attribute is not present.

I hope this helps! If you have any other questions or need further clarification, please let me know.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@geosmart geosmart changed the title the openai-compatible model's answer is blank the openai-compatible model's answer is blank with streaming mode Jan 23, 2024
@geosmart
Contributor Author

geosmart commented Jan 24, 2024

I use the openai_api_compatible model provider, so the relevant source code is
openai_api_compatible/llm/llm.py

openai_api_compatible model test

blocking mode

I tested my OpenAI-API-compatible model API; the blocking-mode response has a usage attribute.

curl --location --request POST 'http://my-domain/v1/chat/completions' \
--header 'User-Agent: Apifox/1.0.0 (https://apifox.com)' \
--header 'Content-Type: application/json' \
--data-raw '{
    "messages": [
        {
            "role": "system",
            "content": "xxx"
        },
        {
            "role": "user",
            "content": "who are you"
        }
    ],
    "temperature": 1.0,
    "top_p": 0.5,
    "max_tokens": 500,
    "stream": false
}'
# response
{
    "object": "chat.completion",
    "model": "rwkv-5-world-1b5",
    "usage": {
        "prompt_tokens": 0,
        "completion_tokens": 500,
        "total_tokens": 500
    },
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "xxx
            },
            "index": 0,
            "finish_reason": "stop"
        }
    ]
}

streaming mode

The SSE response has no usage attribute; I think that's fine (see openai.apifox.cn).
The result is in the choices attribute.

curl --location --request POST 'http://my-domain/v1/chat/completions' \
--header 'User-Agent: Apifox/1.0.0 (https://apifox.com)' \
--header 'Content-Type: application/json' \
--data-raw '{
    "messages": [
        {
            "role": "system",
            "content": "xxx"
        },
        {
            "role": "user",
            "content": "who are you"
        }
    ],
    "temperature": 1.0,
    "top_p": 0.5,
    "max_tokens": 500,
    "stream": true
}'
# response
data: {"object": "chat.completion.chunk", "model": "rwkv-5-world-1b5", "choices": [{"delta": {"content": " "}, "index": 0, "finish_reason": null}]}

data: {"object": "chat.completion.chunk", "model": "rwkv-5-world-1b5", "choices": [{"delta": {"content": "\u6211"}, "index": 0, "finish_reason": null}]}

data: {"object": "chat.completion.chunk", "model": "rwkv-5-world-1b5", "choices": [{"delta": {"content": "\u53eb"}, "index": 0, "finish_reason": null}]}

Still, I don't know what's wrong.

@geosmart
Contributor Author

geosmart commented Jan 24, 2024

I ran the test case:

test_llm.py::test_invoke_stream_model FAILED                             [100%]
test_llm.py:67 (test_invoke_stream_model)

The error occurs when parsing decoded_chunk:

        for chunk in response.iter_lines(decode_unicode=True, delimiter='\n\n'):
            if chunk:
                decoded_chunk = chunk.strip().lstrip('data: ').lstrip()
                chunk_json = None
                try:
                    chunk_json = json.loads(decoded_chunk)
                # stream ended
                except json.JSONDecodeError as e:
                    # error when parsing decoded_chunk
                    print("parse chunk_json error: " + str(e))
                    yield create_final_llm_result_chunk(
                        index=chunk_index + 1,
                        message=AssistantPromptMessage(content=""),
                        finish_reason="Non-JSON encountered."
                    )
                    break

decoded_chunk

{"object": "chat.completion.chunk", "model": "RWKV-5-World-1B5-v2-20231025-ctx4096", "choices": [{"delta": {"content": " I"}, "index": 0, "finish_reason": null}]}

data: {"object": "chat.completion.chunk", "model": "RWKV-5-World-1B5-v2-20231025-ctx4096", "choices": [{"delta": {"content": " am"}, "index": 0, "finish_reason": null}]}

data: [DONE]
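
This decoded_chunk is not a single JSON object: several data: events (plus the trailing [DONE]) arrive glued together in one chunk, apparently because the stream is split on a delimiter this server does not use, so json.loads fails. As a standalone illustration only (not Dify's actual code), a more tolerant parser for such a merged chunk could look like the following; parse_sse_events is a hypothetical helper:

import json

def parse_sse_events(raw_chunk: str) -> list[dict]:
    # Split a merged SSE chunk into individual JSON events, dropping the
    # "data:" prefix and stopping at the "[DONE]" terminator.
    events = []
    for line in raw_chunk.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("data:"):
            line = line[len("data:"):].strip()
        if line == "[DONE]":
            break
        events.append(json.loads(line))
    return events

# Example with a merged chunk like the one above:
merged = 'data: {"choices": [{"delta": {"content": " am"}, "index": 0}]}\r\n\r\ndata: [DONE]'
print(parse_sse_events(merged))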

@bowenliang123

@bowenliang123
Contributor

bowenliang123 commented Jan 24, 2024

The preset delimiter in the openai-compatible provider is \n\n, which may not satisfy all upstream LLMs of different types.

In my case, I would change the delimiter for Qwen (千问) LLMs to \n\r:

if "qwen" in model.lower():
    delimiter = '\n\r'
else:
    delimiter = '\n\n'
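
Roughly, that per-model delimiter is what gets passed to response.iter_lines when splitting the SSE stream. A minimal sketch of the idea (illustrative names only, not the exact Dify code):

import requests

def iter_stream_chunks(response: requests.Response, model: str):
    # Pick the chunk delimiter per upstream model, since not every
    # OpenAI-compatible server separates events with "\n\n".
    if "qwen" in model.lower():
        delimiter = '\n\r'
    else:
        delimiter = '\n\n'
    for chunk in response.iter_lines(decode_unicode=True, delimiter=delimiter):
        if chunk:
            yield chunk.strip()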

@geosmart
Contributor Author

Thanks very much, @bowenliang123.

I changed the source code of openai_api_compatible/llm/llm.py to temporarily integrate my RWKV model:

➜  dify docker exec -it dify-api cat core/model_runtime/model_providers/openai_api_compatible/llm/llm.py | grep 'rwkv' -C3
                )
            )

        if "rwkv" in model.lower():
            delimiter = '\r\n'
        else:
            delimiter = '\n\n'
--
                      chunk_json = json.loads(decoded_chunk)
                # stream ended
                except json.JSONDecodeError as e:
                    if "rwkv" in model.lower() and decoded_chunk == '[DONE]':
                        break
                    yield create_final_llm_result_chunk(
                        index=chunk_index + 1,

So maybe I need to implement an rwkv model provider in production.

@geosmart
Contributor Author

In the RWKV OpenAI-API-compatible case, it would be helpful if openai_api_compatible supported configuring the chunk delimiter and the stream-end stop chunk in the UI.

@guchenhe
Collaborator

Hi @geosmart, it seems like you've found a way to accommodate this. To better support it on our end, my idea for an enhancement addressing these cases would be to add an optional delimiter field when configuring model credentials for OpenAI-API-compatible models. When the field is filled in by the user, we would use it to separate streaming chunks instead of \n\n as per OpenAI's API. I expect this to be a quick win that shouldn't cost a lot of time; would you like to have a go at a PR for it? :)
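
A minimal sketch of how such an optional field could be consumed in the streaming handler, assuming the user-supplied value is stored in the model credentials under a hypothetical key such as stream_mode_delimiter (the actual field name and UI wiring are left to the PR):

import requests

def iter_stream_chunks(response: requests.Response, credentials: dict):
    # Use the user-configured delimiter if present; otherwise fall back
    # to OpenAI's default "\n\n" event separator.
    delimiter = credentials.get("stream_mode_delimiter") or '\n\n'
    for chunk in response.iter_lines(decode_unicode=True, delimiter=delimiter):
        if chunk:
            yield chunk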

@geosmart
Contributor Author

OK, I will submit a PR for this.

@guchenhe guchenhe added 💪 enhancement New feature or request and removed 🐞 bug Something isn't working labels Jan 24, 2024
@geosmart
Contributor Author

geosmart commented Jan 25, 2024

openai_api_compatible model config window
[screenshot: openai_api_compatible model config window]

@jsboige

jsboige commented Sep 9, 2024

openai_api_compatible model config window

Thanks for this. It was the fix needed to get Oobabooga's text-generation-webui to work as a Dify connector. I only found this issue after painfully debugging the container, so I suppose it needs more visibility.
