the openai-compatible model's answer is blank with streaming mode #2144

Closed
2 tasks done
geosmart opened this issue Jan 23, 2024 · 10 comments · Fixed by #2190
Labels
💪 enhancement New feature or request

@geosmart
Contributor

Self Checks

Dify version

0.4.9

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

I added an OpenAI-compatible model (an RWKV Docker service); the API itself works fine, but in Dify the answer is not shown.

When I test blocking mode, the answer is shown.

curl --location --request POST 'http://my-domain/v1/chat-messages' \
--header 'Authorization: Bearer app-xxx' \
--header 'Content-Type: application/json' \
--data-raw '{
    "inputs": {},
    "query": "who are you",
    "response_mode": "blocking",
    "conversation_id": "",
    "user": "dify-api"
}'
{"event": "message", "task_id": "d0ecb553-fa53-46e7-a76a-83f962795fa0", "id": "79d7e38e-9877-46bd-aad8-b2aa039ceb04", "message_id": "79d7e38e-9877-46bd-aad8-b2aa039ceb04", "mode": "chat", "answer": "\u6211\u662f\u4e00\u540d\u5168\u6808\u5f00\u53d1\u4eba\u5458\uff0c\u64c5\u957fpython,kotlin,markdown,\u719f\u6089flink,spark\u8ba1\u7b97\u7cfb\u7edf\uff0c\u5bf9mysql,elasticsearch\u7b49\u5b58\u50a8\u7cfb\u7edf\u6709\u5b9e\u6218\u7ecf\u9a8c\u3002\n", "metadata": {"usage": {"prompt_tokens": 72, "prompt_unit_price": "0", "prompt_price_unit": "0", "prompt_price": "0.0000000", "completion_tokens": 48, "completion_unit_price": "0", "completion_price_unit": "0", "completion_price": "0.0000000", "total_tokens": 120, "total_price": "0.0000000", "currency": "USD", "latency": 2.3984292540117167}}, "created_at": 1706019858, "conversation_id": "997fb182-88d7-46c9-a4d7-672437a6fd5a"}

[screenshot: blocking mode response]

When I test streaming mode, the answer is blank.

curl --location --request POST 'http://my-domain/v1/chat-messages' \
--header 'Authorization: Bearer app-xxx' \
--header 'Content-Type: application/json' \
--data-raw '{
    "inputs": {},
    "query": "who are you",
    "response_mode": "streaming",
    "conversation_id": "",
    "user": "dify-api"
}'
event: ping

data: {"event": "message", "id": "f3f16af0-a327-43e8-9071-d143575f57d1", "task_id": "3e070c26-8d89-446e-bd9e-45200136eaa0", "message_id": "f3f16af0-a327-43e8-9071-d143575f57d1", "answer": "", "created_at": 1706019951, "conversation_id": "35e7467a-dbef-47c4-8336-d4330bc2b82a"}

data: {"event": "message_end", "task_id": "3e070c26-8d89-446e-bd9e-45200136eaa0", "id": "f3f16af0-a327-43e8-9071-d143575f57d1", "message_id": "f3f16af0-a327-43e8-9071-d143575f57d1", "conversation_id": "35e7467a-dbef-47c4-8336-d4330bc2b82a", "metadata": {"usage": {"prompt_tokens": 106, "prompt_unit_price": "0", "prompt_price_unit": "0", "prompt_price": "0.0000000", "completion_tokens": 0, "completion_unit_price": "0", "completion_price_unit": "0", "completion_price": "0.0000000", "total_tokens": 106, "total_price": "0.0000000", "currency": "USD", "latency": 14.101711608003825}}}

[screenshot: blank answer in streaming mode]

✔️ Expected Behavior

Streaming mode should work and show the answer.

❌ Actual Behavior

With streaming mode, the answer is blank.
The answer is also blank on the preview page.

@geosmart geosmart added the 🐞 bug Something isn't working label Jan 23, 2024

dosubot bot commented Jan 23, 2024

🤖

Thank you for providing detailed information about the issue you're experiencing. It seems like the problem might be related to how the streaming mode is implemented in the dify codebase.

In the _handle_chat_generate_stream_response function, it generates LLMResultChunk objects for each message in the response. If the usage attribute of the message is not present, it still yields an LLMResultChunk object, but without setting the usage attribute. This could potentially result in an empty or blank response if the usage attribute is not present in the message.

Here is the relevant code:

def _handle_chat_generate_stream_response(self, model: str, prompt_messages: list[PromptMessage], 
                                              credentials: dict, response: Generator[MinimaxMessage, None, None]) \
        -> Generator[LLMResultChunk, None, None]:
        for message in response:
            if message.usage:
                usage = self._calc_response_usage(
                    model=model, credentials=credentials, 
                    prompt_tokens=message.usage['prompt_tokens'], 
                    completion_tokens=message.usage['completion_tokens']
                )
                yield LLMResultChunk(
                    model=model,
                    prompt_messages=prompt_messages,
                    delta=LLMResultChunkDelta(
                        index=0,
                        message=AssistantPromptMessage(
                            content=message.content,
                            tool_calls=[]
                        ),
                        usage=usage,
                        finish_reason=message.stop_reason if message.stop_reason else None,
                    ),
                )
            else:
                yield LLMResultChunk(
                    model=model,
                    prompt_messages=prompt_messages,
                    delta=LLMResultChunkDelta(
                        index=0,
                        message=AssistantPromptMessage(
                            content=message.content,
                            tool_calls=[]
                        ),
                        finish_reason=message.stop_reason if message.stop_reason else None,
                    ),
                )

You can find this code in the api/core/model_runtime/model_providers/minimax/llm/llm.py file.

If the usage attribute is not present in the message, the LLMResultChunk object is still yielded, but without setting the usage attribute. This could potentially result in an empty or blank response.

To resolve this issue, you might need to ensure that the usage attribute is always present in the message. If it's not possible to ensure this, you might need to modify the _handle_chat_generate_stream_response function to handle cases where the usage attribute is not present.

I hope this helps! If you have any other questions or need further clarification, please let me know.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@geosmart geosmart changed the title the openai-compatible model's answer is blank the openai-compatible model's answer is blank with streaming mode Jan 23, 2024
@geosmart
Contributor Author

geosmart commented Jan 24, 2024

I use the openai_api_compatible model provider, so the relevant source code is
openai_api_compatible/llm/llm.py

openai_api_compatible model test

blocking mode

I tested my OpenAI-API-compatible model API; the blocking-mode response has a usage attribute.

curl --location --request POST 'http://my-domain/v1/chat/completions' \
--header 'User-Agent: Apifox/1.0.0 (https://apifox.com)' \
--header 'Content-Type: application/json' \
--data-raw '{
    "messages": [
        {
            "role": "system",
            "content": "xxx"
        },
        {
            "role": "user",
            "content": "who are you"
        }
    ],
    "temperature": 1.0,
    "top_p": 0.5,
    "max_tokens": 500,
    "stream": false
}'
# response
{
    "object": "chat.completion",
    "model": "rwkv-5-world-1b5",
    "usage": {
        "prompt_tokens": 0,
        "completion_tokens": 500,
        "total_tokens": 500
    },
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "xxx
            },
            "index": 0,
            "finish_reason": "stop"
        }
    ]
}

streaming mode

The SSE response has no usage attribute; I think that's fine (see openai.apifox.cn).
The result is in the choices attribute.

curl --location --request POST 'http://my-domain/v1/chat/completions' \
--header 'User-Agent: Apifox/1.0.0 (https://apifox.com)' \
--header 'Content-Type: application/json' \
--data-raw '{
    "messages": [
        {
            "role": "system",
            "content": "xxx"
        },
        {
            "role": "user",
            "content": "who are you"
        }
    ],
    "temperature": 1.0,
    "top_p": 0.5,
    "max_tokens": 500,
    "stream": true
}'
# response
data: {"object": "chat.completion.chunk", "model": "rwkv-5-world-1b5", "choices": [{"delta": {"content": " "}, "index": 0, "finish_reason": null}]}

data: {"object": "chat.completion.chunk", "model": "rwkv-5-world-1b5", "choices": [{"delta": {"content": "\u6211"}, "index": 0, "finish_reason": null}]}

data: {"object": "chat.completion.chunk", "model": "rwkv-5-world-1b5", "choices": [{"delta": {"content": "\u53eb"}, "index": 0, "finish_reason": null}]}

Still, I don't know what's wrong.

@geosmart
Contributor Author

geosmart commented Jan 24, 2024

I ran the test case:

test_llm.py::test_invoke_stream_model FAILED                             [100%]
test_llm.py:67 (test_invoke_stream_model)

The error occurs when parsing decoded_chunk:

        for chunk in response.iter_lines(decode_unicode=True, delimiter='\n\n'):
            if chunk:
                decoded_chunk = chunk.strip().lstrip('data: ').lstrip()
                chunk_json = None
                try:
                    chunk_json = json.loads(decoded_chunk)
                # stream ended
                except json.JSONDecodeError as e:
                    # error when parsing decoded_chunk
                    print("parse chunk_json error: " + str(e))
                    yield create_final_llm_result_chunk(
                        index=chunk_index + 1,
                        message=AssistantPromptMessage(content=""),
                        finish_reason="Non-JSON encountered."
                    )
                    break

decoded_chunk

{"object": "chat.completion.chunk", "model": "RWKV-5-World-1B5-v2-20231025-ctx4096", "choices": [{"delta": {"content": " I"}, "index": 0, "finish_reason": null}]}

data: {"object": "chat.completion.chunk", "model": "RWKV-5-World-1B5-v2-20231025-ctx4096", "choices": [{"delta": {"content": " am"}, "index": 0, "finish_reason": null}]}

data: [DONE]
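
This decoded_chunk is not a single JSON object: several data: events (plus the trailing [DONE]) arrive glued together in one chunk, apparently because the stream is split on a delimiter this server does not use, so json.loads fails. As a standalone illustration only (not Dify's actual code), a more tolerant parser for such a merged chunk could look like the following; parse_sse_events is a hypothetical helper:

import json

def parse_sse_events(raw_chunk: str) -> list[dict]:
    # Split a merged SSE chunk into individual JSON events, dropping the
    # "data:" prefix and stopping at the "[DONE]" terminator.
    events = []
    for line in raw_chunk.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("data:"):
            line = line[len("data:"):].strip()
        if line == "[DONE]":
            break
        events.append(json.loads(line))
    return events

# Example with a merged chunk like the one above:
merged = 'data: {"choices": [{"delta": {"content": " am"}, "index": 0}]}\r\n\r\ndata: [DONE]'
print(parse_sse_events(merged))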

@bowenliang123

@bowenliang123
Contributor

bowenliang123 commented Jan 24, 2024

The preset delimiter in the openai-compatible provider is \n\n, which may not satisfy all upstream LLMs of different types.

In my case, I would change the delimiter for Qwen (千问) LLMs to \n\r:

if "qwen" in model.lower():
    delimiter = '\n\r'
else:
    delimiter = '\n\n'
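
Roughly, that per-model delimiter is what gets passed to response.iter_lines when splitting the SSE stream. A minimal sketch of the idea (illustrative names only, not the exact Dify code):

import requests

def iter_stream_chunks(response: requests.Response, model: str):
    # Pick the chunk delimiter per upstream model, since not every
    # OpenAI-compatible server separates events with "\n\n".
    if "qwen" in model.lower():
        delimiter = '\n\r'
    else:
        delimiter = '\n\n'
    for chunk in response.iter_lines(decode_unicode=True, delimiter=delimiter):
        if chunk:
            yield chunk.strip()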

@geosmart
Contributor Author

Thanks very much, @bowenliang123.

I changed the source code of openai_api_compatible/llm/llm.py to temporarily integrate my RWKV model:

➜  dify docker exec -it dify-api cat core/model_runtime/model_providers/openai_api_compatible/llm/llm.py | grep 'rwkv' -C3
                )
            )

        if "rwkv" in model.lower():
            delimiter = '\r\n'
        else:
            delimiter = '\n\n'
--
                      chunk_json = json.loads(decoded_chunk)
                # stream ended
                except json.JSONDecodeError as e:
                    if "rwkv" in model.lower() and decoded_chunk == '[DONE]':
                        break
                    yield create_final_llm_result_chunk(
                        index=chunk_index + 1,

So maybe I need to implement an rwkv model provider in production.

@geosmart
Contributor Author

In the RWKV OpenAI-API-compatible case, it would be helpful if openai_api_compatible supported configuring the chunk delimiter and the stream-end stop chunk in the UI.

@guchenhe
Collaborator

Hi @geosmart, it seems like you've found a way to accommodate this. To better support it on our end, my idea for an enhancement addressing these cases would be to add an optional delimiter field when configuring model credentials for OpenAI-API-compatible models. When the field is filled in by the user, we would use it to separate streaming chunks instead of \n\n as per OpenAI's API. I expect this to be a quick win that shouldn't cost a lot of time; would you like to have a go at a PR for it? :)
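
A minimal sketch of how such an optional field could be consumed in the streaming handler, assuming the user-supplied value is stored in the model credentials under a hypothetical key such as stream_mode_delimiter (the actual field name and UI wiring are left to the PR):

import requests

def iter_stream_chunks(response: requests.Response, credentials: dict):
    # Use the user-configured delimiter if present; otherwise fall back
    # to OpenAI's default "\n\n" event separator.
    delimiter = credentials.get("stream_mode_delimiter") or '\n\n'
    for chunk in response.iter_lines(decode_unicode=True, delimiter=delimiter):
        if chunk:
            yield chunk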

@geosmart
Contributor Author

OK, I will submit a PR for this.

@guchenhe guchenhe added 💪 enhancement New feature or request and removed 🐞 bug Something isn't working labels Jan 24, 2024
@geosmart
Contributor Author

geosmart commented Jan 25, 2024

openai_api_compatible model config window
[screenshot: openai_api_compatible model config window]

@jsboige

jsboige commented Sep 9, 2024

openai_api_compatible model config window

Thanks for this. It was the fix needed to get Oobabooga's text-generation-webui to work as a Dify connector. I only found this issue after painfully debugging the container, so I suppose it needs more visibility.
