the openai-compatible model's answer is blank with streaming mode #2144
🤖 Thank you for providing detailed information about the issue you're experiencing. It seems like the problem might be related to how streaming mode is implemented in the dify codebase. Here is the relevant code:

def _handle_chat_generate_stream_response(self, model: str, prompt_messages: list[PromptMessage],
                                          credentials: dict, response: Generator[MinimaxMessage, None, None]) \
        -> Generator[LLMResultChunk, None, None]:
    for message in response:
        if message.usage:
            usage = self._calc_response_usage(
                model=model, credentials=credentials,
                prompt_tokens=message.usage['prompt_tokens'],
                completion_tokens=message.usage['completion_tokens']
            )
            yield LLMResultChunk(
                model=model,
                prompt_messages=prompt_messages,
                delta=LLMResultChunkDelta(
                    index=0,
                    message=AssistantPromptMessage(
                        content=message.content,
                        tool_calls=[]
                    ),
                    usage=usage,
                    finish_reason=message.stop_reason if message.stop_reason else None,
                ),
            )
        else:
            yield LLMResultChunk(
                model=model,
                prompt_messages=prompt_messages,
                delta=LLMResultChunkDelta(
                    index=0,
                    message=AssistantPromptMessage(
                        content=message.content,
                        tool_calls=[]
                    ),
                    finish_reason=message.stop_reason if message.stop_reason else None,
                ),
            )

You can find this code in the api/core/model_runtime/model_providers/minimax/llm/llm.py file. To resolve this issue, you might need to ensure that the streaming response is being received and parsed correctly before these chunks are yielded.

I hope this helps! If you have any other questions or need further clarification, please let me know.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. |
I use the openai_api_compatible model, so the relevant source code is the openai_api_compatible provider, not minimax.

test

blocking mode

I tested my openai_api_compatible model API; the blocking-mode response has a usage attribute.

curl --location --request POST 'http://my-domain/v1/chat/completions' \
--header 'User-Agent: Apifox/1.0.0 (https://apifox.com)' \
--header 'Content-Type: application/json' \
--data-raw '{
"messages": [
{
"role": "system",
"content": "xxx"
},
{
"role": "user",
"content": "who are you"
}
],
"temperature": 1.0,
"top_p": 0.5,
"max_tokens": 500,
"stream": false
}'
# response
{
"object": "chat.completion",
"model": "rwkv-5-world-1b5",
"usage": {
"prompt_tokens": 0,
"completion_tokens": 500,
"total_tokens": 500
},
"choices": [
{
"message": {
"role": "assistant",
"content": "xxx
},
"index": 0,
"finish_reason": "stop"
}
]
}
streaming mode

The SSE response has no usage attribute; I think that's OK (see openai.apifox.cn).

curl --location --request POST 'http://my-domain/v1/chat/completions' \
--header 'User-Agent: Apifox/1.0.0 (https://apifox.com)' \
--header 'Content-Type: application/json' \
--data-raw '{
"messages": [
{
"role": "system",
"content": "xxx"
},
{
"role": "user",
"content": "who are you"
}
],
"temperature": 1.0,
"top_p": 0.5,
"max_tokens": 500,
"stream": true
}'
# response
data: {"object": "chat.completion.chunk", "model": "rwkv-5-world-1b5", "choices": [{"delta": {"content": " "}, "index": 0, "finish_reason": null}]}
data: {"object": "chat.completion.chunk", "model": "rwkv-5-world-1b5", "choices": [{"delta": {"content": "\u6211"}, "index": 0, "finish_reason": null}]}
data: {"object": "chat.completion.chunk", "model": "rwkv-5-world-1b5", "choices": [{"delta": {"content": "\u53eb"}, "index": 0, "finish_reason": null}]}
|
I ran the test case and it errors when parsing decoded_chunk:

for chunk in response.iter_lines(decode_unicode=True, delimiter='\n\n'):
    if chunk:
        decoded_chunk = chunk.strip().lstrip('data: ').lstrip()
        chunk_json = None
        try:
            chunk_json = json.loads(decoded_chunk)
        # stream ended
        except json.JSONDecodeError as e:
            # failed to parse decoded_chunk
            print("parse chunk_json error: " + str(e))
            yield create_final_llm_result_chunk(
                index=chunk_index + 1,
                message=AssistantPromptMessage(content=""),
                finish_reason="Non-JSON encountered."
            )
            break
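
To see why json.loads fails here, one option is to inspect the raw bytes the server actually sends and check which separator it puts between SSE events. A minimal sketch using requests; the URL and payload are placeholders taken from the curl examples above, not exact values:

import requests

# Hypothetical endpoint and payload, mirroring the curl examples above.
url = "http://my-domain/v1/chat/completions"
payload = {
    "messages": [{"role": "user", "content": "who are you"}],
    "max_tokens": 50,
    "stream": True,
}

with requests.post(url, json=payload, stream=True) as resp:
    # Print the raw bytes so the actual event separator ('\n\n' vs '\r\n')
    # stays visible; iter_lines() would normalize newlines and hide it.
    for raw in resp.iter_content(chunk_size=None):
        print(repr(raw))

If the events arrive as b'data: {...}\r\n' chunks rather than blocks ending in b'\n\n', the preset delimiter in the parser above never matches, and the whole stream ends up concatenated into one string that is not valid JSON.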
|
The preset delimiter in the openai-compatible provider is '\n\n'. In my case, I would change the delimiter passed to iter_lines so it matches the separator the server actually sends.
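
To illustrate the mismatch with a toy sketch (standalone Python, not dify code): if the server separates events with '\r\n' but the client splits on '\n\n', the events fuse into one string and json.loads fails on it.

import json

# Two SSE events separated by '\r\n', as the rwkv server sends them.
raw = 'data: {"choices": [{"delta": {"content": "hi"}}]}\r\ndata: {"choices": [{"delta": {"content": "!"}}]}\r\n'

# Splitting on the preset '\n\n' delimiter leaves both events fused together ...
blobs = raw.split('\n\n')
print(len(blobs))  # 1

# ... and the fused string is not valid JSON once the 'data: ' prefix is stripped.
try:
    json.loads(blobs[0].strip().lstrip('data: ').lstrip())
except json.JSONDecodeError as e:
    print("parse error:", e)  # "Extra data" at the second event

# Splitting on '\r\n' instead yields one JSON object per event.
for part in raw.split('\r\n'):
    if part:
        print(json.loads(part.strip().lstrip('data: ').lstrip()))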
|
Thanks very much, @bowenliang123. I changed the source code of openai_api_compatible/llm/llm.py:

➜ dify docker exec -it dify-api cat core/model_runtime/model_providers/openai_api_compatible/llm/llm.py | grep 'rwkv' -C3
)
)
if "rwkv" in model.lower():
delimiter = '\r\n'
else:
delimiter = '\n\n'
--
            chunk_json = json.loads(decoded_chunk)
        # stream ended
        except json.JSONDecodeError as e:
            if "rwkv" in model.lower() and decoded_chunk == '[DONE]':
                break
            yield create_final_llm_result_chunk(
                index=chunk_index + 1,

so maybe I need to implement a rwkv model provider in production |
In the rwkv OpenAI-compatible API case, the SSE events are separated by '\r\n' and the stream ends with a bare '[DONE]'. |
Hi @geosmart, seems like you've found a way to accommodate it. To better support this on our end, my idea for an enhancement addressing these cases would be to add an optional delimiter setting to the OpenAI-compatible provider. |
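
A rough sketch of that idea (the stream_mode_delimiter key and the credentials lookup are assumptions for illustration, not a confirmed dify API); it is a fragment meant to slot into the existing streaming loop rather than a standalone script:

# Hypothetical sketch: make the SSE event delimiter configurable per model.
# 'stream_mode_delimiter' is an assumed credential key, not confirmed API.
delimiter = credentials.get('stream_mode_delimiter', '\n\n')

for chunk in response.iter_lines(decode_unicode=True, delimiter=delimiter):
    if chunk:
        decoded_chunk = chunk.strip().lstrip('data: ').lstrip()
        # ... existing JSON parsing logic unchanged ...

Models like rwkv could then set the delimiter to '\r\n' in their provider configuration without forking the provider code.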
ok, I will commit a pr for this |
Thanks for this. It was the fix to get Oobabooga's text-generation-webui to work as a dify connector. Only found this issue after I was painfully able to debug the container, so I suppose it would need more visibility. |
Self Checks
Dify version
0.4.9
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
I added an openai-compatible model (rwkv docker service); the API itself works fine,
but in dify the answer is not shown.
When I test blocking mode, the answer is shown.
When I test streaming mode, the answer is blank.
✔️ Expected Behavior
Streaming mode should work and show the answer.
❌ Actual Behavior
With streaming mode, the answer is blank.
The answer is blank on the preview page too.