Longer stop sequence not working in streaming mode #2577

andrePankraz · 2024-01-24T08:54:56Z

I have problems with the LangChain default ReAct pipeline, because stop sequences behave differently in streaming and non-streaming mode.

Example:

curl -X POST http://ai1.dev.init:8000/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{
           "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
           "messages": [
               {
                   "role": "user",
                   "content": "Answer the following questions as best you can. You have access to the following tools:\n\nWikipedia: A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, facts, historical events, or other subjects. Input should be a search query.\n\nUse the following format:\n\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [Wikipedia]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can repeat N times)\nThought: I now know the final answer\nFinal Answer: the final answer to the original input question\n\nBegin!\n\nQuestion: what is LangChain?\nThought:"
               }
           ],
           "max_tokens": 1000,
           "n": 1,
           "stop": ["\nObservation"],
           "stream": false,
           "temperature": 0.2,
           "top_p": 0.1
         }'

Result:

{"id":"cmpl-5516211747854a32ad63d5ebb8d52d14","object":"chat.completion","created":522917,"model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"message":{"role":"assistant","content":" I don't have any prior knowledge about LangChain, so I need to look it up. I will use the Wikipedia tool to find more information about it.\n\nAction: Wikipedia\nAction Input: LangChain\n"},"finish_reason":"stop"}],"usage":{"prompt_tokens":191,"total_tokens":239,"completion_tokens":48}}

It stops right before it would generate the multi-token stop sequence "\nObservation".

With "stream": true, the final chunks are:

data: {"id": "cmpl-ca4890fbf3a741cc839d77792d9b01ac", "object": "chat.completion.chunk", "created": 522895, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "choices": [{"index": 0, "delta": {"content": "Action"}, "finish_reason": null}]}

data: {"id": "cmpl-ca4890fbf3a741cc839d77792d9b01ac", "object": "chat.completion.chunk", "created": 522895, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "choices": [{"index": 0, "delta": {"content": " Input"}, "finish_reason": null}]}

data: {"id": "cmpl-ca4890fbf3a741cc839d77792d9b01ac", "object": "chat.completion.chunk", "created": 522895, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "choices": [{"index": 0, "delta": {"content": ":"}, "finish_reason": null}]}

data: {"id": "cmpl-ca4890fbf3a741cc839d77792d9b01ac", "object": "chat.completion.chunk", "created": 522895, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "choices": [{"index": 0, "delta": {"content": " Lang"}, "finish_reason": null}]}

data: {"id": "cmpl-ca4890fbf3a741cc839d77792d9b01ac", "object": "chat.completion.chunk", "created": 522895, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "choices": [{"index": 0, "delta": {"content": "Chain"}, "finish_reason": null}]}

data: {"id": "cmpl-ca4890fbf3a741cc839d77792d9b01ac", "object": "chat.completion.chunk", "created": 522895, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "choices": [{"index": 0, "delta": {"content": "\n"}, "finish_reason": null}]}

data: {"id": "cmpl-ca4890fbf3a741cc839d77792d9b01ac", "object": "chat.completion.chunk", "created": 522895, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "choices": [{"index": 0, "delta": {"content": "\n"}, "finish_reason": null}]}

data: {"id": "cmpl-ca4890fbf3a741cc839d77792d9b01ac", "object": "chat.completion.chunk", "created": 522895, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "choices": [{"index": 0, "delta": {"content": "Observ"}, "finish_reason": null}]}

data: {"id": "cmpl-ca4890fbf3a741cc839d77792d9b01ac", "object": "chat.completion.chunk", "created": 522895, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}], "usage": {"prompt_tokens": 191, "total_tokens": 238, "completion_tokens": 47}}

data: [DONE]

The final two streamed tokens are "\n" and "Observ", which are also the first two tokens of the full stop sequence "\nObservation".

Is this intentional? Shouldn't you hold tokens back until you are sure that no stop sequence matches (and beam search is settled), and then stream out multiple tokens as one chunk?
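To make the suggestion concrete, here is a minimal sketch of the hold-back idea in plain Python. It is not vLLM's implementation: the function name `stream_with_stop` and its interface are hypothetical, it works on decoded text rather than token ids, and it ignores beam search. Any tail of the output that could still grow into a stop string is held back and only released once it can no longer match.

```python
from typing import Iterable, Iterator, List


def stream_with_stop(tokens: Iterable[str], stop: List[str]) -> Iterator[str]:
    """Yield text chunks, holding back any tail that may still become a stop string.

    Hypothetical helper for illustration only; not part of vLLM or LangChain.
    """
    stop = [s for s in stop if s]  # ignore empty stop strings
    # At most len(longest stop) - 1 characters ever need to be held back.
    max_hold = max((len(s) for s in stop), default=1) - 1
    buffer = ""
    for token in tokens:
        buffer += token
        # A complete stop string appeared: emit the text before it and finish.
        hits = [i for i in (buffer.find(s) for s in stop) if i != -1]
        if hits:
            head = buffer[:min(hits)]
            if head:
                yield head
            return
        # Longest suffix of the buffer that is a prefix of some stop string.
        hold = 0
        for n in range(min(max_hold, len(buffer)), 0, -1):
            if any(s.startswith(buffer[-n:]) for s in stop):
                hold = n
                break
        emit = buffer[:len(buffer) - hold]
        buffer = buffer[len(buffer) - hold:]
        if emit:
            yield emit
    # Generation ended without a stop string: flush whatever was held back.
    if buffer:
        yield buffer


# Tokens taken from the streamed chunks above, plus two assumed follow-up tokens.
chunks = list(stream_with_stop(
    ["Action", " Input", ":", " Lang", "Chain", "\n", "\n", "Observ", "ation", ":"],
    ["\nObservation"],
))
print(repr("".join(chunks)))
```

With this approach the streamed text ends at "LangChain\n", matching the non-streaming response, at the cost of delaying a few characters whenever the output happens to end with a prefix of a stop string.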


rafan · 2024-03-20T16:03:20Z

I ran into the exact same issue. It feels like it streams each token as soon as possible instead of accumulating enough tokens to check that no stop sequence matches before emitting.

cyc00518 · 2024-06-02T06:53:16Z

@njhill
Thank you for your contribution. However, after upgrading from version 0.3.0 to 0.4.3, it seems that there are issues with using LangChain ReAct. Everything appears to be fine throughout the process, but in the end, the streaming output is not received.

Has anyone else encountered this issue?

njhill mentioned this issue Mar 28, 2024

[BugFix] Fix handling of stop strings and stop token ids #3672

Merged

njhill closed this as completed in #3672 Apr 11, 2024
