Longer stop sequence not working in streaming mode #2577

Closed
andrePankraz opened this issue Jan 24, 2024 · 2 comments · Fixed by #3672

Comments

@andrePankraz

I'm having problems with the default LangChain ReAct pipeline because stop sequences behave differently in streaming and non-streaming mode.

Example:

curl -X POST http://ai1.dev.init:8000/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{
           "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
           "messages": [
               {
                   "role": "user",
                   "content": "Answer the following questions as best you can. You have access to the following tools:\n\nWikipedia: A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, facts, historical events, or other subjects. Input should be a search query.\n\nUse the following format:\n\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [Wikipedia]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can repeat N times)\nThought: I now know the final answer\nFinal Answer: the final answer to the original input question\n\nBegin!\n\nQuestion: what is LangChain?\nThought:"
               }
           ],
           "max_tokens": 1000,
           "n": 1,
           "stop": ["\nObservation"],
           "stream": false,
           "temperature": 0.2,
           "top_p": 0.1
         }'

Result:

{"id":"cmpl-5516211747854a32ad63d5ebb8d52d14","object":"chat.completion","created":522917,"model":"mistralai/Mixtral-8x7B-Instruct-v0.1","choices":[{"index":0,"message":{"role":"assistant","content":" I don't have any prior knowledge about LangChain, so I need to look it up. I will use the Wikipedia tool to find more information about it.\n\nAction: Wikipedia\nAction Input: LangChain\n"},"finish_reason":"stop"}],"usage":{"prompt_tokens":191,"total_tokens":239,"completion_tokens":48}}

It stops right before it would generate the stop sequence (multiple tokens): "\nObservation"

With stream: true - final chunks:

data: {"id": "cmpl-ca4890fbf3a741cc839d77792d9b01ac", "object": "chat.completion.chunk", "created": 522895, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "choices": [{"index": 0, "delta": {"content": "Action"}, "finish_reason": null}]}

data: {"id": "cmpl-ca4890fbf3a741cc839d77792d9b01ac", "object": "chat.completion.chunk", "created": 522895, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "choices": [{"index": 0, "delta": {"content": " Input"}, "finish_reason": null}]}

data: {"id": "cmpl-ca4890fbf3a741cc839d77792d9b01ac", "object": "chat.completion.chunk", "created": 522895, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "choices": [{"index": 0, "delta": {"content": ":"}, "finish_reason": null}]}

data: {"id": "cmpl-ca4890fbf3a741cc839d77792d9b01ac", "object": "chat.completion.chunk", "created": 522895, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "choices": [{"index": 0, "delta": {"content": " Lang"}, "finish_reason": null}]}

data: {"id": "cmpl-ca4890fbf3a741cc839d77792d9b01ac", "object": "chat.completion.chunk", "created": 522895, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "choices": [{"index": 0, "delta": {"content": "Chain"}, "finish_reason": null}]}

data: {"id": "cmpl-ca4890fbf3a741cc839d77792d9b01ac", "object": "chat.completion.chunk", "created": 522895, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "choices": [{"index": 0, "delta": {"content": "\n"}, "finish_reason": null}]}

data: {"id": "cmpl-ca4890fbf3a741cc839d77792d9b01ac", "object": "chat.completion.chunk", "created": 522895, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "choices": [{"index": 0, "delta": {"content": "\n"}, "finish_reason": null}]}

data: {"id": "cmpl-ca4890fbf3a741cc839d77792d9b01ac", "object": "chat.completion.chunk", "created": 522895, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "choices": [{"index": 0, "delta": {"content": "Observ"}, "finish_reason": null}]}

data: {"id": "cmpl-ca4890fbf3a741cc839d77792d9b01ac", "object": "chat.completion.chunk", "created": 522895, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}], "usage": {"prompt_tokens": 191, "total_tokens": 238, "completion_tokens": 47}}

data: [DONE]

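The final two streamed tokens are "\n" and "Observ", which are also the first two tokens of the full stop sequence "\nObservation". So in streaming mode the client receives the beginning of the stop sequence, while the non-streaming response ends right before it.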
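To make the difference concrete, here is a minimal Python sketch that joins the `delta.content` fields of the streamed chunks and compares the result with the non-streaming `content`. The helper name `join_stream_content` is hypothetical, and the sample lines are trimmed copies of the chunks above (only the fields the helper reads are kept).

```python
import json
from typing import Iterable


def join_stream_content(sse_lines: Iterable[str]) -> str:
    """Concatenate delta.content from 'data: {...}' lines until '[DONE]'."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The final chunk carries an empty delta, so default to "".
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)


# Trimmed copies of the last chunks shown above (raw strings keep the JSON "\n" escape).
sample_lines = [
    r'data: {"choices": [{"index": 0, "delta": {"content": " Lang"}, "finish_reason": null}]}',
    r'data: {"choices": [{"index": 0, "delta": {"content": "Chain"}, "finish_reason": null}]}',
    r'data: {"choices": [{"index": 0, "delta": {"content": "\n"}, "finish_reason": null}]}',
    r'data: {"choices": [{"index": 0, "delta": {"content": "\n"}, "finish_reason": null}]}',
    r'data: {"choices": [{"index": 0, "delta": {"content": "Observ"}, "finish_reason": null}]}',
    r'data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}',
    "data: [DONE]",
]

streamed_tail = join_stream_content(sample_lines)
non_streamed_tail = " LangChain\n"  # tail of the non-streaming "content" above

# Whatever follows the non-streaming tail is the leaked start of the stop sequence.
leaked = streamed_tail[len(non_streamed_tail):]
print(repr(leaked))
```

The leaked part is a proper prefix of "\nObservation", which is why a ReAct parser on the client side sees a dangling "Observ" at the end of the streamed text.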

Is this intentional? Shouldn't you hold tokens back until you are sure that no stop sequence matches (also taking beam search into account), and then stream out multiple tokens as one chunk?
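The buffering I have in mind could look roughly like the following Python sketch. This is only an illustration of the idea, not the server's actual code: `stream_with_stop` is a hypothetical name, it works on already-decoded text pieces, and it ignores beam search entirely.

```python
from typing import Iterable, Iterator, List


def stream_with_stop(pieces: Iterable[str], stops: List[str]) -> Iterator[str]:
    """Yield text chunks, withholding any tail that could still become a stop sequence."""
    pending = ""  # generated text that has not been emitted yet
    for piece in pieces:
        pending += piece
        # A complete stop sequence: emit what precedes the earliest match, then finish.
        hits = [i for i in (pending.find(s) for s in stops) if i != -1]
        if hits:
            if min(hits) > 0:
                yield pending[:min(hits)]
            return
        # Otherwise hold back the longest tail that is a proper prefix of a stop sequence.
        hold = 0
        for s in stops:
            for k in range(min(len(s) - 1, len(pending)), 0, -1):
                if pending.endswith(s[:k]):
                    hold = max(hold, k)
                    break
        emit = pending[:len(pending) - hold]
        if emit:
            yield emit
        pending = pending[len(pending) - hold:]
    # Generation ended without a stop match: flush whatever was held back.
    if pending:
        yield pending


# Decoded pieces as in the stream above, plus the pieces that would have followed.
pieces = ["Action", " Input", ":", " Lang", "Chain", "\n", "\n", "Observ", "ation", ":"]
chunks = list(stream_with_stop(pieces, ["\nObservation"]))
print(repr("".join(chunks)))
```

With this approach the first "\n" is held back until the next piece shows it cannot start the stop sequence, and "\nObserv" is never emitted because the match completes one piece later. The cost is a small delay for any text that happens to look like the start of a stop sequence.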


rafan commented Mar 20, 2024

I ran into the exact same issue. It feels like it streams each token as soon as possible instead of accumulating enough tokens to check that no stop sequence matches before emitting.


cyc00518 commented Jun 2, 2024

@njhill
Thank you for your contribution. However, after upgrading from version 0.3.0 to 0.4.3, it seems that there are issues with using LangChain ReAct. Everything appears to be fine throughout the process, but in the end, the streaming output is not received.

Has anyone else encountered this issue?
