
Streaming responses for Mistral models are getting truncated #215

Closed
ihmaws opened this issue Sep 25, 2024 · 2 comments · Fixed by #216
Comments


ihmaws commented Sep 25, 2024

Description
When using any of the Mistral models in streaming mode, the last token will get truncated if a stop sequence is added.

Reproducible Sample

from langchain_aws import ChatBedrock
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(
        content="Respond in no more than 5-8 words."
    ),
    HumanMessage(
        content="Hello, how are you?"
    )
]

llm = ChatBedrock(
    model_id="mistral.mistral-small-2402-v1:0",
    streaming=True,
    model_kwargs={
        "stop": ["test"]  # needed to trigger the bug; this causes the Bedrock service to return the last token alongside the stop_reason
    },
)

# print streamed chunks
for chunk in llm.stream(input=messages):
    print(chunk.content, end="", flush=True)

Example Truncated Response (Input = "Hello, how are you?")
I'm an AI, I don't have feelin
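
The root cause (detailed in the fix referenced below) is that the final streamed event can carry generated text, a stop_reason, and invocation metrics together. A rough illustration of such an event, assuming a simplified Mistral-on-Bedrock payload shape; the field names and values here are assumptions for illustration only, not a captured service response:

# Illustrative only: an approximate final stream event for a Mistral model on
# Bedrock when a stop sequence is configured; not a captured service response.
final_event = {
    "outputs": [
        {
            "text": "<last generated token(s)>",  # this trailing text was being dropped
            "stop_reason": "stop",
        }
    ],
    "amazon-bedrock-invocationMetrics": {"outputTokenCount": 18},  # placeholder value
}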

langcarl bot added the investigate label Sep 25, 2024
3coins pushed a commit that referenced this issue Oct 2, 2024
## Move yield of metrics chunk after generation chunk
- When using Mistral with streaming enabled, the final chunk includes a
stop_reason. There is no guarantee that this final chunk doesn't also
include some generated text, and the existing implementation would result
in that final chunk never being sent back.
- This update moves the yield of the metrics chunk to after the generation
chunk.
- Also includes a change to add invocation metrics for Cohere models.

Closes #215
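
A minimal sketch of that reordering, assuming a simplified streaming handler (stream_chunks, GenerationChunk, and the event fields below are illustrative stand-ins, not the actual langchain-aws internals):

from typing import Any, Dict, Iterator, Optional


class GenerationChunk:
    """Simplified stand-in for langchain_core's GenerationChunk."""

    def __init__(self, text: str, generation_info: Optional[Dict[str, Any]] = None):
        self.text = text
        self.generation_info = generation_info or {}


def stream_chunks(events: Iterator[Dict[str, Any]]) -> Iterator[GenerationChunk]:
    for event in events:
        output = (event.get("outputs") or [{}])[0]
        text = output.get("text", "")
        stop_reason = output.get("stop_reason")
        metrics = event.get("amazon-bedrock-invocationMetrics")

        # Yield the generated text first, even when the same event also carries
        # a stop_reason; otherwise the last token of the response is lost.
        if text:
            yield GenerationChunk(text=text, generation_info={"stop_reason": stop_reason})

        # Only after the text has been emitted, yield the metrics-only chunk.
        if metrics:
            yield GenerationChunk(text="", generation_info={"usage_metadata": metrics})

As the commit message describes, the previous ordering meant the text carried by that final event was never emitted; yielding the metrics chunk after the generation chunk preserves it.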
ihmaws added a commit to ihmaws/langchain-aws that referenced this issue Oct 2, 2024
Move yield of metrics chunk after generation chunk

Closes langchain-ai#215

(cherry picked from commit e2c2f7c)

ihmaws commented Oct 2, 2024

Thanks @3coins!

I have a backport to v0.1.18 ready (untested). I can create a PR once a branch to base off is ready: https://github.com/langchain-ai/langchain-aws/compare/v0.1.18...ihmaws:langchain-aws:dev/ihm/fix-mistral-streaming-v0.1.18?expand=1


ihmaws commented Oct 3, 2024

Backport into v0.1 here: #222

3coins pushed a commit that referenced this issue Oct 4, 2024
_backport of fix into v0.1 branch_

Move yield of metrics chunk after generation chunk (#216)

Closes #215

(cherry picked from commit e2c2f7c)