
Streaming responses for Mistral models are getting truncated #215

Closed
ihmaws opened this issue Sep 25, 2024 · 2 comments · Fixed by #216
Comments


ihmaws commented Sep 25, 2024

Description
When using any of the Mistral models in streaming mode, the last token will get truncated if a stop sequence is added.

Reproducible Sample

from langchain_aws import ChatBedrock
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(
        content="Respond in no more than 5-8 words."
    ),
    HumanMessage(
        content="Hello, how are you?"
    )
]

llm = ChatBedrock(
    model_id="mistral.mistral-small-2402-v1:0",
    streaming=True,
    model_kwargs={
        "stop": ["test"]  # needed to trigger the bug; this causes the Bedrock service to return the last token alongside the stop_reason
    },
)

# print streamed chunks
for chunk in llm.stream(input=messages):
    print(chunk.content, end="", flush=True)

Example Truncated Response (Input = "Hello, how are you?")
I'm an AI, I don't have feelin
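
The root cause (detailed in the fix referenced below) is that the final streamed event can carry generated text, a stop_reason, and invocation metrics together. A rough illustration of such an event, assuming a simplified Mistral-on-Bedrock payload shape; the field names and values here are assumptions for illustration only, not a captured service response:

# Illustrative only: an approximate final stream event for a Mistral model on
# Bedrock when a stop sequence is configured; not a captured service response.
final_event = {
    "outputs": [
        {
            "text": "<last generated token(s)>",  # this trailing text was being dropped
            "stop_reason": "stop",
        }
    ],
    "amazon-bedrock-invocationMetrics": {"outputTokenCount": 18},  # placeholder value
}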

langcarl bot added the investigate label Sep 25, 2024
3coins pushed a commit that referenced this issue Oct 2, 2024
## Move yield of metrics chunk after generation chunk
- When using Mistral with streaming enabled, the final chunk includes a
stop_reason. There is no guarantee that this final chunk doesn't also
include some generated text, and the existing implementation would result
in that final chunk never being sent back.
- This update moves the yield of the metrics chunk to after the generation
chunk.
- Also includes a change to add invocation metrics for Cohere models.

Closes #215
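
A minimal sketch of that reordering, assuming a simplified streaming handler (stream_chunks, GenerationChunk, and the event fields below are illustrative stand-ins, not the actual langchain-aws internals):

from typing import Any, Dict, Iterator, Optional


class GenerationChunk:
    """Simplified stand-in for langchain_core's GenerationChunk."""

    def __init__(self, text: str, generation_info: Optional[Dict[str, Any]] = None):
        self.text = text
        self.generation_info = generation_info or {}


def stream_chunks(events: Iterator[Dict[str, Any]]) -> Iterator[GenerationChunk]:
    for event in events:
        output = (event.get("outputs") or [{}])[0]
        text = output.get("text", "")
        stop_reason = output.get("stop_reason")
        metrics = event.get("amazon-bedrock-invocationMetrics")

        # Yield the generated text first, even when the same event also carries
        # a stop_reason; otherwise the last token of the response is lost.
        if text:
            yield GenerationChunk(text=text, generation_info={"stop_reason": stop_reason})

        # Only after the text has been emitted, yield the metrics-only chunk.
        if metrics:
            yield GenerationChunk(text="", generation_info={"usage_metadata": metrics})

As the commit message describes, the previous ordering meant the text carried by that final event was never emitted; yielding the metrics chunk after the generation chunk preserves it.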
ihmaws added a commit to ihmaws/langchain-aws that referenced this issue Oct 2, 2024
Move yield of metrics chunk after generation chunk

Closes langchain-ai#215

(cherry picked from commit e2c2f7c)

ihmaws commented Oct 2, 2024

Thanks @3coins!

I have a backport to v0.1.18 ready (untested). I can create a PR once a branch to base off is ready: https://github.com/langchain-ai/langchain-aws/compare/v0.1.18...ihmaws:langchain-aws:dev/ihm/fix-mistral-streaming-v0.1.18?expand=1


ihmaws commented Oct 3, 2024

Backport into v0.1 here: #222

3coins pushed a commit that referenced this issue Oct 4, 2024
_backport of fix into v0.1 branch_

Move yield of metrics chunk after generation chunk (#216)

Closes #215

(cherry picked from commit e2c2f7c)