Move yield of metrics chunk after generation chunk #216
Conversation
Force-pushed from d653f01 to 5806ac6
@3coins, we need urgent help here, as there is an issue with a production deployment of a GenAI chatbot for Government e-Marketplace. @ihmaws (https://github.com/ihmaws) has done a deep dive on the issue and suggested a few resolutions. Can you please take a look and accept the PR? This is the first India public-sector generative AI chatbot, and the customer is asking for a resolution on an urgent basis.
@ihmaws
Thanks for submitting this change. One minor comment about removing the block for cohere; looks good otherwise.
generation_chunk = _stream_response_to_generation_chunk(
    chunk_obj,
    provider=provider,
    output_key=output_key,
    messages_api=messages_api,
    coerce_content_to_string=coerce_content_to_string,
)
if generation_chunk:
    yield generation_chunk
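The reordering this PR makes can be sketched end to end. This is a simplified stand-in for the actual langchain-aws streaming loop, not the real implementation: `stream_chunks` and the dict-based chunks are hypothetical, but they show why yielding the generation chunk before the metrics chunk matters when the final chunk carries both text and a stop reason.

```python
def stream_chunks(raw_chunks):
    """Yield ("generation", text) and ("metrics", data) tuples.

    `raw_chunks` is a stand-in for the parsed Bedrock stream; each
    element is a dict that may carry generated text, invocation
    metrics, or both.
    """
    for chunk in raw_chunks:
        text = chunk.get("text", "")
        metrics = chunk.get("amazon-bedrock-invocationMetrics")
        # Emit the generation chunk first, so a final chunk that
        # carries both text and metrics does not lose its text.
        if text:
            yield ("generation", text)
        if metrics:
            yield ("metrics", metrics)

# With Mistral streaming, the final chunk can carry both text and metrics:
chunks = [
    {"text": "Hello"},
    {"text": ".", "amazon-bedrock-invocationMetrics": {"inputTokenCount": 33}},
]
out = list(stream_chunks(chunks))
```

With the old ordering (metrics yielded first, then an early return), the trailing `"."` from the final chunk would never be emitted; here it appears before the metrics tuple.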
This ends up adding an additional chunk to the response with an empty text. For example:
# Last text chunk with content
content='.' additional_kwargs={} response_metadata={'stop_reason': None} id='run-98ddf143-b333-47de-b8e5-b0dd55832cf6'
# empty content chunk processed for case when metrics are present
content='' additional_kwargs={} response_metadata={'stop_reason': 'stop', 'amazon-bedrock-invocationMetrics': {'inputTokenCount': 33, 'outputTokenCount': 10, 'invocationLatency': 230, 'firstByteLatency': 139}} id='run-98ddf143-b333-47de-b8e5-b0dd55832cf6'
# This one is processed with the _get_invocation_metrics_chunk
content='' additional_kwargs={} response_metadata={} id='run-98ddf143-b333-47de-b8e5-b0dd55832cf6' usage_metadata={'input_tokens': 33, 'output_tokens': 10, 'total_tokens': 43}
Ideally, we should be able to extract both the content and the metrics in one go inside the _stream_response_to_generation_chunk method; I am not sure why we ended up with two different statements. I am also not sure whether Bedrock or any of the models guarantees that the metrics arrive in the last chunk; if that turns out not to be the case, this logic will break because of the return statements. However, this change seems harmless; maybe we can tackle the other issue in a separate PR.
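The one-pass extraction suggested above could look roughly like the following. This is only a sketch: `to_generation_chunk` is a hypothetical helper, and the plain-dict output mirrors the field names in the example chunks above rather than the real langchain-core chunk classes.

```python
def to_generation_chunk(chunk_obj):
    """Extract content and usage metadata from one raw chunk in a
    single pass, instead of yielding a separate empty-text metrics
    chunk after the generation chunk."""
    metrics = chunk_obj.get("amazon-bedrock-invocationMetrics")
    usage = None
    if metrics:
        in_tok = metrics.get("inputTokenCount", 0)
        out_tok = metrics.get("outputTokenCount", 0)
        usage = {
            "input_tokens": in_tok,
            "output_tokens": out_tok,
            "total_tokens": in_tok + out_tok,
        }
    # One chunk carries both the text (possibly empty) and the metrics.
    return {"content": chunk_obj.get("text", ""), "usage_metadata": usage}

final = to_generation_chunk({
    "text": ".",
    "amazon-bedrock-invocationMetrics": {
        "inputTokenCount": 33,
        "outputTokenCount": 10,
    },
})
```

Merging the two yields this way would also remove the ordering question entirely, since there would be no separate metrics chunk to sequence.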
- When using Mistral with streaming enabled, the final chunk includes a stop_reason. There is nothing to say this final chunk doesn't also include some generated text; the existing implementation would result in that final chunk never getting sent back.
- This update moves the yield of the metrics chunk after the generation chunk.
Force-pushed from 5806ac6 to 5274c1a
@ihmaws
Use this to verify:
Interesting, I'm not seeing that test failure. Let me try and have a look. Can you also run the test one more time and confirm you've got the latest code?
Let me try pulling in the latest.
- When using Mistral with streaming enabled, the final chunk includes a stop_reason. There is nothing to say this final chunk doesn't also include some generated text; the existing implementation would result in that final chunk never getting sent back.
- This update moves the yield of the metrics chunk after the generation chunk.
- Also included a change to include invocation metrics for Cohere models.

Closes langchain-ai#215 (cherry picked from commit e2c2f7c)
_backport of fix into v0.1 branch_ Move yield of metrics chunk after generation chunk (#216)

- When using Mistral with streaming enabled, the final chunk includes a stop_reason. There is nothing to say this final chunk doesn't also include some generated text; the existing implementation would result in that final chunk never getting sent back.
- This update moves the yield of the metrics chunk after the generation chunk.
- Also included a change to include invocation metrics for Cohere models.

Closes #215 (cherry picked from commit e2c2f7c)
Move yield of metrics chunk after generation chunk
Closes #215