Move yield of metrics chunk after generation chunk #216
Conversation
Force-pushed from d653f01 to 5806ac6
@3coins, we need urgent help here, as there is an issue with a production deployment of a GenAI chatbot for Government e-Marketplace. @ihmaws (https://github.com/ihmaws) has done a deep dive on the issue and suggested a few resolutions. Can you please take a look and accept the PR? This is the first India public-sector generative AI chatbot, and the customer is asking for a resolution on an urgent basis.
@ihmaws
Thanks for submitting this change. One minor comment about removing the block for cohere; looks good otherwise.
generation_chunk = _stream_response_to_generation_chunk(
    chunk_obj,
    provider=provider,
    output_key=output_key,
    messages_api=messages_api,
    coerce_content_to_string=coerce_content_to_string,
)
if generation_chunk:
    yield generation_chunk
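The reordering this PR makes can be sketched end to end. This is a simplified stand-in for the actual langchain-aws streaming loop, not the real implementation: `stream_chunks` and the dict-based chunks are hypothetical, but they show why yielding the generation chunk before the metrics chunk matters when the final chunk carries both text and a stop reason.

```python
def stream_chunks(raw_chunks):
    """Yield ("generation", text) and ("metrics", data) tuples.

    `raw_chunks` is a stand-in for the parsed Bedrock stream; each
    element is a dict that may carry generated text, invocation
    metrics, or both.
    """
    for chunk in raw_chunks:
        text = chunk.get("text", "")
        metrics = chunk.get("amazon-bedrock-invocationMetrics")
        # Emit the generation chunk first, so a final chunk that
        # carries both text and metrics does not lose its text.
        if text:
            yield ("generation", text)
        if metrics:
            yield ("metrics", metrics)

# With Mistral streaming, the final chunk can carry both text and metrics:
chunks = [
    {"text": "Hello"},
    {"text": ".", "amazon-bedrock-invocationMetrics": {"inputTokenCount": 33}},
]
out = list(stream_chunks(chunks))
```

With the old ordering (metrics yielded first, then an early return), the trailing `"."` from the final chunk would never be emitted; here it appears before the metrics tuple.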
This ends up adding an additional chunk to the response with an empty text. For example:
# Last text chunk with content
content='.' additional_kwargs={} response_metadata={'stop_reason': None} id='run-98ddf143-b333-47de-b8e5-b0dd55832cf6'
# empty content chunk processed for case when metrics are present
content='' additional_kwargs={} response_metadata={'stop_reason': 'stop', 'amazon-bedrock-invocationMetrics': {'inputTokenCount': 33, 'outputTokenCount': 10, 'invocationLatency': 230, 'firstByteLatency': 139}} id='run-98ddf143-b333-47de-b8e5-b0dd55832cf6'
# This one is processed with the _get_invocation_metrics_chunk
content='' additional_kwargs={} response_metadata={} id='run-98ddf143-b333-47de-b8e5-b0dd55832cf6' usage_metadata={'input_tokens': 33, 'output_tokens': 10, 'total_tokens': 43}
Ideally, we should be able to extract both the content and the metrics in one go inside the _stream_response_to_generation_chunk method; I am not sure why we ended up with two different statements. I am also not sure whether Bedrock or any of the models guarantees that the metrics arrive in the last chunk; if that turns out not to be the case, this logic will break because of the return statements. However, this change seems harmless; maybe we can tackle the other issue in a separate PR.
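The one-pass extraction suggested above could look roughly like the following. This is only a sketch: `to_generation_chunk` is a hypothetical helper, and the plain-dict output mirrors the field names in the example chunks above rather than the real langchain-core chunk classes.

```python
def to_generation_chunk(chunk_obj):
    """Extract content and usage metadata from one raw chunk in a
    single pass, instead of yielding a separate empty-text metrics
    chunk after the generation chunk."""
    metrics = chunk_obj.get("amazon-bedrock-invocationMetrics")
    usage = None
    if metrics:
        in_tok = metrics.get("inputTokenCount", 0)
        out_tok = metrics.get("outputTokenCount", 0)
        usage = {
            "input_tokens": in_tok,
            "output_tokens": out_tok,
            "total_tokens": in_tok + out_tok,
        }
    # One chunk carries both the text (possibly empty) and the metrics.
    return {"content": chunk_obj.get("text", ""), "usage_metadata": usage}

final = to_generation_chunk({
    "text": ".",
    "amazon-bedrock-invocationMetrics": {
        "inputTokenCount": 33,
        "outputTokenCount": 10,
    },
})
```

Merging the two yields this way would also remove the ordering question entirely, since there would be no separate metrics chunk to sequence.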
- When using Mistral with streaming enabled, the final chunk includes a stop_reason. There is nothing to say this final chunk doesn't also include some generated text; the existing implementation would result in that final chunk never getting sent back.
- This update moves the yield of the metrics chunk after the generation chunk.
Force-pushed from 5806ac6 to 5274c1a
@ihmaws
Use this to verify:
Interesting, I'm not seeing that test failure. Let me try and have a look. Can you also run the test one more time and confirm you've got the latest code?
Let me try pulling in the latest.
- When using Mistral with streaming enabled, the final chunk includes a stop_reason. There is nothing to say this final chunk doesn't also include some generated text; the existing implementation would result in that final chunk never getting sent back.
- This update moves the yield of the metrics chunk after the generation chunk.
- Also included a change to include invocation metrics for Cohere models.

Closes langchain-ai#215 (cherry picked from commit e2c2f7c)
_backport of fix into v0.1 branch_ Move yield of metrics chunk after generation chunk (#216)

- When using Mistral with streaming enabled, the final chunk includes a stop_reason. There is nothing to say this final chunk doesn't also include some generated text; the existing implementation would result in that final chunk never getting sent back.
- This update moves the yield of the metrics chunk after the generation chunk.
- Also included a change to include invocation metrics for Cohere models.

Closes #215 (cherry picked from commit e2c2f7c)
Move yield of metrics chunk after generation chunk
Closes #215