Move yield of metrics chunk after generation chunk #216

Merged (1 commit) on Oct 2, 2024

Conversation

ihmaws (Contributor) commented Sep 25, 2024

Move yield of metrics chunk after generation chunk

  • when using Mistral with streaming enabled, the final chunk includes a stop_reason. There is nothing to say this final chunk doesn't also include some generated text, and the existing implementation would result in that final chunk never getting sent back
  • this update moves the yield of the metrics chunk to after the generation chunk (see the sketch below)
  • also included a change to include invocation metrics for Cohere models

Closes #215
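
A minimal sketch of the reordering, using simplified stand-ins for the real helpers in libs/aws/langchain_aws/llms/bedrock.py (an assumption for illustration; the actual functions take more parameters and return LangChain chunk objects):

    # Sketch with simplified stand-ins for the real helpers (assumption).
    from typing import Any, Dict, Iterator, Optional

    def _to_generation_text(chunk_obj: Dict[str, Any]) -> Optional[str]:
        # Stand-in: pull any generated text out of a raw stream chunk.
        return chunk_obj.get("text") or None

    def _to_metrics_chunk(chunk_obj: Dict[str, Any]) -> Dict[str, Any]:
        # Stand-in: wrap the invocation metrics as a final usage chunk.
        return {"usage": chunk_obj["amazon-bedrock-invocationMetrics"]}

    def stream(chunks: Iterator[Dict[str, Any]]) -> Iterator[Any]:
        for chunk_obj in chunks:
            text = _to_generation_text(chunk_obj)
            if text:
                # Yield generated text first, even when this is the final
                # chunk that also carries stop_reason and metrics.
                yield text
            if "amazon-bedrock-invocationMetrics" in chunk_obj:
                # Before this fix, the metrics chunk was yielded (and the
                # generator returned) before the text above, dropping any
                # text carried by the final chunk.
                yield _to_metrics_chunk(chunk_obj)
                return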

@ihmaws force-pushed the dev/ihmaws/fix-mistral-streaming branch from d653f01 to 5806ac6 on September 25, 2024 04:46
SanjayTiwaryMLAI commented Oct 1, 2024

@3coins, we need urgent help here, as there is an issue with the production deployment of the GenAI chatbot for the Government e-Marketplace. @ihmaws (https://github.com/ihmaws) has done a deep dive on the issue and suggested a few resolutions. Can you please review and accept the PR?

This is the first India public-sector generative AI chatbot, and the customer is asking for a resolution on an urgent basis.

3coins self-assigned this Oct 1, 2024
3coins (Collaborator) left a comment

@ihmaws
Thanks for submitting this change. One minor comment about removing the block for Cohere; looks good otherwise.

libs/aws/langchain_aws/llms/bedrock.py (outdated, resolved)
Comment on lines +373 to +380
generation_chunk = _stream_response_to_generation_chunk(
chunk_obj,
provider=provider,
output_key=output_key,
messages_api=messages_api,
coerce_content_to_string=coerce_content_to_string,
)
if generation_chunk:
yield generation_chunk
3coins (Collaborator) commented:

This ends up adding an additional chunk with empty text to the response. For example:

# Last text chunk with content
content='.' additional_kwargs={} response_metadata={'stop_reason': None} id='run-98ddf143-b333-47de-b8e5-b0dd55832cf6'

# empty content chunk processed for case when metrics are present
content='' additional_kwargs={} response_metadata={'stop_reason': 'stop', 'amazon-bedrock-invocationMetrics': {'inputTokenCount': 33, 'outputTokenCount': 10, 'invocationLatency': 230, 'firstByteLatency': 139}} id='run-98ddf143-b333-47de-b8e5-b0dd55832cf6'

# This one is processed with the _get_invocation_metrics_chunk
content='' additional_kwargs={} response_metadata={} id='run-98ddf143-b333-47de-b8e5-b0dd55832cf6' usage_metadata={'input_tokens': 33, 'output_tokens': 10, 'total_tokens': 43}

Ideally, we should be able to extract both the content and the metrics in one go inside the _stream_response_to_generation_chunk method; I'm not sure why we ended up with two separate statements. I'm also not sure whether Bedrock or any of the models guarantee that the metrics arrive as the last chunk; if that turns out not to be the case, this logic will break because of the return statements. However, this change seems harmless, so maybe we can tackle the other issue in a separate PR.
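
A hypothetical sketch of that single-pass extraction (an assumption for illustration, not the library's code; the real _stream_response_to_generation_chunk has a different signature):

    # Hypothetical: fold the invocation metrics into the same chunk that
    # carries the final text, so no empty trailing chunk is needed.
    from typing import Any, Dict
    from langchain_core.outputs import GenerationChunk

    def chunk_with_metrics(chunk_obj: Dict[str, Any], text: str) -> GenerationChunk:
        info: Dict[str, Any] = {}
        metrics = chunk_obj.get("amazon-bedrock-invocationMetrics")
        if metrics:
            # Key names follow the raw Bedrock payload shown above.
            input_tokens = metrics["inputTokenCount"]
            output_tokens = metrics["outputTokenCount"]
            info["usage_metadata"] = {
                "input_tokens": input_tokens,
                "output_tokens": output_tokens,
                "total_tokens": input_tokens + output_tokens,
            }
        return GenerationChunk(text=text, generation_info=info or None)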

  - when using Mistral with streaming enabled, the final
    chunk includes a stop_reason. There is nothing to say this final
    chunk doesn't also include some generated text. The existing
    implementation would result in that final chunk never getting
    sent back
  - this update moves the yield of the metrics chunk after the
    generation chunk
@ihmaws force-pushed the dev/ihmaws/fix-mistral-streaming branch from 5806ac6 to 5274c1a on October 2, 2024 17:27
@ihmaws requested a review from 3coins on October 2, 2024 17:29
3coins (Collaborator) commented Oct 2, 2024

@ihmaws
One of the integration tests is failing. Can you verify whether this passes for you, and update it otherwise?

======================================================= FAILURES ========================================================
__________________________________________________ test_bedrock_invoke __________________________________________________

chat = ChatBedrock(client=<botocore.client.BedrockRuntime object at 0x121fccd90>, region_name='us-west-2', model_id='anthropic.claude-3-sonnet-20240229-v1:0', model_kwargs={'temperature': 0})

    @pytest.mark.scheduled
    def test_bedrock_invoke(chat: ChatBedrock) -> None:
        """Test invoke tokens from ChatBedrock."""
        result = chat.invoke("I'm Pickle Rick", config=dict(tags=["foo"]))
        assert isinstance(result.content, str)
        assert "usage" in result.additional_kwargs
>       assert result.additional_kwargs["usage"]["prompt_tokens"] == 12
E       assert 13 == 12

tests/integration_tests/chat_models/test_bedrock.py:240: AssertionError

Use this to verify:

poetry run pytest tests/integration_tests/chat_models/test_bedrock.py -k "test_bedrock_invoke"

ihmaws (Contributor, Author) commented Oct 2, 2024

> @ihmaws One of the integration tests is failing. Can you verify whether this passes for you, and update it otherwise?

Interesting, I'm not seeing that test failure. Let me try and have a look. Can you also run the test one more time and confirm you've got the latest code?

3coins (Collaborator) commented Oct 2, 2024

> Interesting, I'm not seeing that test failure. Let me try and have a look. Can you also run the test one more time and confirm you've got the latest code?

Let me try pulling in the latest.

3coins merged commit e2c2f7c into langchain-ai:main on Oct 2, 2024. 12 checks passed.
ihmaws added a commit to ihmaws/langchain-aws that referenced this pull request Oct 2, 2024
- when using Mistral with streaming enabled, the final chunk includes a
stop_reason. There is nothing to say this final chunk doesn't also
include some generated text. The existing implementation would result in
that final chunk never getting sent back
- this update moves the yield of the metrics chunk after the generation
chunk
- also included a change to include invocation metrics for Cohere models

Closes langchain-ai#215

(cherry picked from commit e2c2f7c)
3coins pushed a commit that referenced this pull request Oct 4, 2024
_backport of fix into v0.1 branch_

Move yield of metrics chunk after generation chunk (#216)
- when using Mistral with streaming enabled, the final chunk includes a
stop_reason. There is nothing to say this final chunk doesn't also
include some generated text. The existing implementation would result in
that final chunk never getting sent back
- this update moves the yield of the metrics chunk after the generation
chunk
- also included a change to include invocation metrics for Cohere models

Closes #215

(cherry picked from commit e2c2f7c)
Labels: none yet
Projects: none yet
Development: successfully merging this pull request may close the issue "Streaming responses for Mistral models are getting truncated"
3 participants