
Model features: token usage #109

Open · wants to merge 3 commits into main
Conversation

raspawar (Collaborator):

Token Usage:

  • Added a stream_usage option
  • Added support for kwargs["stream_options"] = {"include_usage": <>}
  • Test cases for the above

Variants to use these options (a combined sketch follows this list):

  • llm = ChatNVIDIA(temperature=0, max_tokens=5, stream_usage=True)

  • llm.stream("IGNROED", stream_usage=True)

  • llm.stream("Hello", stream_options={"include_usage": True})

cc: @sumitkbh

@raspawar raspawar requested a review from mattf October 18, 2024 06:29
@@ -268,6 +268,13 @@ class ChatNVIDIA(BaseChatModel):
     top_p: Optional[float] = Field(None, description="Top-p for distribution sampling")
     seed: Optional[int] = Field(None, description="The seed for deterministic results")
     stop: Optional[Sequence[str]] = Field(None, description="Stop words (cased)")
+    stream_usage: bool = Field(
+        False,
Collaborator: The LangChain user's expectation is that usage details are returned by default.
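For context, a minimal illustration of that expectation (assuming a configured llm instance): LangChain chat models surface usage on the standard usage_metadata field.

```python
msg = llm.invoke("Hello")
# Users expect this to be populated without opting in:
print(msg.usage_metadata)  # e.g. {'input_tokens': 2, 'output_tokens': 5, 'total_tokens': 7}
```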

@@ -381,18 +388,38 @@ def _generate(
         response = self._client.get_req(payload=payload, extra_headers=extra_headers)
         responses, _ = self._client.postprocess(response)
         self._set_callback_out(responses, run_manager)
-        parsed_response = self._custom_postprocess(responses, streaming=False)
+        parsed_response = self._custom_postprocess(
+            responses, streaming=False, stream_usage=False
Collaborator: How about relying on streaming=False to imply stream_usage=False?

Collaborator Author: What if it is streaming but the caller does not want stream usage details?

"output_tokens": token_usage.get("completion_tokens", 0),
"total_tokens": token_usage.get("total_tokens", 0),
}
if (streaming and stream_usage) or not streaming:
Collaborator: How about always returning token usage if it's in the response?

Collaborator Author: That makes sense; the parameter does not matter at that point.
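A hedged sketch of the behavior agreed on here: attach usage metadata whenever the endpoint reports it, regardless of the streaming flags. The helper name and response shape are simplifications, not the PR's actual code.

```python
def _attach_usage(parsed: dict, token_usage: dict | None) -> dict:
    # Always surface usage when the response includes it.
    if token_usage:
        parsed["usage_metadata"] = {
            "input_tokens": token_usage.get("prompt_tokens", 0),
            "output_tokens": token_usage.get("completion_tokens", 0),
            "total_tokens": token_usage.get("total_tokens", 0),
        }
    return parsed
```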

@@ -441,3 +443,121 @@ def test_stop(
         assert isinstance(token.content, str)
         result += f"{token.content}|"
     assert all(target not in result for target in targets)
+
+
+def test_ai_endpoints_stream_token_usage() -> None:
Collaborator: All integration tests should accept and use a model and mode.

+    _test_stream(llm.stream("Hello", stream_usage=False), expect_usage=False)
+
+
+async def test_ai_endpoints_astream_token_usage() -> None:
Collaborator: All integration tests should accept and use a model and mode.
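A hypothetical sketch of what that request could look like; the chat_model and mode fixture names are assumptions, not confirmed from this repo's test suite.

```python
from langchain_nvidia_ai_endpoints import ChatNVIDIA

def test_ai_endpoints_stream_token_usage(chat_model: str, mode: dict) -> None:
    # Accept the model and mode fixtures instead of hard-coding an endpoint.
    llm = ChatNVIDIA(model=chat_model, **mode)
    full = None
    for chunk in llm.stream("Hello", stream_usage=True):
        full = chunk if full is None else full + chunk
    assert full is not None
    assert full.usage_metadata is not None
```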

@@ -294,14 +294,39 @@ def test_stream_usage_metadata(
     )

     llm = ChatNVIDIA(api_key="BOGUS")
-    response = reduce(add, llm.stream("IGNROED"))
+    response = reduce(add, llm.stream("IGNROED", stream_usage=True))
Collaborator: The default should be True.

raspawar (Collaborator Author):

@mattf when I was checking whether this works for all models, I found that 28 of the 66 models supported by ChatNVIDIA do not support the stream_options={"include_usage": True} parameter:

 'google/gemma-7b',
 'meta/llama2-70b',
 'mistralai/mistral-7b-instruct-v0.2',
 'google/codegemma-7b',
 'google/gemma-2b',
 'google/recurrentgemma-2b',
 'mistralai/mistral-large',
 'snowflake/arctic',
 'databricks/dbrx-instruct',
 'seallms/seallm-7b-v2.5',
 'aisingapore/sea-lion-7b-instruct',
 'microsoft/phi-3-small-8k-instruct',
 'microsoft/phi-3-small-128k-instruct',
 'ibm/granite-34b-code-instruct',
 'google/codegemma-1.1-7b',
 'mediatek/breeze-7b-instruct',
 'google/gemma-2-9b-it',
 'google/gemma-2-27b-it',
 'deepseek-ai/deepseek-coder-6.7b-instruct',
 'google/gemma-2-2b-it',
 'rakuten/rakutenai-7b-instruct',
 'rakuten/rakutenai-7b-chat',
 'baichuan-inc/baichuan2-13b-chat',
 'thudm/chatglm3-6b',
 'yentinglin/llama-3-taiwan-70b-instruct',
 'qwen/qwen2-7b-instruct',
 'zyphra/zamba2-7b-instruct']

Of the 38 models that do support it, the responses are inconsistent: some models return the total token_usage only at the end, whereas others return it with each token.

How should we handle these two things?
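For illustration, a hypothetical probe of that inconsistency; the helper name is invented, and the model name would come from the list under test.

```python
from langchain_nvidia_ai_endpoints import ChatNVIDIA

def probe_usage_reporting(model_name: str) -> None:
    # Prints the chunk indices at which a model reports usage_metadata:
    # only the final chunk for some models, every chunk for others.
    llm = ChatNVIDIA(model=model_name)
    for i, chunk in enumerate(llm.stream("Hi", stream_usage=True)):
        if chunk.usage_metadata:
            print(i, chunk.usage_metadata)
```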

mattf (Collaborator) commented Oct 25, 2024:

> @mattf when I was checking whether this works for all models, I found that 28 of the 66 models supported by ChatNVIDIA do not support the stream_options={"include_usage": True} parameter.
>
> ...
>
> Of the 38 models that do support it, the responses are inconsistent: some models return the total token_usage only at the end, whereas others return it with each token.
>
> How should we handle these two things?

Returning token usage should be the default. Those endpoints will need to be fixed.

@langchain-ai langchain-ai deleted a comment from raspawar Oct 25, 2024