
Nvidia Nemotron integration with langchain with TritonTensorRTLLM #16719

Closed
Nima-Nilchian opened this issue Jan 29, 2024 · 2 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature Ɑ: models Related to LLMs or chat model modules

Comments


Nima-Nilchian commented Jan 29, 2024

Description

I'm trying to integrate my Nemotron LLM with langchain. I'm using the source code from langchain_nvidia_trt.llms.py to get streaming, but it raises an exception.

Example Code

from llms import TritonTensorRTLLM  # llms.py is a local copy of langchain_nvidia_trt/llms.py (see description)

llm = TritonTensorRTLLM(server_url="localhost:8001", model_name="Nemotron-rlhf")
res = llm.invoke("HI")

Error Message and Stack Trace (if applicable)

The exception is below:

  File "/workspace/workspace/tens.py", line 4, in <module>
    res = llm.invoke("HI")
  File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/llms.py", line 230, in invoke
    self.generate_prompt(
  File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/llms.py", line 525, in generate_prompt
    return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/llms.py", line 698, in generate
    output = self._generate_helper(
  File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/llms.py", line 562, in _generate_helper
    raise e
  File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/llms.py", line 549, in _generate_helper
    self._generate(
  File "/workspace/workspace/llms.py", line 153, in _generate
    result: str = self._request(
  File "/workspace/workspace/llms.py", line 206, in _request
    result_str += token
TypeError: can only concatenate str (not "InferenceServerException") to str

The InferenceServerException is below:

unexpected inference output 'text_output' for model 'Nemotron-rlhf'
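
The TypeError above means that the token _request tried to append was the InferenceServerException object itself rather than a string. A minimal sketch of a defensive check around the failing line, with an assumed generator name for illustration (the surrounding method body is not the library's actual code):

from tritonclient.utils import InferenceServerException

result_str = ""
for token in token_generator:  # token_generator: hypothetical name for the stream consumed by _request
    # Triton's streaming callback can deliver the server-side error as an
    # exception object; re-raise it instead of concatenating it onto a str,
    # which is what triggers the TypeError in the traceback above.
    if isinstance(token, InferenceServerException):
        raise token
    result_str += token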

System Info

System Information

OS: Linux
OS Version: #163-Ubuntu SMP Fri Mar 17 18:26:02 UTC 2023
Python Version: 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]

Package Information

langchain_core: 0.1.16
langchain: 0.1.4
langchain_community: 0.0.16

Packages not installed (Not Necessarily a Problem)

The following packages were not found:

langgraph
langserve

dosubot bot added the Ɑ: models (Related to LLMs or chat model modules) and 🤖:bug (Related to a bug, vulnerability, unexpected error with an existing feature) labels on Jan 29, 2024

dosubot bot commented Jan 29, 2024

🤖

Based on the error message you provided, it seems like the Triton Inference Server is not able to find the expected output 'text_output' for the model 'Nemotron-rlhf'. This could be due to a few reasons:

  1. Model Configuration: The model configuration file might not be correctly set up. The output name in the configuration file should match the output name expected by the TritonTensorRTLLM class. In this case, it should be 'text_output'. Please check the model configuration file (usually a .pbtxt file) in the model repository of your Triton server.

  2. Model Compatibility: The model 'Nemotron-rlhf' might not be compatible with the TritonTensorRTLLM class. The TritonTensorRTLLM class is designed to work with models that have a specific input and output structure. If your model does not adhere to this structure, you might need to modify the TritonTensorRTLLM class or use a different class that is compatible with your model.

  3. Triton Server Version: There might be a version mismatch between the Triton server and the LangChain library. Please ensure that you are using a version of the Triton server that is compatible with the LangChain library.

Here is a sample model configuration for a text generation model:

name: "text_gen_model"
platform: "tensorrt_plan"
max_batch_size: 0
input [
  {
    name: "INPUT__0"
    data_type: TYPE_INT32
    dims: [ -1 ]
  }
]
output [
  {
    name: "text_output"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]

In this configuration, the output name is 'text_output', which matches the output name expected by the TritonTensorRTLLM class.
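
To confirm what the deployed model actually exposes, you can also query the running server for its registered inputs and outputs. A minimal sketch using the tritonclient gRPC client, reusing the endpoint and model name from the report:

import tritonclient.grpc as grpcclient

# Connect to the Triton gRPC endpoint from the example code.
client = grpcclient.InferenceServerClient(url="localhost:8001")

# get_model_metadata reports the inputs and outputs the server registered
# for the model; 'text_output' should appear among the output names.
metadata = client.get_model_metadata(model_name="Nemotron-rlhf")
print([inp.name for inp in metadata.inputs])
print([out.name for out in metadata.outputs])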

If you've checked all these points and you're still encountering the InferenceServerException error, there might be an issue with the Triton server itself or the way your model is deployed. In this case, you might want to check the Triton server's documentation or reach out to its support for further assistance.

For more information on how to configure a model for the Triton server, you can refer to the Triton Inference Server documentation.

For more information on how the TritonTensorRTLLM class works, you can refer to the LangChain Nvidia Triton LLMs documentation.

I hope this helps! If you have any more questions or need further clarification, feel free to ask.



fciannella commented:

Hi Nima,

we are aware of this issue, we are working on a fix.

Here is the bug I filed: #15045

@jdye64 for Viz.

dosubot bot added the stale label (Issue has not had recent activity or appears to be solved; stale issues will be automatically closed) on Apr 29, 2024
dosubot bot closed this as not planned (won't fix, can't repro, duplicate, stale) on May 6, 2024
dosubot bot removed the stale label on May 6, 2024