
Nvidia Nemotron integration with langchain with TritonTensorRTLLM #16719

Closed
Nima-Nilchian opened this issue Jan 29, 2024 · 2 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature Ɑ: models Related to LLMs or chat model modules

Comments


Nima-Nilchian commented Jan 29, 2024

Description

I'm trying to integrate my Nemotron LLM with langchain. I'm using the source code from langchain_nvidia_trt.llms.py to get streaming, but it raises an exception.

Example Code

from llms import TritonTensorRTLLM  # llms.py is a local copy of langchain_nvidia_trt/llms.py (see description)

llm = TritonTensorRTLLM(server_url="localhost:8001", model_name="Nemotron-rlhf")
res = llm.invoke("HI")

Error Message and Stack Trace (if applicable)

The exception is below:

  File "/workspace/workspace/tens.py", line 4, in <module>
    res = llm.invoke("HI")
  File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/llms.py", line 230, in invoke
    self.generate_prompt(
  File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/llms.py", line 525, in generate_prompt
    return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/llms.py", line 698, in generate
    output = self._generate_helper(
  File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/llms.py", line 562, in _generate_helper
    raise e
  File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/llms.py", line 549, in _generate_helper
    self._generate(
  File "/workspace/workspace/llms.py", line 153, in _generate
    result: str = self._request(
  File "/workspace/workspace/llms.py", line 206, in _request
    result_str += token
TypeError: can only concatenate str (not "InferenceServerException") to str

The InferenceServerException is below:

unexpected inference output 'text_output' for model 'Nemotron-rlhf'
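
The TypeError above means that the token _request tried to append was the InferenceServerException object itself rather than a string. A minimal sketch of a defensive check around the failing line, with an assumed generator name for illustration (the surrounding method body is not the library's actual code):

from tritonclient.utils import InferenceServerException

result_str = ""
for token in token_generator:  # token_generator: hypothetical name for the stream consumed by _request
    # Triton's streaming callback can deliver the server-side error as an
    # exception object; re-raise it instead of concatenating it onto a str,
    # which is what triggers the TypeError in the traceback above.
    if isinstance(token, InferenceServerException):
        raise token
    result_str += token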

System Info

System Information

OS: Linux
OS Version: #163-Ubuntu SMP Fri Mar 17 18:26:02 UTC 2023
Python Version: 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]

Package Information

langchain_core: 0.1.16
langchain: 0.1.4
langchain_community: 0.0.16

Packages not installed (Not Necessarily a Problem)

The following packages were not found:

langgraph
langserve

dosubot bot added the Ɑ: models (Related to LLMs or chat model modules) and 🤖:bug (Related to a bug, vulnerability, unexpected error with an existing feature) labels on Jan 29, 2024

dosubot bot commented Jan 29, 2024

🤖

Based on the error message you provided, it seems like the Triton Inference Server is not able to find the expected output 'text_output' for the model 'Nemotron-rlhf'. This could be due to a few reasons:

  1. Model Configuration: The model configuration file might not be correctly set up. The output name in the configuration file should match the output name expected by the TritonTensorRTLLM class. In this case, it should be 'text_output'. Please check the model configuration file (usually a .pbtxt file) in the model repository of your Triton server.

  2. Model Compatibility: The model 'Nemotron-rlhf' might not be compatible with the TritonTensorRTLLM class. The TritonTensorRTLLM class is designed to work with models that have a specific input and output structure. If your model does not adhere to this structure, you might need to modify the TritonTensorRTLLM class or use a different class that is compatible with your model.

  3. Triton Server Version: There might be a version mismatch between the Triton server and the LangChain library. Please ensure that you are using a version of the Triton server that is compatible with the LangChain library.

Here is a sample model configuration for a text generation model:

name: "text_gen_model"
platform: "tensorrt_plan"
max_batch_size: 0
input [
  {
    name: "INPUT__0"
    data_type: TYPE_INT32
    dims: [ -1 ]
  }
]
output [
  {
    name: "text_output"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]

In this configuration, the output name is 'text_output', which matches the output name expected by the TritonTensorRTLLM class.
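
To confirm what the deployed model actually exposes, you can also query the running server for its registered inputs and outputs. A minimal sketch using the tritonclient gRPC client, reusing the endpoint and model name from the report:

import tritonclient.grpc as grpcclient

# Connect to the Triton gRPC endpoint from the example code.
client = grpcclient.InferenceServerClient(url="localhost:8001")

# get_model_metadata reports the inputs and outputs the server registered
# for the model; 'text_output' should appear among the output names.
metadata = client.get_model_metadata(model_name="Nemotron-rlhf")
print([inp.name for inp in metadata.inputs])
print([out.name for out in metadata.outputs])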

If you've checked all these points and you're still encountering the InferenceServerException error, there might be an issue with the Triton server itself or the way your model is deployed. In this case, you might want to check the Triton server's documentation or reach out to its support for further assistance.

For more information on how to configure a model for the Triton server, you can refer to the Triton Inference Server documentation.

For more information on how the TritonTensorRTLLM class works, you can refer to the LangChain Nvidia Triton LLMs documentation.

I hope this helps! If you have any more questions or need further clarification, feel free to ask.



fciannella commented:

Hi Nima,

we are aware of this issue, we are working on a fix.

Here is the bug I filed: #15045

@jdye64 for Viz.

dosubot bot added the stale label (Issue has not had recent activity or appears to be solved; stale issues will be automatically closed) on Apr 29, 2024
dosubot bot closed this as not planned (won't fix, can't repro, duplicate, stale) on May 6, 2024
dosubot bot removed the stale label on May 6, 2024