NVIDIA Triton+TRT-LLM connector needs to handle dynamic model parameters #15045

Closed
fciannella opened this issue Dec 22, 2023 · 2 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature
Ɑ: models Related to LLMs or chat model modules
🔌: nvidia Related to nvidia integrations

Comments


fciannella commented Dec 22, 2023

System Info

LangChain 0.0.348
langchain-nvidia-trt 0.0.1rc0
Python 3.11

Who can help?

@jdye64

Information

  • The official example notebooks/scripts
  • My own modified scripts

Related Components

  • LLMs/Chat Models
  • Embedding Models
  • Prompts / Prompt Templates / Prompt Selectors
  • Output Parsers
  • Document Loaders
  • Vector Stores / Retrievers
  • Memory
  • Agents / Agent Executors
  • Tools / Toolkits
  • Chains
  • Callbacks/Tracing
  • Async

Reproduction

You can set up the Triton server with a Nemotron model as described in the quick start guide here:

https://github.com/fciannella/langchain-fciannella/blob/master/libs/partners/nvidia-trt/README.md

Then try sending a request through the LangChain connector; the request will fail.

The same happens if you follow the more complex setup.

The input and output parameters need to be discovered by the client. One option is to use pytriton as the client; another is to pull the parameters from the server via an API call that returns the Triton model configuration.
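As a rough sketch of the second option (the URL, model name, and flow here are assumptions, not part of the connector), Triton's model-configuration endpoint can be queried to list the input tensors a deployed model expects:

import tritonclient.http as httpclient

# Sketch only: assumes a Triton server on localhost:8000 serving a model
# named "ensemble"; adjust both for your deployment.
client = httpclient.InferenceServerClient(url="localhost:8000")

# The config endpoint describes every declared input tensor, so a client
# could build its request dynamically instead of hard-coding parameter names.
config = client.get_model_config("ensemble")
for inp in config["input"]:
    print(inp["name"], inp["data_type"], inp["dims"])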

Expected behavior

When a parameter is missing from the request, the error raised by the client should include the list of mandatory parameters.


dosubot bot commented Dec 22, 2023

🤖

Hello @fciannella! 👋

I am Dosu, a friendly bot here to lend a hand while you're waiting for the human maintainers to respond. I'm designed to help you troubleshoot bugs, answer questions, and guide you toward becoming a contributor. So, feel free to bounce anything off me. Let's get your issue sorted! 🛠️😀

Thank you for your detailed request. I understand that you want the NVIDIA Triton+TRT-LLM connector in LangChain to handle dynamic model parameters, discover input and output parameters from the server, and provide a list of mandatory parameters in the error code if a parameter is missing in the request.

Here's how you can achieve this:

  1. Handle Dynamic Model Parameters: You can modify the _generate_inputs method to accept a dictionary of parameters instead of individual parameters. This way, you can pass any number of parameters dynamically.
def _generate_inputs(
    self,
    prompt: Sequence[Sequence[str]],
    params: Dict[str, Any],
    stream: bool = True,
) -> List[grpcclient.InferInput]:
    """Create the input tensors for the Triton inference server."""
    query = np.array(prompt).astype(object)
    inputs = []
    # Turn every supplied parameter into its own input tensor; note that the
    # blanket float32 cast only suits numeric parameters.
    for param, value in params.items():
        input_data = np.array([value]).astype(np.float32).reshape((1, -1))
        inputs.append(self._prepare_tensor(param, input_data))
    inputs.append(self._prepare_tensor("text_input", query))
    inputs.append(self._prepare_tensor("stream", np.array([[stream]], dtype=bool)))
    return inputs
  2. Discover Input and Output Parameters: You can use the get_model_metadata method of the InferenceServerClient class to get the metadata of the model, which includes the input and output parameters.
def get_model_metadata(self) -> Dict[str, Any]:
    """Get the metadata of the model."""
    return self.client.get_model_metadata(self.model_name, as_json=True)
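For instance (a sketch only: llm stands for a hypothetical instance of the connector, and the key names follow Triton's model-metadata schema), the returned metadata could be used to work out which inputs a request is still missing:

# Sketch: derive the input names the deployed model expects and compare them
# against what the request provides before calling the server.
metadata = llm.get_model_metadata()          # llm: hypothetical connector instance
expected = {inp["name"] for inp in metadata["inputs"]}
provided = {"text_input", "stream", "temperature", "top_k"}
missing = expected - provided
if missing:
    print(f"Request is missing required inputs: {sorted(missing)}")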
  3. Provide a List of Mandatory Parameters in the Error Code: You can define a list of mandatory parameters and check if these parameters are present in the request. If a parameter is missing, you can raise an error with the list of missing parameters.
MANDATORY_PARAMS = ["tokens", "top_k", "top_p", "temperature", "repetition_penalty", "length_penalty", "beam_width"]

def validate_params(self, params: Dict[str, Any]) -> None:
    """Validate that all mandatory parameters are present."""
    missing_params = [param for param in self.MANDATORY_PARAMS if param not in params]
    if missing_params:
        raise TritonTensorRTRuntimeError(f"Missing mandatory parameters: {missing_params}")

Remember to call validate_params before calling _generate_inputs in the _generate and _stream methods.
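A minimal sketch of that wiring (the wrapper name and the way parameters are collected are assumptions, not the connector's actual code):

from typing import Any, Dict

def generate_with_validation(llm, prompt: str, **kwargs: Any) -> None:
    """Hypothetical wrapper showing the suggested call order."""
    params: Dict[str, Any] = dict(kwargs)
    llm.validate_params(params)             # fails fast, naming any missing parameters
    inputs = llm._generate_inputs([[prompt]], params, stream=False)
    # ... submit `inputs` through the Triton client exactly as _generate does today ...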

The current structure of the _generate_inputs method in the NVIDIA Triton+TRT-LLM connector can be found in LangChain's libs/partners/nvidia-trt/langchain_nvidia_trt/llms.py file.

I hope this helps! If you have any further questions, feel free to ask.

Best,
Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

dosubot added the Ɑ: models and 🤖:bug labels on Dec 22, 2023
hinthornw added the 🔌: nvidia label on Dec 22, 2023
@mkhludnev
Contributor

Here's an example of passing max_tokens via sampling_parameters in one backend: triton-inference-server/vllm_backend#34
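For reference, a minimal sketch of that pattern (the input names follow the vLLM backend's example client; nothing here comes from the LangChain connector):

import json

import numpy as np
import tritonclient.grpc as grpcclient

# The vLLM backend takes the prompt plus an optional JSON string of sampling
# parameters, so per-request values such as max_tokens travel in one tensor.
text = grpcclient.InferInput("text_input", [1], "BYTES")
text.set_data_from_numpy(np.array(["What is Triton?"], dtype=object))

sampling = grpcclient.InferInput("sampling_parameters", [1], "BYTES")
sampling.set_data_from_numpy(
    np.array([json.dumps({"max_tokens": 128, "temperature": 0.7})], dtype=object)
)

# These inputs would then be sent with the streaming inference API (the
# backend runs decoupled), as in the backend's sample client.
inputs = [text, sampling]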

dosubot added the stale label on Jun 1, 2024
dosubot closed this as not planned (won't fix, can't repro, duplicate, stale) on Jun 8, 2024
dosubot removed the stale label on Jun 8, 2024