Error: unpack_from requires a buffer of at least ... bytes for unpacking ... bytes at offset 4 (actual buffer size is ...) #7391
Comments
Something similar happened to me as well, and we figured out a few things. First and most important, the problem seems to be the protobuf library version: downgrading Python's protobuf library from 5.27.2 to 5.27.1 fixed it. Second, the problem seems to be in parsing a string or bytes length. The value is transported as 1 byte of length followed by the data buffer; however, instead of expanding that byte to 4 bytes by zero-extending it, something repeats the byte 4 times. For example, in your case the data is 3 bytes long. I'm not sure whether the problem is general to the protobuf library or specific to how Triton uses it, but since I've only seen the problem in Triton, I tend to think it's in the library usage.
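To illustrate that hypothesis, here is a standalone sketch (not Triton or protobuf code, and not the commenter's original example): if a 3-byte payload's 1-byte length is repeated into all four bytes of the little-endian length prefix instead of being zero-extended, the reader sees an enormous length and struct raises exactly the reported error.

```python
import struct

payload = b"abc"                                # 3 bytes of string data
good_prefix = struct.pack("<I", len(payload))   # b"\x03\x00\x00\x00" (correct)
bad_prefix = bytes([len(payload)]) * 4          # b"\x03\x03\x03\x03" (byte repeated)

buf = bad_prefix + payload
length = struct.unpack_from("<I", buf, 0)[0]    # 50529027 instead of 3
print(length)

try:
    struct.unpack_from("<{}s".format(length), buf, 4)
except struct.error as err:
    # unpack_from requires a buffer of at least ... bytes for unpacking
    # 50529027 bytes at offset 4 (actual buffer size is 7)
    print(err)
```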
@ohad83 Thank you for your help, but unfortunately downgrading protobuf didn't work for me; I have tried several older versions of protobuf with no success. Although it is not the best solution, I did manage to get it working by downgrading to Triton version 2.42 (docker container version 24.01), which was the newest Triton version that worked for me.
I used 24.05 with conda-packed environments for the Python models. At first everything was fine; then I repacked the environment and triggered this problem. I fixed it by using pip inside the container to install packages and sticking to those exact versions when packing the environment. Something definitely broke with recent updates. I haven't tried 24.06 yet.
I found the reason: the Triton Python backend already ships its own numpy package. If your conda environment package also contains numpy and the version is inconsistent, compatibility issues can occur. The solution is to keep the numpy version in your conda environment package consistent with the one in the Python backend.
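One way to spot such a mismatch is to compare the numpy seen by the container's own Python with the numpy inside the conda-packed environment. A minimal, Triton-independent check; run it once with each interpreter and compare the output:

```python
# Hypothetical version check: run with the container's system Python and again
# with the Python from the conda-packed environment, then compare the output.
import sys
import numpy

print("python :", sys.version.split()[0])
print("numpy  :", numpy.__version__)
print("loaded from:", numpy.__file__)
```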
@adisabolic Do you still need support? |
I am also facing the same problem with Container Version 24.05 and Triton Inference Server Version 2.46.0.
Hello @adisabolic @suhaneshivam @ohad83, the following PR should resolve this issue: triton-inference-server/python_backend#384. This fix will tentatively be part of the 24.11 release.
Description
I get the following error when using a TYPE_STRING input field in a Triton model with the Python backend:

Error: unpack_from requires a buffer of at least ... bytes for unpacking ... bytes at offset 4 (actual buffer size is ...)

Looking at the /opt/tritonserver/backends/python/triton_python_backend_utils.py file, line 117 is:

sb = struct.unpack_from("<{}s".format(l), val_buf, offset)[0]
Triton Information
What version of Triton are you using?
I am using the docker base image nvcr.io/nvidia/tritonserver:24.04-py3 with CUDA 12.4, installed on my Ubuntu 22.04.4 LTS (Jammy Jellyfish) machine. The Python version is 3.10.12.
To Reproduce
I was able to make a minimal example for which I get the error, consisting of the following (a hypothetical sketch of a comparable setup is shown after this list):
- config.pbtxt of the model
- model.py of the model
- example client script
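The actual files are attached to the issue; what follows is only a hypothetical sketch of a comparable setup. The model name (string_model), the input/output names (INPUT0/OUTPUT0), and the shapes are assumptions, not the author's files. A minimal model.py that echoes a TYPE_STRING input back as a string output:

```python
# Hypothetical model.py: echoes a TYPE_STRING input named "INPUT0" back as a
# string output named "OUTPUT0". Names and shapes are assumptions.
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            # as_numpy() deserializes the string tensor; this is roughly where
            # the struct.error from the report originates
            strings = in0.as_numpy()
            out0 = pb_utils.Tensor("OUTPUT0", strings.astype(np.object_))
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out0]))
        return responses
```

And a matching client sketch using the tritonclient HTTP API (server URL and model name are also assumptions):

```python
# Hypothetical client: sends one string and prints the echoed output.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

data = np.array([["hello world"]], dtype=np.object_)
inp = httpclient.InferInput("INPUT0", [1, 1], "BYTES")
inp.set_data_from_numpy(data)

result = client.infer(model_name="string_model", inputs=[inp])
print(result.as_numpy("OUTPUT0"))
```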
I have tried various things, from changing CUDA and Triton versions to changing Nvidia package versions, but none of them worked.