
PandasCodec unexpected behavior #737

Closed
pablobgar opened this issue Sep 22, 2022 · 9 comments · Fixed by #1162

@pablobgar (Contributor) commented Sep 22, 2022

Hi,

I am having some problems with PandasCodec and grpc requests.

On the client side, I have the following code to send a pandas df:

data = pd.DataFrame(data={'col1': ["test_string", "test_string"], 'col2': ["test_string", "test_string"]})

i_request = PandasCodec.encode_request(data)
i_request.parameters = types.Parameters(content_type="pd")

inference_request_g = converters.ModelInferRequestConverter.from_types(
    i_request, model_name=model_name, model_version=None
)

grpc_channel = grpc.insecure_channel("localhost:8081")
grpc_stub = dataplane.GRPCInferenceServiceStub(grpc_channel)
response = grpc_stub.ModelInfer(inference_request_g)

And I have this code fragment to receive it on the server side:

async def predict(self, payload: types.InferenceRequest) -> types.InferenceResponse:
    if payload.parameters.content_type == "pd":
        inference_data = PandasCodec.decode_request(payload)
        print(inference_data)

I get this output:

             col1            col2
0  b'test_string'  b'test_string'
1  b'test_string'  b'test_string'

The same dataframe, but with binary strings. I was expecting to get back the same dataframe that I have on the client, since I am using the same codec for encoding and decoding.

Is this expected behavior, or am I doing something wrong?
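For what it's worth, the conversion I was expecting can be done by hand. This is a minimal pure-pandas sketch (the `decode_bytes_columns` helper is my own, and it assumes the bytes are UTF-8 encoded), not a proper fix:

```python
import pandas as pd

def decode_bytes_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with any bytes values decoded as UTF-8 strings."""
    out = df.copy()
    for col in out.columns:
        out[col] = out[col].map(
            lambda v: v.decode("utf-8") if isinstance(v, bytes) else v
        )
    return out

decoded = decode_bytes_columns(
    pd.DataFrame({"col1": [b"test_string", b"test_string"],
                  "col2": [b"test_string", b"test_string"]})
)
print(decoded)
```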

Thanks for your help.

@adriangonz (Contributor)

Hey @pablobgar ,

Thanks for providing that high level of detail! 👍

The PandasCodec will look at the content_types from each individual column (i.e. from each entry in the inputs list) to do any further conversion if required. For inputs with datatype BYTES, it will leave them as they are, unless they are flagged with content_type: "str", in which case it will convert them as strings.
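As a rough illustration of that rule (this is a simplified sketch for clarity, not MLServer's actual code):

```python
# Simplified sketch of the per-input decoding rule described above:
# BYTES inputs are left as raw bytes unless the input is flagged with
# content_type "str". Not MLServer's actual implementation.

def decode_input(datatype, content_type, data):
    if datatype == "BYTES" and content_type == "str":
        return [v.decode("utf-8") if isinstance(v, bytes) else v for v in data]
    # No per-input conversion requested: leave the data as-is
    return data

raw = [b"test_string", b"test_string"]
as_bytes = decode_input("BYTES", None, raw)   # stays as bytes
as_str = decode_input("BYTES", "str", raw)    # decoded into strings
```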

Therefore, my guess is that either the PandasCodec.encode_request() method is not adding that str label to the string columns, or that the ModelInferRequestConverter.from_types() method is not adding the content_type label to the gRPC payload. To clarify this point, could you share the contents of the i_request and inference_request_g payloads?

PS: within an MLModel instance, you can use the self.decode_request(payload) helper method, which will look at the content_type flags for you (i.e. without the need for an explicit if content_type == "...", then ....).

@pablobgar (Contributor, Author) commented Sep 23, 2022

Hi @adriangonz

Thanks for your quick response!!

This is the content of i_request:

{'id': None, 'parameters': {'content_type': 'pd', 'headers': None}, 'inputs': [{'name': 'col1', 'shape': [2], 'datatype': 'BYTES', 'parameters': None, 'data': [b'test_string', b'test_string']}, {'name': 'col2', 'shape': [2], 'datatype': 'BYTES', 'parameters': None, 'data': [b'test_string', b'test_string']}], 'outputs': None}

And this is inference_request_g:

model_name: "test"
parameters {
  key: "content_type"
  value {
    string_param: "pd"
  }
}
inputs {
  name: "col1"
  datatype: "BYTES"
  shape: 2
  contents {
    bytes_contents: "test_string"
    bytes_contents: "test_string"
  }
}
inputs {
  name: "col2"
  datatype: "BYTES"
  shape: 2
  contents {
    bytes_contents: "test_string"
    bytes_contents: "test_string"
  }
}

@adriangonz (Contributor)

Hey @pablobgar ,

Right, that seems aligned with what was described in my comment above. The PandasCodec.encode_request method is not setting the right content type at the individual input level.

Could you try explicitly setting str as the content type for the inputs of i_request? As in,

from mlserver.codecs import StringCodec
from mlserver.types import Parameters

for input in i_request.inputs:
    input.parameters = Parameters(content_type=StringCodec.ContentType)

@pablobgar (Contributor, Author)

This is the resulting request:

model_name: "arte"
parameters {
  key: "content_type"
  value {
    string_param: "pd"
  }
}
inputs {
  name: "col1"
  datatype: "BYTES"
  shape: 2
  parameters {
    key: "content_type"
    value {
      string_param: "str"
    }
  }
  contents {
    bytes_contents: "test_string"
    bytes_contents: "test_string"
  }
}
inputs {
  name: "col2"
  datatype: "BYTES"
  shape: 2
  parameters {
    key: "content_type"
    value {
      string_param: "str"
    }
  }
  contents {
    bytes_contents: "test_string"
    bytes_contents: "test_string"
  }
}

But I have the same result on the server:

             col1            col2
0  b'test_string'  b'test_string'
1  b'test_string'  b'test_string'

@adriangonz (Contributor)

Hey @pablobgar,

Thanks for trying that out.

Instead of calling the PandasCodec.decode_request(payload) method directly, could you call the self.decode_request(payload, default_codec=PandasCodec) method instead? This method should ensure that each input gets decoded appropriately first.

@pablobgar (Contributor, Author)

Hi @adriangonz

I get the same result with self.decode_request.

It seems that decoding is never performed when the datatype is BYTES and PandasCodec is used:
https://github.com/SeldonIO/MLServer/blob/1.1.0/mlserver/codecs/pandas.py#L17-L19

@adriangonz (Contributor)

The decoding of the individual columns should occur within the self.decode_request method in the MLModel base class though... That one calls this decode_inference_request helper, which makes a first pass through the entries of the inputs array to see if there is any decoding needed at the input level (e.g. like decoding bytes into strings):

for request_input in inference_request.inputs:
    decode_request_input(request_input, metadata_inputs)

So it's strange that it still doesn't work for you...

Just to be on the safe side, I've added a small test case in the PR linked below, which should handle the same use case. Do you see any differences from your custom code?

https://github.com/SeldonIO/MLServer/pull/752/files#diff-3b8a58a4d021803b3171b886bb9162fd659e671131f3f61036f9210cb5d0bc7cR186-R204

@pablobgar (Contributor, Author)

Hi @adriangonz

Now it is working correctly if I add Parameters(content_type=StringCodec.ContentType). Maybe I did something wrong when I tested it a few days ago, or I didn't test the combination of adding this and using the self.decode_request method, sorry.

This is my code:

data = pd.DataFrame(
    data={
        "col1": ["test_string", "test_string"],
        "col2": ["test_string", "test_string"],
    }
)

i_request = PandasCodec.encode_request(data)
i_request.parameters = types.Parameters(content_type="pd")

for input in i_request.inputs:
    input.parameters = types.Parameters(content_type=StringCodec.ContentType)

Where I still see a problem, though, is in PandasCodec.encode_request(). Shouldn't it add this parameter automatically for columns that contain strings?
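For illustration, here is a rough sketch of what such auto-tagging could look like (the `encode_request_inputs` helper is hypothetical, plain dicts stand in for mlserver.types.RequestInput, and always using FP64 for non-string columns is a simplification):

```python
# Hypothetical sketch of an encode step that tags string columns
# automatically. Not the actual PandasCodec implementation.
import pandas as pd

def encode_request_inputs(df: pd.DataFrame) -> list:
    inputs = []
    for col in df.columns:
        values = df[col].tolist()
        is_str = all(isinstance(v, str) for v in values)
        inputs.append({
            "name": col,
            "shape": [len(values)],
            # Simplification: anything non-string is encoded as FP64 here
            "datatype": "BYTES" if is_str else "FP64",
            # Tag string columns so the server knows to decode them back:
            "parameters": {"content_type": "str"} if is_str else None,
            "data": [v.encode("utf-8") for v in values] if is_str else values,
        })
    return inputs

inputs = encode_request_inputs(
    pd.DataFrame({"col1": ["test_string", "test_string"], "col2": [1.5, 2.5]})
)
```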

@adriangonz (Contributor)

Hey @pablobgar ,

Thanks for the update!

On the last point, I totally agree. And TBH, the PandasCodec should also be able to decode strings. To put it another way: since it's the only codec that knows when a series should be encoded as a string, it should also be able to decode a bytes array back into a string series.

We'll keep this issue open to tackle this last point.
