Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NDArray with values being lists not supported - RESOLVED: Proto lists were not being deep-copied #600

Closed
axsaucedo opened this issue May 28, 2019 · 4 comments · Fixed by #817
Milestone

Comments

@axsaucedo
Copy link
Contributor

axsaucedo commented May 28, 2019

EDIT with comment below:
Ok, I've been able to narrow it down further. In order to replicate this, the only thing required is to send a request on the form of an array of arrays.

More specifically, it all works well if you send a request with data of the form of [[1,2,3]] or ["hello", "world"] or [["hello", "world"]]`.

However if we send a request with data of the form of [[1,2], [3,4]] or [["hello", "world"], ["hello", "world"]], then the inner arrays don't get converted from proto.

This can be reproduced by creating a python wrapper that just prints the input, and sending it the following request:

curl -X POST -H 'Content-Type: application/json' \
    -d "{'data': {'names': [], 'ndarray': [[1,2], [3,4,5]]}}" \
    http://localhost:31707/seldon/default/text-example/api/v0.1/predictions

This would then print:

[values {
  number_value: 1.0
}
values {
  number_value: 2.0
}
 values {
  number_value: 3.0
}
values {
  number_value: 4.0
}
values {
  number_value: 5.0
}
]

The reason why it prints it as "Values" is because it's a proto array. It should be printing it as a python array (i.e. [[1,2],[3,4,5]])

PREVIOUS TEXT (not fully accurate - see edit above for accurate description)

There is currently an issue that was uncovered when building the Kubeflow Seldon example ()#589, which may require either constraining the implementation or extending the type-support functionality. This is basically in the way that SeldonCore currently handles numpy arrays of tokens (i.e. lists of strings).

Currently the example only sends one sentence as a test which works correctly. However if more than one sentence is sent to the SeldonEngine for processing, it then is unable to process this request.

The reason why this is the case is because the return value is of the format np.array(list(str), list(str)), which translates into np.array(list(str).extend(list(str)).

To be more specific with an example, if we send as example np.array(["example one", "example 2")], the spacytokenizer image would return the value np.array.([list("example", "1"), list("example", "2")]), which SeldonEngine then converts into np.array(["example", "1", "example", "2"]).

It may be possible that using JSON objects may solve this issue (i.e. #595)

@axsaucedo
Copy link
Contributor Author

I can confirm there is some strighly strange/unexpected behaviour: when the output of a container/step is an array of arrays, the next step will receive an array of PROTO lists, as opposed to just an array of python lists. This may not be a problem.

A simple end to end test was carried out with two containers, one that receives strings and splits each into words as [str1, str2] -> [[tokens from str1], [tokens from str2]). What happens is basically that the next step receives an array [ LIST_PROTO_1, LIST_PROTO_2, ...].

The example can be found here: https://github.com/axsaucedo/test_string

When running with the following command:

curl -X POST -H 'Content-Type: application/json' \
    -d "{'data': {'names': [], 'ndarray': ['hello world', 'this is another']}}" \
    http://localhost:31707/seldon/default/text-example/api/v0.1/predictions

If we look at the logs of the container step-two, we can see that the input it receives is of the following format:

[values {
  string_value: "hello"
}
values {
  string_value: "world"
}, 
 values {
  string_value: "this"
}
values {
  string_value: "is"
}
values {
  string_value: "another"
}
]

This could cause issues if the expected input is a list, as opposed to a proto list.

@axsaucedo
Copy link
Contributor Author

Ok, I've been able to narrow it down further. In order to replicate this, the only thing required is to send a request on the form of an array of arrays.

More specifically, it all works well if you send a request with data of the form of [[1,2,3]] or ["hello", "world"] or [["hello", "world"]]`.

However if we send a request with data of the form of [[1,2], [3,4]] or [["hello", "world"], ["hello", "world"]], then the inner arrays don't get converted from proto.

This can be reproduced by creating a python wrapper that just prints the input, and sending it the following request:

curl -X POST -H 'Content-Type: application/json' \
    -d "{'data': {'names': [], 'ndarray': [[1,2], [3,4,5]]}}" \
    http://localhost:31707/seldon/default/text-example/api/v0.1/predictions

This would then print:

[values {
  number_value: 1.0
}
values {
  number_value: 2.0
}
 values {
  number_value: 3.0
}
values {
  number_value: 4.0
}
values {
  number_value: 5.0
}
]

The reason why it prints it as "Values" is because it's a proto array. It should be printing it as a python array (i.e. [[1,2],[3,4,5]])

@axsaucedo
Copy link
Contributor Author

Right, I have finally found the issue. The issue was in the Python Wrapper in this line:
https://github.com/SeldonIO/seldon-core/pull/817/files#diff-c0d1b86b9926e2cb3e9c2be9a387ee37L175

The issue was that when the Proto was converted into a numpy array, the action performed was a shallow conversion, which meant that the array of protos was converted into a numpy array of protos. The fix is to use the MessageToDict utility that the google json library provides, which performs a deep copy, and converts the full array into a python array.

Since the python wrapper v12 this won't be an issue for REST as it doesn't use protos at all, but the two tests that were added would fail for any image pre-v12.

@axsaucedo
Copy link
Contributor Author

This will be fixed once #817 is merged, and should be fixed on all model servers and images with #818

@axsaucedo axsaucedo changed the title NDArray with values being lists of strings not supported NDArray with values being lists not supported - RESOLVED: Proto lists were not being deep-copied Aug 26, 2019
agrski pushed a commit that referenced this issue Dec 2, 2022
…es (#600)

* Respect MLServer content type

* update huggingface example notebook as returned prediction is not base64 encoded anymore with this change
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants