
Fix failure due to invalid elementCount when unpacking tensors in DmlExecutionProvider #19221

Closed

Conversation

mtavenrath
Contributor

Description

I've replaced the *_data_size() function calls used to determine the element count with a computation of the element count from the tensor dimensions.
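For illustration, a minimal sketch of the dimension-based computation (names are illustrative, not the exact patch):

// Derive the element count from the tensor's declared shape instead of
// trusting *_data_size(), which can be 0 when the data is external.
int64_t element_count = 1;
for (int i = 0; i < tensor_proto.dims_size(); ++i) {
    element_count *= tensor_proto.dims(i);
}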

Motivation and Context

To benchmark the performance difference between external tensors and protobuf tensors, I exported a medium-sized model (1.5 GB) with the script below, converting a protobuf-only ONNX file to an ONNX file with external tensors. Afterwards I could no longer load the model into the DML EP because *_data_size returned 0.

import sys

import onnx

# onnx_model is an in-memory ModelProto
onnx_model = onnx.load(sys.argv[1])

# Save the ONNX model
onnx.save(onnx_model, sys.argv[2], save_as_external_data=True, all_tensors_to_one_file=False, size_threshold=0, convert_attribute=False)

@mtavenrath
Contributor Author

@jeffbloo @PatriceVignola Hi, I am not sure if this is a bug in the input network after the conversion or in the code uploading the tensor.

@PatriceVignola
Contributor

@jeffbloo @PatriceVignola Hi, I am not sure if this is a bug in the input network after the conversion or in the code uploading the tensor.

Would you be able to use this existing helper internally?

Status UnpackTensor(const ONNX_NAMESPACE::TensorProto& tensor, const void* raw_data, size_t raw_data_len, \

If the function from tensorprotoutils works, then we should just start using it to avoid code duplication and to make sure we benefit from future improvements to this logic. If the function from tensorprotoutils fails, then we should probably fail here as well, since that most likely indicates a model or converter issue.

Either way, we should probably use those functions.
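For illustration, a hedged sketch of what routing through the shared helper might look like (element_count and the float element type are assumptions for the example, not the actual DML EP call site):

// Sketch only: unpack a float tensor via the shared tensorprotoutils helper.
// element_count is assumed to be derived from the tensor's dims, as above.
std::vector<float> unpacked(element_count);
const bool has_raw = tensor_proto.has_raw_data();
ORT_RETURN_IF_ERROR(onnxruntime::utils::UnpackTensor<float>(
    tensor_proto,
    has_raw ? tensor_proto.raw_data().data() : nullptr,
    has_raw ? tensor_proto.raw_data().size() : 0,
    unpacked.data(),
    element_count));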

@mtavenrath
Contributor Author

This function uses the same attributes as the code I had trouble with. I'll give it a try anyway, and if it fails as well I'll try to find a sufficiently large public model that reproduces the problem and file an issue.

@PatriceVignola
Contributor

It's also failing on some Stable Diffusion models converted through Olive, so this is a pretty high-priority issue. It affects the 1.17 release, so we'll probably need to release a patch.

I'm fine with merging this change in (I confirmed that it fixes the SD crash locally), but it's pretty weird that those functions started returning 0 all of a sudden. I'm going to try to find the root cause in the meantime.

@mtavenrath
Contributor Author

I guess this means

Status UnpackTensor(const ONNX_NAMESPACE::TensorProto& tensor, const void* raw_data, size_t raw_data_len, \

is broken as well, since it uses the same fields?

@PatriceVignola
Contributor

It's not. The way we unpack the tensor in the DML EP has regressed: we are trying to unpack external data as if it were internal data. I'll have a fix soon.
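To illustrate the distinction, a hedged sketch of the check (field names follow the ONNX TensorProto schema; the actual fix may differ):

// External tensors store their bytes in a separate file; reading them
// through raw_data() as if they were inline yields empty/zero-size data.
const bool is_external =
    tensor_proto.has_data_location() &&
    tensor_proto.data_location() == ONNX_NAMESPACE::TensorProto_DataLocation_EXTERNAL;
if (is_external) {
  // Resolve the external_data file path, offset, and length instead.
}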

@PatriceVignola
Contributor

@mtavenrath #19415 should address the issue. It works for the SDXL crash, but let me know if it also works for your internal model.

@mtavenrath
Contributor Author

@PatriceVignola #19415 fixes the issue for my internal model as well. Closing this PR.

mtavenrath closed this Feb 6, 2024