Support >2G model export | torchlib(feat) #1003
Conversation
    onnx_model, check_type=True, strict_mode=False, data_prop=True
)
onnx.checker.check_model(onnx_model, full_check=True)
if not cache_model_to_disk:
In a follow-up PR we will remove the checks altogether; per a discussion with Aaron, we should not check the model here.
    Returns:
        The estimated size of the tensor in bytes.
    """
    return tensor.numel() * tensor.element_size()
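For context, the complete helper appears to be just this one expression wrapped in a docstring; a self-contained sketch with the signature inferred from the diff:

import torch


def _estimate_tensor_size(tensor: torch.Tensor) -> int:
    """Estimate the size of a tensor in bytes.

    Returns:
        The estimated size of the tensor in bytes.
    """
    # Element count times per-element byte width; storage padding is ignored.
    return tensor.numel() * tensor.element_size()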
Thanks GPT
Codecov Report
@@            Coverage Diff             @@
##             main    #1003      +/-   ##
==========================================
- Coverage   77.23%   77.20%   -0.03%
==========================================
  Files         112      112
  Lines       14009    14022      +13
  Branches     1447     1450       +3
==========================================
+ Hits        10820    10826       +6
- Misses       2828     2833       +5
- Partials      361      363       +2
Test Results: 18 files ±0, 18 suites ±0, 1h 7m 49s ⏱️ +2m 49s. For more details on these errors, see this check. Results for commit db58031. ± Comparison against base commit b7d2939. ♻️ This comment has been updated with latest results.
    **export_kwargs
)
onnx_model = onnx.load_from_string(proto)
onnx.load_external_data_for_model(onnx_model, temp_dir)
So the trick is that the 2GB limitation only applies when serializing, not to the in-memory ModelProto.
Makes me think _export_onnx should return a ModelProto instead of the serialized string. But that is not supported by pybind. We can probably create a C++ pybind API that returns the ModelProto and the initializers as separate serialized strings, then deserialize and combine them in Python. Then only the Python API is exposed and used here, which gives us the ModelProto directly.
The benefit is that we skip the size check and avoid writing to disk. What do you think?
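To illustrate the proposal, here is a minimal sketch of the Python-side combining step. The function name combine_model and its two inputs are hypothetical outputs of the suggested pybind API, which would serialize the graph and each initializer separately so no single string hits the 2 GB limit:

from typing import Mapping

import onnx


def combine_model(graph_bytes: bytes, initializer_bytes: Mapping[str, bytes]) -> onnx.ModelProto:
    model = onnx.ModelProto()
    model.ParseFromString(graph_bytes)  # the model without the large initializers
    for name, data in initializer_bytes.items():
        tensor = onnx.TensorProto()
        tensor.ParseFromString(data)
        tensor.name = name
        model.graph.initializer.append(tensor)
    # The combined in-memory ModelProto may exceed 2 GB; only a single
    # SerializeToString() over the whole model would hit the protobuf limit.
    return model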
Do we return a list of initializers?
Taking this further, we don’t even need to pass in initializers to _export_onnx. We can serialize them ourselves outside with onnx. This way we don’t need to change the PyTorch c++ implementation.
Tested with
def _add_initializers(model_proto: onnx.ModelProto, initializers: Mapping[str, torch.Tensor]):
    tensor_protos = []
    for name, tensor in initializers.items():
        print(name, "0")
        tensor_numpy = tensor.detach().numpy()
        print(name)
        tensor_proto = onnx.helper.make_tensor(
            name=name,
            data_type=onnx.helper.np_dtype_to_tensor_dtype(tensor_numpy.dtype),
            dims=tensor_numpy.shape,
            vals=tensor_numpy,
        )
        print(name, "done1")
        tensor_protos.append(tensor_proto)
    model_proto.graph.initializer.extend(tensor_protos)
But onnx.helper.make_tensor is very slow.
I think returning a list of TensorProtos could work. But for now it seems to me that keeping the compatibility may be nice.
*compatibility with torch2.0
Because we will move away from torchscript eventually
Let's try make_tensor(...torch_tensor.data_ptr().to_bytes(torch_tensor.element_size() * torch_tensor.numel(), byteorder=sys.byteorder), raw=True)
and if that doesn't work or is still slow, we go back to the initial solution.
Should be
import ctypes

torch_tensor = torch.tensor([2, 3])
num_bytes = torch_tensor.element_size() * torch_tensor.numel()
raw_data = bytes((ctypes.c_ubyte * num_bytes).from_address(torch_tensor.data_ptr()))
tensor_proto = make_tensor(..., vals=raw_data, raw=True)
I was misled by someone doing data_ptr().to_bytes(), but what that really does is convert the pointer integer itself to bytes... lol
I'm not sure if it is worth it, it looks hacky, but this should resemble what _export_onnx is doing on the C++ side. If that is still slow, then there is nothing more we can do.
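A possibly simpler alternative worth benchmarking against both approaches above: onnx.numpy_helper.from_array copies the array buffer into raw_data directly, avoiding the per-element conversion that makes onnx.helper.make_tensor slow. A minimal sketch (the helper name _tensor_to_tensor_proto is hypothetical):

import onnx
import torch
from onnx import numpy_helper


def _tensor_to_tensor_proto(name: str, tensor: torch.Tensor) -> onnx.TensorProto:
    # from_array serializes the numpy buffer into TensorProto.raw_data in one shot,
    # so there is no per-element Python loop.
    # Note: dtypes without a numpy equivalent (e.g. bfloat16) would need special handling.
    return numpy_helper.from_array(tensor.detach().cpu().numpy(), name=name)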
onnx.checker.check_model(onnx_model, full_check=True)
if not cache_model_to_disk:
    # Only check the model if it is in memory.
    # Otherwise the checker and shape_inference will fail because
For shape inference, can we still load shapes and element types from the model file (not the initializer files) and then run infer_shapes?
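For reference, a minimal sketch of what that could look like with the standard onnx APIs; the model.onnx path is an assumption, and infer_shapes_path operates on the on-disk model so the external initializer files never have to be loaded into memory:

import onnx
from onnx import shape_inference

# Run shape inference directly against the serialized model on disk; the
# external initializer files are not pulled into memory.
shape_inference.infer_shapes_path("model.onnx", "model_inferred.onnx")

# Alternatively, load only the graph structure (shapes and element types live
# in the model file, not in the external data files) and infer in memory.
model = onnx.load("model.onnx", load_external_data=False)
inferred = shape_inference.infer_shapes(model)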
We could, but we also don’t need to because PyTorch supplies all the shape info.
A drawback: due to onnx/onnx#5487, we don't have much inner-node shape info left now that modules are functions.
    _estimate_tensor_size(tensor) for tensor in self.initializers.values()
)

# Treat models > 1GB as large models so that we have ample room
Hmm, maybe increase it to 1.8 GB? I have never seen a model > 100 MB without initializers.
Done
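For illustration only, the size check being discussed presumably looks roughly like the sketch below; the constant and function names are assumptions, since the diff only shows the summation over self.initializers and the comment about the threshold:

from typing import Mapping

import torch

# Protobuf caps a single serialized file at 2 GB, so leave headroom for
# everything that is not an initializer (graph structure, names, doc strings).
_LARGE_MODEL_THRESHOLD = 2**30  # 1 GB; the review above suggests ~1.8 GB instead


def _should_cache_to_disk(initializers: Mapping[str, torch.Tensor], include_initializers: bool) -> bool:
    # Sum the per-tensor estimates (numel * element_size) over all initializers.
    total_size = sum(t.numel() * t.element_size() for t in initializers.values())
    return include_initializers and total_size > _LARGE_MODEL_THRESHOLD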
onnx_model = onnx.load_from_string(proto)
cache_model_to_disk = include_initializers and large_model

if cache_model_to_disk:
Whether or not to store initializers should be controlled by a user flag. Assume I export a 1 GB model on a remote machine and want to visualize it locally. I really don't want to download its initializers over home internet. If this flag can be turned on, I will be able to download just the structure of the model and debug faster.
I think once the user gets the model proto, they can do whatever they want (e.g. remove all the data)? A user has full control when they get the dynamo export output as an object.
Furthermore, include_initializers is already an argument.
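For example, a user who only wants the graph structure for visualization could strip the tensor payloads from the proto before saving. A minimal sketch; the helper name strip_initializer_data is hypothetical, and a model saved this way is for inspection only since it no longer carries valid weights:

import onnx


def strip_initializer_data(model: onnx.ModelProto) -> None:
    """Drop tensor payloads so only the graph structure remains."""
    for initializer in model.graph.initializer:
        # Keep name, dims, and data_type; clear every data-carrying field.
        for field in (
            "raw_data",
            "float_data",
            "int32_data",
            "int64_data",
            "double_data",
            "uint64_data",
            "string_data",
        ):
            initializer.ClearField(field)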
Agree with @wschin's goal and @justinchuby's explanation. A thing to consider for ExportOutput.save or ExportOutputSerializer.
Would be nice to mention the perf impact / comparison too.
Done
Support >2G model export by caching the model to disk when necessary.
Tested locally with test_save_initializer_to_files_for_large_model.
Fixes #493
cc @wschin
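To make the caching idea concrete for readers skimming the diff excerpts above, here is a runnable sketch of the underlying round trip using only public onnx APIs. This is not the PR's exact code path (which goes through the TorchScript exporter), and the function name is hypothetical:

import tempfile

import onnx


def save_and_reload_large_model(onnx_model: onnx.ModelProto) -> onnx.ModelProto:
    with tempfile.TemporaryDirectory() as temp_dir:
        model_path = f"{temp_dir}/model.onnx"
        # Spill initializers to a side file so no single serialized protobuf
        # exceeds the 2 GB limit. Note: this mutates onnx_model in place.
        onnx.save_model(
            onnx_model,
            model_path,
            save_as_external_data=True,
            all_tensors_to_one_file=True,
            location="model.onnx.data",
        )
        # Load the graph structure only, then pull the external tensors back
        # into the in-memory ModelProto, which has no 2 GB limit.
        reloaded = onnx.load(model_path, load_external_data=False)
        onnx.load_external_data_for_model(reloaded, temp_dir)
        return reloaded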