Device Aware Serialization/Deserialization #311
narendasan
started this conversation in
RFCs
Replies: 3 comments 1 reply
-
@andi4191 What if we looked at the current device information to select the device? Say a module is on device 0: then we target device 0 unless specified otherwise in the compile spec, since we have the device info from the provided module. This would make the device field optional and give us a strong default that makes sense in terms of PyTorch and how it works.
-
Any progress on this functionality? Does it work?
-
Goal
We want TRTorch programs to be portable across machines that may not have the exact same GPU configuration but are still capable of running the embedded engine. Therefore we need to be able to select the correct device to deserialize the engine on. The proposed solution is to include, in the serialized program, information about the engine's compatibility and preferences on which devices to select.
Device Information and Hierarchy
Relevant information:
- Compute Capability
- GPU ID
- Device Name
- GPU or DLA
Device selection hierarchy: (1 is broadest allowable, N is most preferable)
There are two separate cases to consider: when the engine is created for a GPU and when it is created for DLA. For GPU, if there is no device with the compute capability used to build the engine, then the runtime should error out with an error like:
Unable to deserialize trtorch program because engines were built targeting compute capability X, however no device with compute capability X available
If there is more than one device (see the sketch after this list):
IF GPU:
IF DLA:
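As a rough sketch of the GPU branch: the compute-capability filter, the error, and the lowest-ID tie-break come from this RFC; preferring an exact device-name match among candidates is an assumption, as is the helper name select_gpu.

```cpp
#include <cuda_runtime.h>

#include <stdexcept>
#include <string>
#include <vector>

// Sketch: choose the GPU to deserialize an engine on. Only devices whose
// compute capability matches the one the engine was built for are allowable;
// an exact device-name match is treated as most preferable (assumption),
// and the lowest device ID breaks remaining ties.
int select_gpu(int target_major, int target_minor, const std::string& target_name) {
  int count = 0;
  cudaGetDeviceCount(&count);

  std::vector<int> candidates;
  for (int id = 0; id < count; id++) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, id);
    if (prop.major == target_major && prop.minor == target_minor) {
      if (target_name == prop.name) {
        return id;  // first (lowest-ID) exact name match
      }
      candidates.push_back(id);
    }
  }

  if (candidates.empty()) {
    throw std::runtime_error(
        "Unable to deserialize trtorch program because engines were built "
        "targeting compute capability " + std::to_string(target_major) + "." +
        std::to_string(target_minor) +
        ", however no device with that compute capability is available");
  }
  return candidates.front();  // lowest device ID wins ties
}
```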
Compile Time:
At compile time the user is able to select the target device explicitly through existing TRTorch APIs.
User level API:
trtorch::CompileSpec::Device
https://github.com/NVIDIA/TRTorch/blob/08b2455c3ad540017c8929bbcab4669db7bdee7d/cpp/api/include/trtorch/trtorch.h#L261
Example Usecases
{device_type = DeviceType::kGPU, gpu_id = 0}
{device_type = DeviceType::kDLA, dla_core = 1}
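Putting the compile-time API together, a minimal sketch (the CompileSpec fields follow the header linked above; the model path and input shape are placeholders, and exact signatures may differ by version):

```cpp
#include <torch/script.h>
#include "trtorch/trtorch.h"

int main() {
  // Load a TorchScript module; placeholder path.
  auto mod = torch::jit::load("model.ts");

  // Input shape is a placeholder; device settings mirror the use cases above.
  auto spec = trtorch::CompileSpec({{1, 3, 224, 224}});
  spec.device.device_type = trtorch::CompileSpec::Device::DeviceType::kGPU;
  spec.device.gpu_id = 0;  // explicit target; otherwise a default would apply

  // Compile; the resulting module embeds the engine plus its device info.
  auto trt_mod = trtorch::CompileGraph(mod, spec);
  trt_mod.save("trt_model.ts");
  return 0;
}
```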
@andi4191: For DLA, do we correct the GPU ID if it's the wrong device?
Source of Truth:
Split: ConversionCtx manages the active device during conversion and building. Post conversion and build, CompileGraph will use the data in CompileSpec.device to create a runtime::DeviceInfo struct. This will then get passed to AddEngineToGraph, which passes it to the TRTEngine constructor.
Run Time:
At runtime the device information is mostly opaque to the user. It would be nice to have some API through PyTorch to move engines around, but for the time being, the device the engine is deserialized on is where it will run.
DeviceInfo Struct (a field in TRTEngine):
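A plausible sketch of this struct, derived from the "Relevant information" list above; the field names and types here are assumptions, not the actual definition:

```cpp
#include <cstdint>
#include <string>

#include "NvInfer.h"

namespace trtorch {
namespace core {
namespace runtime {

// Hypothetical layout: one field per item under "Relevant information".
struct DeviceInfo {
  int64_t id;                        // GPU ID (the device to deserialize on)
  int64_t major;                     // compute capability, major version
  int64_t minor;                     // compute capability, minor version
  nvinfer1::DeviceType device_type;  // GPU or DLA
  std::string device_name;           // e.g. "Tesla V100-SXM2-16GB"
};

} // namespace runtime
} // namespace core
} // namespace trtorch
```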
User level API:
Ideally this could be done through PyTorch APIs:
Potential WAR: Overwrite load where ...
Source of Truth:
core::runtime::TRTEngine (device_info)
Execution: (execute_engine)
If the current device != the device ID in device_info, set the correct device. If the input data is on the wrong device, we will throw a warning about having to move data and move the tensors to the correct device via aten::Tensor::to. Output tensors should be created on the target device.
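A minimal sketch of that guard-and-move step (the device_info name follows this RFC; the helper name prepare_inputs and the exact guard calls are assumptions):

```cpp
#include <c10/cuda/CUDAFunctions.h>
#include <torch/torch.h>

#include <vector>

// Sketch: before enqueueing the engine, activate its target device and move
// any inputs that live elsewhere, warning as described above.
std::vector<at::Tensor> prepare_inputs(std::vector<at::Tensor> inputs,
                                       int64_t target_gpu_id) {
  if (c10::cuda::current_device() != target_gpu_id) {
    c10::cuda::set_device(static_cast<c10::DeviceIndex>(target_gpu_id));
  }

  std::vector<at::Tensor> moved;
  moved.reserve(inputs.size());
  for (auto& in : inputs) {
    if (in.is_cuda() && in.device().index() == target_gpu_id) {
      moved.push_back(in);
    } else {
      TORCH_WARN("Input tensor is not on the engine's device, moving to GPU ",
                 target_gpu_id);
      // aten::Tensor::to, as described above
      moved.push_back(
          in.to(at::Device(at::kCUDA,
                           static_cast<c10::DeviceIndex>(target_gpu_id))));
    }
  }
  return moved;  // outputs would likewise be allocated on the target device
}
```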
Serialization/Deserialization (torch.jit.save / torch.jit.load)
Serialization Format:
std::vector<std::string> of length 2 (this should be asserted in the deserializer):
- 0: serialized engine
- 1: serialized device struct
Serialization:
Takes a TRTEngine object, serializes the CUDA engine via the TRT API and device_info via serialize_device_info.
Deserialization:
Takes a std::vector<std::string> of length 2 and passes the values to the TRTEngine constructor. The TRTEngine deserialization constructor will set the correct device before deserializing the CUDA engine, checking whether there is an available device that matches the device specifications. We use the rules defined above to select the correct device to deserialize on; to break ties, right now we use the lowest device ID. The end result: a TRTEngine is constructed, with its execution context targeting the GPU specified in device_info.
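A sketch of this round trip under the two-string format; serialize_device_info appears in this RFC, while deserialize_device_info, set_cuda_device, and the TRTEngine members shown are stand-ins declared here only for illustration:

```cpp
#include <cassert>
#include <string>
#include <vector>

#include "NvInfer.h"

// Minimal stand-ins for the sketch; see the DeviceInfo sketch above.
struct DeviceInfo;
struct TRTEngine {
  nvinfer1::ICudaEngine* cuda_engine;
  DeviceInfo* device_info;
  TRTEngine(const std::string& serialized_engine, DeviceInfo* info);
};
std::string serialize_device_info(const DeviceInfo* info);
DeviceInfo* deserialize_device_info(const std::string& data);
void set_cuda_device(const DeviceInfo* info);

// Sketch: pack the engine into the length-2 vector described above.
std::vector<std::string> serialize(TRTEngine& engine) {
  // 0: serialized engine, via the TensorRT API
  nvinfer1::IHostMemory* blob = engine.cuda_engine->serialize();
  std::string serialized_engine(static_cast<const char*>(blob->data()),
                                blob->size());
  blob->destroy();

  // 1: serialized device struct
  std::string serialized_device = serialize_device_info(engine.device_info);

  return {serialized_engine, serialized_device};
}

// Sketch: unpack, select the device first, then rebuild the engine on it.
TRTEngine deserialize(const std::vector<std::string>& data) {
  assert(data.size() == 2);  // [serialized engine, serialized device struct]
  DeviceInfo* device_info = deserialize_device_info(data[1]);
  set_cuda_device(device_info);  // apply the selection rules defined above
  return TRTEngine(data[0], device_info);
}
```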
Next Steps
We need to think about how to give users control of where engines are located at runtime.