chore: doc update #2967

Merged 1 commit on Jul 3, 2024
30 changes: 16 additions & 14 deletions docsrc/ts/getting_started_with_cpp_api.rst
@@ -100,7 +100,7 @@ As you can see it is pretty similar to the Python API. When you call the ``forward

Compiling with Torch-TensorRT in C++
-------------------------------------
- We are also at the point were we can compile and optimize our module with Torch-TensorRT, but instead of in a JIT fashion we must do it ahead-of-time (AOT) i.e. before we start doing actual inference work
+ We are also at the point where we can compile and optimize our module with Torch-TensorRT, but instead of in a JIT fashion we must do it ahead-of-time (AOT), i.e. before we start doing actual inference work.
Since it takes a bit of time to optimize the module, it would not make sense to do this every time you run the module, or even the first time you run it.

With our module loaded, we can feed it into the Torch-TensorRT compiler. When we do so we must provide some information on the expected input size and also configure any additional settings.
@@ -113,9 +113,10 @@ With our module loaded, we can feed it into the Torch-TensorRT compiler. When we

mod.to(at::kCUDA);
mod.eval();

- auto in = torch::randn({1, 1, 32, 32}, {torch::kCUDA});
- auto trt_mod = torch_tensorrt::CompileGraph(mod, std::vector<torch_tensorrt::CompileSpec::InputRange>{{in.sizes()}});
+ std::vector<torch_tensorrt::core::ir::Input> inputs{torch_tensorrt::core::ir::Input({1, 3, 224, 224})};
+ torch_tensorrt::ts::CompileSpec cfg(inputs);
+ auto trt_mod = torch_tensorrt::ts::compile(mod, cfg);
+ auto in = torch::randn({1, 3, 224, 224}, {torch::kCUDA});
auto out = trt_mod.forward({in});

That's it! Now the graph runs primarily using TensorRT rather than the JIT compiler (though we still execute the graph using the JIT runtime).
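
Putting the updated snippet together, a minimal end-to-end sketch might look like the following. The include paths, the ``main`` scaffolding, and the module path ``trt_ts_module.ts`` are illustrative assumptions; the compile calls simply mirror the updated lines above.

.. code-block:: c++

    #include <torch/script.h>
    #include "torch_tensorrt/torch_tensorrt.h" // assumed public header

    int main() {
        // Load a TorchScript module exported from Python (example path).
        torch::jit::Module mod = torch::jit::load("trt_ts_module.ts");
        mod.to(at::kCUDA);
        mod.eval();

        // Describe the expected input shape and build the compile settings.
        std::vector<torch_tensorrt::core::ir::Input> inputs{torch_tensorrt::core::ir::Input({1, 3, 224, 224})};
        torch_tensorrt::ts::CompileSpec cfg(inputs);

        // Compile ahead of time, then run inference with the optimized module.
        auto trt_mod = torch_tensorrt::ts::compile(mod, cfg);
        auto in = torch::randn({1, 3, 224, 224}, {torch::kCUDA});
        auto out = trt_mod.forward({in});
        return 0;
    }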
@@ -131,11 +132,11 @@ We can also set settings like operating precision to run in FP16.
mod.to(at::kCUDA);
mod.eval();

- auto in = torch::randn({1, 1, 32, 32}, {torch::kCUDA}).to(torch::kHALF);
- auto input_sizes = std::vector<torch_tensorrt::CompileSpec::InputRange>({in.sizes()});
- torch_tensorrt::CompileSpec info(input_sizes);
- info.enable_precisions.insert(torch::kHALF);
- auto trt_mod = torch_tensorrt::CompileGraph(mod, info);
+ auto in = torch::randn({1, 3, 224, 224}, {torch::kCUDA}).to(torch::kHALF);
+ std::vector<torch_tensorrt::core::ir::Input> inputs{torch_tensorrt::core::ir::Input({1, 3, 224, 224})};
+ torch_tensorrt::ts::CompileSpec cfg(inputs);
+ cfg.enabled_precisions.insert(torch::kHALF);
+ auto trt_mod = torch_tensorrt::ts::compile(mod, cfg);
auto out = trt_mod.forward({in});

And now we are running the module in FP16 precision. You can then save the module to load later.
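
As a rough sketch of that save/reload step (the file name is an example, and loading the module back assumes the Torch-TensorRT runtime library is linked so the embedded TensorRT engine ops can be resolved):

.. code-block:: c++

    #include <torch/script.h>

    void save_and_reload(torch::jit::Module& trt_mod) {
        // Serialize the compiled module like any other TorchScript module.
        trt_mod.save("trt_torchscript_module.ts");

        // Later, load it back with the standard TorchScript loader and run it.
        auto reloaded = torch::jit::load("trt_torchscript_module.ts");
        reloaded.to(at::kCUDA);
        auto out = reloaded.forward({torch::randn({1, 3, 224, 224}, {torch::kCUDA}).to(torch::kHALF)});
    }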
@@ -179,11 +180,12 @@ If you want to save the engine produced by Torch-TensorRT to use in a TensorRT a
mod.to(at::kCUDA);
mod.eval();

- auto in = torch::randn({1, 1, 32, 32}, {torch::kCUDA}).to(torch::kHALF);
- auto input_sizes = std::vector<torch_tensorrt::CompileSpec::InputRange>({in.sizes()});
- torch_tensorrt::CompileSpec info(input_sizes);
- info.enabled_precisions.insert(torch::kHALF);
- auto trt_mod = torch_tensorrt::ConvertGraphToTRTEngine(mod, "forward", info);
+ auto in = torch::randn({1, 3, 224, 224}, {torch::kCUDA}).to(torch::kHALF);

+ std::vector<torch_tensorrt::core::ir::Input> inputs{torch_tensorrt::core::ir::Input({1, 3, 224, 224})};
+ torch_tensorrt::ts::CompileSpec cfg(inputs);
+ cfg.enabled_precisions.insert(torch::kHALF);
+ auto engine = torch_tensorrt::ts::convert_method_to_trt_engine(mod, "forward", cfg);
std::ofstream out("/tmp/engine_converted_from_jit.trt");
out << engine;
out.close();
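
A standalone TensorRT application could then deserialize the saved plan. The following is a sketch under the assumption of the TensorRT 8+ C++ runtime API (and, depending on the model, the Torch-TensorRT plugin library may also need to be loaded):

.. code-block:: c++

    #include <fstream>
    #include <iostream>
    #include <memory>
    #include <sstream>

    #include "NvInfer.h"

    // Minimal logger required by the TensorRT runtime API.
    class Logger : public nvinfer1::ILogger {
        void log(Severity severity, const char* msg) noexcept override {
            if (severity <= Severity::kWARNING) {
                std::cout << msg << std::endl;
            }
        }
    } gLogger;

    int main() {
        // Read the serialized engine written by the snippet above.
        std::ifstream file("/tmp/engine_converted_from_jit.trt", std::ios::binary);
        std::stringstream buffer;
        buffer << file.rdbuf();
        std::string plan = buffer.str();

        // Deserialize it into an executable TensorRT engine.
        auto runtime = std::unique_ptr<nvinfer1::IRuntime>(nvinfer1::createInferRuntime(gLogger));
        auto engine = std::unique_ptr<nvinfer1::ICudaEngine>(
            runtime->deserializeCudaEngine(plan.data(), plan.size()));
        if (!engine) {
            std::cerr << "Failed to deserialize the engine" << std::endl;
            return 1;
        }

        // From here, create an IExecutionContext and bind input/output buffers as usual.
        return 0;
    }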