v2.0.0-rc.3
## Training

`ort` now supports a (currently limited) subset of ONNX Runtime's Training API. You can use the on-device Training API for fine-tuning, online learning, or even full pretraining, on any CPU or GPU.

The `train-clm` example pretrains a language model from scratch. There's also a 'simple' API and a related example, which offers an essentially one-line training solution akin to 🤗 Transformers' `Trainer` API:
```rust
trainer.train(
    TrainingArguments::new(dataloader)
        .with_lr(7e-5)
        .with_max_steps(5000)
        .with_ckpt_strategy(CheckpointStrategy::Steps(500))
)?;
```
You can learn more about training with ONNX Runtime here. Please try it out and let us know how we can improve the training experience!
## ONNX Runtime v1.18

`ort` now ships with ONNX Runtime v1.18.
The CUDA 12 build requires cuDNN 9.x, so if you're using CUDA 12, you need to update cuDNN. The CUDA 11 build still requires cuDNN 8.x.
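If you're upgrading a CUDA setup, a quick way to check that the new binaries and your cuDNN install agree is to register the CUDA execution provider explicitly when building a session. This is only a hedged sketch; the builder method names (`commit_from_file` in particular) follow the current 2.0 API and aren't part of this release note.

```rust
use ort::{CUDAExecutionProvider, Session};

// Sketch: create a session on the CUDA EP. With ONNX Runtime v1.18, the CUDA 12
// build expects cuDNN 9.x (the CUDA 11 build still expects cuDNN 8.x).
let session = Session::builder()?
    .with_execution_providers([CUDAExecutionProvider::default().build()])?
    .commit_from_file("model.onnx")?;
```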
## IoBinding

`IoBinding`'s previously rather unsound API has been reworked and actually documented.
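To give a rough feel for the shape of the reworked API, here is a hedged sketch; the method names used below (`create_binding`, `bind_input`, `bind_output`, `run`) are assumptions, so check the `IoBinding` docs for the real signatures.

```rust
// Hedged sketch, not verbatim API: bind inputs and outputs once, then run the
// session repeatedly without re-specifying them on every call.
let mut binding = session.create_binding()?;
binding.bind_input("input", &input_tensor)?;
binding.bind_output("output", output_tensor)?;
let outputs = binding.run()?;
```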
## Output selection & pre-allocation

Sometimes, you don't need to calculate all of the outputs of a session. Other times, you need to pre-allocate a session's outputs to save on slow device copies or expensive re-allocations. Now, you can do both of these things without `IoBinding`, through a new API: `OutputSelector`.
```rust
let options = RunOptions::new()?.with_outputs(
    OutputSelector::no_default()
        .with("output")
        .preallocate("output", Tensor::<f32>::new(&Allocator::default(), [1, 3, 224, 224])?)
);

let outputs = model.run_with_options(inputs!["input" => input.view()]?, &options)?;
```
In this example, each call to `run_with_options` that uses the same `options` struct will reuse the same output allocation, saving the cost of re-allocating it, and any outputs other than `output` aren't even calculated.
## Value ergonomics

String tensors are now `Tensor<String>` instead of `DynTensor`, and they no longer require an allocator to create or extract. Additionally, `Map`s can now have string keys, and they no longer require allocators either.
Since value specialization, `IntoTensorElementType` had been used to describe only primitive (i.e. `f32`, `i64`) element types. It has now been changed to `PrimitiveTensorElementType`, which is a subtrait of `IntoTensorElementType`. If you have type bounds that depended on `IntoTensorElementType`, you probably want to update them to use `PrimitiveTensorElementType` instead.
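As a sketch of that migration (assuming `PrimitiveTensorElementType` is re-exported at the crate root the same way `IntoTensorElementType` was):

```rust
use ort::{PrimitiveTensorElementType, Tensor};

// Generic over primitive element types only (f32, i64, ...). Before this
// release, the bound here would have been written as `IntoTensorElementType`.
struct InputBatch<T: PrimitiveTensorElementType> {
    tensors: Vec<Tensor<T>>,
}
```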
## Custom operators

Operator kernels now support `i64`, string, `Vec<f32>`, `Vec<i64>`, and `TensorRef` attributes, along with most other previously missing C API features.

Additionally, the API for adding an operator to a domain has changed slightly: it is now `.add::<Operator>()` instead of `.add(Operator)`.
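The migration looks roughly like this; the domain setup and the `MyOperator` type are assumed here, and only the `.add` signature change comes from this release.

```rust
// Assuming `domain` is an operator domain under construction and `MyOperator`
// implements ort's `Operator` trait (setup elided).
// Before: let domain = domain.add(MyOperator)?;
let domain = domain.add::<MyOperator>()?;
```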
## Other changes

- 80be206 & 8ae23f2 Miscellaneous WASM build fixes.
- 1c0a5e4 Allow downcasting `ValueRef` & `ValueRefMut`.
- ce5aaba Add `EnvironmentBuilder::with_telemetry`.
  - pyke binaries were never compiled with telemetry support; only Microsoft-provided Windows builds of ONNX Runtime had telemetry enabled by default. If you are using Microsoft binaries, this will now allow you to disable telemetry.
- 23fce78 `ExecutionProviderDispatch::error_on_failure` will immediately error out session creation if the registration of an EP fails (see the sketch after this list).
- d59ac43 `RunOptions` is now taken by reference instead of via an `Arc`.
- d59ac43 Add `Session::run_async_with_options`.
- a92dd30 Enable support for SOCKS proxies when downloading binaries.
- 19d66de Add AMD MIGraphX execution provider.
- 882f657 Bundle `libonnxruntime` in library builds where `crate-type=rlib`/`staticlib`.
- 860e449 Fix build for `i686-pc-windows-msvc`.
- 1d89f82 Support `pkg-config`.
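A hedged sketch of the `error_on_failure` change mentioned above: with it, a failed EP registration aborts session creation instead of the session quietly continuing without that provider.

```rust
use ort::CUDAExecutionProvider;

// Build an EP dispatch that makes session creation fail outright if CUDA
// registration fails (e.g. missing CUDA/cuDNN), rather than continuing without it.
let cuda = CUDAExecutionProvider::default()
    .build()
    .error_on_failure();
// Pass `cuda` to `SessionBuilder::with_execution_providers([...])` as usual.
```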
If you have any questions about this release, we're here to help!
Thank you to Florian Kasischke, cagnolone, Ryo Yamashita, and Julien Cretin for contributing to this release!
Thank you to Johannes Laier, Noah, Yunho Cho, Okabintaro, and Matouš Kučera, whose support made this release possible. If you'd like to support `ort` as well, consider supporting us on Open Collective 💖
❤️💚💙💛