v2.0.0-rc.3
## Training

`ort` now supports a (currently limited) subset of ONNX Runtime's Training API. You can use the on-device Training API for fine-tuning, online learning, or even full pretraining, on any CPU or GPU.

The `train-clm` example pretrains a language model from scratch. There's also a 'simple' API and a related example, which offers an essentially one-line training solution akin to 🤗 Transformers' `Trainer` API:
```rust
trainer.train(
    TrainingArguments::new(dataloader)
        .with_lr(7e-5)
        .with_max_steps(5000)
        .with_ckpt_strategy(CheckpointStrategy::Steps(500))
)?;
```
You can learn more about training with ONNX Runtime here. Please try it out and let us know how we can improve the training experience!
## ONNX Runtime v1.18

`ort` now ships with ONNX Runtime v1.18.
The CUDA 12 build requires cuDNN 9.x, so if you're using CUDA 12, you need to update cuDNN. The CUDA 11 build still requires cuDNN 8.x.
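If you're upgrading a CUDA setup, a quick way to check that the new binaries and your cuDNN install agree is to register the CUDA execution provider explicitly when building a session. This is only a hedged sketch; the builder method names (`commit_from_file` in particular) follow the current 2.0 API and aren't part of this release note.

```rust
use ort::{CUDAExecutionProvider, Session};

// Sketch: create a session on the CUDA EP. With ONNX Runtime v1.18, the CUDA 12
// build expects cuDNN 9.x (the CUDA 11 build still expects cuDNN 8.x).
let session = Session::builder()?
    .with_execution_providers([CUDAExecutionProvider::default().build()])?
    .commit_from_file("model.onnx")?;
```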
## IoBinding

`IoBinding`'s previously rather unsound API has been reworked and actually documented.
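To give a rough feel for the shape of the reworked API, here is a hedged sketch; the method names used below (`create_binding`, `bind_input`, `bind_output`, `run`) are assumptions, so check the `IoBinding` docs for the real signatures.

```rust
// Hedged sketch, not verbatim API: bind inputs and outputs once, then run the
// session repeatedly without re-specifying them on every call.
let mut binding = session.create_binding()?;
binding.bind_input("input", &input_tensor)?;
binding.bind_output("output", output_tensor)?;
let outputs = binding.run()?;
```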
## Output selection & pre-allocation

Sometimes, you don't need to calculate all of the outputs of a session. Other times, you need to pre-allocate a session's outputs to save on slow device copies or expensive re-allocations. Now, you can do both of these things without `IoBinding`, through a new API: `OutputSelector`.
```rust
let options = RunOptions::new()?.with_outputs(
    OutputSelector::no_default()
        .with("output")
        .preallocate("output", Tensor::<f32>::new(&Allocator::default(), [1, 3, 224, 224])?)
);

let outputs = model.run_with_options(inputs!["input" => input.view()]?, &options)?;
```
In this example, each call to `run_with_options` that uses the same `options` struct will reuse the same output allocation, saving the cost of re-allocating it, and any outputs other than `output` aren't even calculated.
## Value ergonomics

String tensors are now `Tensor<String>` instead of `DynTensor`, and they no longer require an allocator to create or extract. Additionally, `Map`s can now have string keys, and they no longer require allocators either.
Since value specialization, `IntoTensorElementType` had been used to describe only primitive (i.e. `f32`, `i64`) element types. It has now been changed to `PrimitiveTensorElementType`, which is a subtrait of `IntoTensorElementType`. If you have type bounds that depended on `IntoTensorElementType`, you probably want to update them to use `PrimitiveTensorElementType` instead.
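As a sketch of that migration (assuming `PrimitiveTensorElementType` is re-exported at the crate root the same way `IntoTensorElementType` was):

```rust
use ort::{PrimitiveTensorElementType, Tensor};

// Generic over primitive element types only (f32, i64, ...). Before this
// release, the bound here would have been written as `IntoTensorElementType`.
struct InputBatch<T: PrimitiveTensorElementType> {
    tensors: Vec<Tensor<T>>,
}
```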
## Custom operators

Operator kernels now support `i64`, string, `Vec<f32>`, `Vec<i64>`, and `TensorRef` attributes, along with most other previously missing C API features.

Additionally, the API for adding an operator to a domain has changed slightly: it is now `.add::<Operator>()` instead of `.add(Operator)`.
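The migration looks roughly like this; the domain setup and the `MyOperator` type are assumed here, and only the `.add` signature change comes from this release.

```rust
// Assuming `domain` is an operator domain under construction and `MyOperator`
// implements ort's `Operator` trait (setup elided).
// Before: let domain = domain.add(MyOperator)?;
let domain = domain.add::<MyOperator>()?;
```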
## Other changes

- 80be206 & 8ae23f2 Miscellaneous WASM build fixes.
- 1c0a5e4 Allow downcasting `ValueRef` & `ValueRefMut`.
- ce5aaba Add `EnvironmentBuilder::with_telemetry`.
  - pyke binaries were never compiled with telemetry support; only Microsoft-provided Windows builds of ONNX Runtime had telemetry enabled by default. If you are using Microsoft binaries, this will now allow you to disable telemetry.
- 23fce78 `ExecutionProviderDispatch::error_on_failure` will immediately error out session creation if the registration of an EP fails (see the sketch after this list).
- d59ac43 `RunOptions` is now taken by reference instead of via an `Arc`.
- d59ac43 Add `Session::run_async_with_options`.
- a92dd30 Enable support for SOCKS proxies when downloading binaries.
- 19d66de Add AMD MIGraphX execution provider.
- 882f657 Bundle `libonnxruntime` in library builds where `crate-type=rlib`/`staticlib`.
- 860e449 Fix build for `i686-pc-windows-msvc`.
- 1d89f82 Support `pkg-config`.
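A hedged sketch of the `error_on_failure` change mentioned above: with it, a failed EP registration aborts session creation instead of the session quietly continuing without that provider.

```rust
use ort::CUDAExecutionProvider;

// Build an EP dispatch that makes session creation fail outright if CUDA
// registration fails (e.g. missing CUDA/cuDNN), rather than continuing without it.
let cuda = CUDAExecutionProvider::default()
    .build()
    .error_on_failure();
// Pass `cuda` to `SessionBuilder::with_execution_providers([...])` as usual.
```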
If you have any questions about this release, we're here to help!
Thank you to Florian Kasischke, cagnolone, Ryo Yamashita, and Julien Cretin for contributing to this release!
Thank you to Johannes Laier, Noah, Yunho Cho, Okabintaro, and Matouš Kučera, whose support made this release possible. If you'd like to support `ort` as well, consider supporting us on Open Collective 💖
❤️💚💙💛