Releases: pykeio/ort
v2.0.0-rc.9
🌴 Undo The Flattening (d4f82fc)
A previous `ort` release 'flattened' all exports, such that everything was exported at the crate root - `ort::{TensorElementType, Session, Value}`. This was done at a time when `ort` didn't export much, but now it exports a lot, so this was leading to some big, ugly `use` blocks.
rc.9 now has most exports behind their respective modules - `Session` is now imported as `ort::session::Session`, `Tensor` as `ort::value::Tensor`, etc. rust-analyzer and some quick searches on docs.rs can help you find the right paths to import.
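For example, an old crate-root import can be updated like so (only the `Session` and `Tensor` paths are given by this release; check docs.rs for the others):

```rust
// Before (rc.8 and earlier): everything lived at the crate root.
// use ort::{Session, Tensor};

// After (rc.9): imports live under their respective modules.
use ort::session::Session;
use ort::value::Tensor;
```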
📦 Tensor `extract` optimization (1dbad54)
Previously, calling any of the `extract_tensor_*` methods would have to call back to ONNX Runtime to determine the value's `ValueType` to ensure it was OK to extract. This involved a lot of FFI calls and a few allocations, which could have a notable performance impact in hot loops.
Since a value's type never changes after it is created, the `ValueType` is now created when the `Value` is constructed (i.e. via `Tensor::from_array` or returned from a session). This makes `extract_tensor_*` a lot cheaper!
Note that this does come with some breaking changes:
- Raw tensor extract methods return `&[i64]` for their dimensions instead of `Vec<i64>`.
- `Value::dtype()` and `Tensor::memory_info()` now return `&ValueType` and `&MemoryInfo` respectively, instead of their non-borrowed counterparts.
- `ValueType::Tensor` now has an extra field for symbolic dimensions, `dimension_symbols`, so you might have to update `match`es on `ValueType`.
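A minimal sketch of the kind of `match` that may need updating - `dimension_symbols` is named by this release, but the other variant fields are not, so `..` is used to stay tolerant of fields not shown here:

```rust
use ort::value::ValueType;

// Sketch only: any fields besides `dimension_symbols` are assumptions.
match value.dtype() {
	ValueType::Tensor { dimension_symbols, .. } => {
		println!("symbolic dimensions: {dimension_symbols:?}");
	}
	other => println!("not a tensor: {other}")
}
```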
🚥 Threading management (87577ef)
2.0.0-rc.9 introduces a new trait: `ThreadManager`. This allows you to define custom thread create & join functions for session & environment thread pools! See the `thread_manager.rs` test for an example of how to create your own `ThreadManager` and apply it to a session, or an environment's `GlobalThreadPoolOptions` (previously `EnvironmentGlobalThreadPoolOptions`).
Additionally, sessions may now opt out of the environment's global thread pool if one is configured.
🧠 Shape inference for custom operators (87577ef)
`ort` now provides `ShapeInferenceContext`, an interface for custom operators to provide a hint to ONNX Runtime about the shape of the operator's output tensors based on its inputs, which may open the doors to memory optimizations.
See the updated `custom_operators.rs` example to see how it works.
📃 Session output refactor (8a16adb)
`SessionOutputs` has been slightly refactored to reduce memory usage and slightly increase performance. Most notably, it no longer derefs to a `&BTreeMap`.
The new `SessionOutputs` interface closely mirrors `BTreeMap`'s API, so most applications require no changes unless you were explicitly dereferencing to a `&BTreeMap`.
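In other words, keyed access keeps working; only explicit derefs to `&BTreeMap` need attention. A small sketch (the output name is hypothetical, and `get` is assumed to mirror `BTreeMap` as described above):

```rust
// Indexing by output name works as before.
let logits = &outputs["logits"];

// BTreeMap-style lookups are mirrored on SessionOutputs (method name assumed per the note above).
if let Some(value) = outputs.get("logits") {
	let view: ndarray::ArrayViewD<f32> = value.try_extract_tensor()?;
	println!("output shape: {:?}", view.shape());
}
```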
🛠️ LoRA Adapters (d877fb3)
ONNX Runtime v1.20.0 introduces a new `Adapter` format for supporting LoRA-like weight adapters, and now `ort` has it too!
An `Adapter` essentially functions as a map of tensors, loaded from disk or memory and copied to a device (typically whichever device the session resides on). When you add an `Adapter` to `RunOptions`, those tensors are automatically added as inputs (except faster, because they don't need to be copied anywhere!)
With some modification to your ONNX graph, you can add LoRA layers using optional inputs which `Adapter` can then override. (Hopefully ONNX Runtime will provide some documentation on how this can be done soon, but until then, it's ready to use in `ort`!)
```rust
let model = Session::builder()?.commit_from_file("tests/data/lora_model.onnx")?;
let lora = Adapter::from_file("tests/data/adapter.orl", None)?;

let mut run_options = RunOptions::new()?;
run_options.add_adapter(&lora)?;

let outputs = model.run_with_options(ort::inputs![Tensor::<f32>::from_array(([4, 4], vec![1.0; 16]))?]?, &run_options)?;
```
🗂️ Prepacked weights (87577ef)
`PrepackedWeights` allows multiple sessions to share the same weights. If you create multiple `Session`s from one model file, they can all share the same memory!
Currently, ONNX Runtime only supports prepacked weights for the CPU execution provider.
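A minimal sketch of the idea - note that `PrepackedWeights::new()` and `SessionBuilder::with_prepacked_weights` are assumed names here, so check the crate docs for the exact API:

```rust
// Assumed API: share one set of prepacked weights between two sessions
// created from the same model file.
let weights = PrepackedWeights::new();

let session_a = Session::builder()?
	.with_prepacked_weights(&weights)?
	.commit_from_file("model.onnx")?;
let session_b = Session::builder()?
	.with_prepacked_weights(&weights)?
	.commit_from_file("model.onnx")?;
```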
‼️ Dynamic dimension overrides (87577ef)
You can now override dynamic dimensions in a graph using `SessionBuilder::with_dimension_override`, allowing ONNX Runtime to do more optimizations.
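For example (the builder method name comes from this release; the dimension name and argument types are assumptions):

```rust
// Pin a symbolic "batch_size" dimension to a fixed value so the graph
// optimizer can treat it as static.
let session = Session::builder()?
	.with_dimension_override("batch_size", 1)?
	.commit_from_file("model.onnx")?;
```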
🪶 Customizable workload type (87577ef)
Not all workloads need full performance all the time! If you're using `ort` to perform background tasks, you can now set a session's workload type to prioritize either efficiency (by lowering scheduling priority or utilizing more efficient CPU cores on some architectures), or performance (the default).
```rust
let session = Session::builder()?.commit_from_file("tests/data/upsample.onnx")?;
session.set_workload_type(WorkloadType::Efficient)?;
```
Other features
- 28e00e3 Update to ONNX Runtime v1.20.0.
- 552727e Expose the `ortsys!` macro.
  - Note that this commit also made `ort::api()` return `&ort_sys::OrtApi` instead of `NonNull<ort_sys::OrtApi>`.
- 82dcf84 Add `AsPointer` trait.
  - Structs that previously had a `ptr()` method now have an `AsPointer` implementation instead.
- b51f60c Add config entries to `RunOptions`.
- 67fe38c Introduce the `ORT_CXX_STDLIB` environment variable (mirroring `CXXSTDLIB`) to allow changing the C++ standard library `ort` links to.
Fixes
- c1c736b Fix `ValueRef` & `ValueRefMut` leaking value memory.
- 2628378 Query `MemoryInfo`'s `DeviceType` instead of its allocation device to determine whether `Tensor`s can be extracted.
- e220795 Allow `ORT_PREFER_DYNAMIC_LINK` to work even when `cuda` or `tensorrt` are enabled.
- 1563c13 Add missing downcast implementations for `Sequence<T>`.
- Returned Ferris to the docs.rs page 🦀
If you have any questions about this release, we're here to help:
Thank you to Thomas, Johannes Laier, Yunho Cho, Phu Tran, Bartek, Noah, Matouš Kučera, Kevin Lacker, and Okabintaro, whose support made this release possible. If you'd like to support `ort` as well, consider contributing on Open Collective 💖
🩷💜🩷💜
v2.0.0-rc.7
Breaking: Infallible functions
The following functions have been updated to return `T` instead of `ort::Result<T>`:
- `MemoryInfo::memory_type`
- `MemoryInfo::allocator_type`
- `MemoryInfo::allocation_device`
- `MemoryInfo::device_id`
- `Value::<T>::memory_info`
- `Value::<T>::dtype`
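In practice this just means dropping the `?` on these calls; for example (where `tensor` is any `Value`):

```rust
// These accessors now return their values directly instead of ort::Result.
let info = tensor.memory_info();
let device_id = info.device_id();
let dtype = tensor.dtype();
```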
Features
- `ValueType` now implements `Display`.
- 7f71e6c Implement `Sync` for `Value<T>`.
- abd527b Arbitrarily configurable execution providers allow you to add custom configuration options to the CANN, CUDA, oneDNN, QNN, TensorRT, VITIS, and XNNPACK execution providers.
  - This also fixes a bug when attempting to configure TensorRT's `ep_context_embed_mode`.
- e16fd5b Add more options to the CUDA execution provider, including user compute streams and SDPA kernel configuration.
- bd3c891 Implement `Send` for `Allocator`.
- 6de6aa5 Add `Session::overridable_initializers` to get a list of overridable initializers in the graph.
- c8b36f3 Allow loading a session with external initializers in memory.
- 2e1f014 Allow upgrading a `ValueRef` or `ValueRefMut` to a `Value` in certain cases.
- f915bca Add `SessionBuilder::with_config_entry` for adding custom session config options.
- ae7b594 Add an environment variable, `ORT_PREFER_DYNAMIC_LINK`, to override whether `ort` should prefer static or dynamic libs when `ORT_LIB_LOCATION` is specified.
- 1e2e7b0 Add functions for explicit data re-synchronization for `IoBinding`.
- b19cff4 Add `::ptr()` to every C-backed struct to expose `ort_sys` pointers.
- d0ee395 Implement `Clone` for `MemoryInfo`.
Fixes
- b58595c The oneDNN execution provider now registers using a more recent API internally. (Also, `with_arena_allocator` is now `with_use_arena`.)
- cf1be86 Remove the lifetime bound for `IoBinding` so it can be stored in a struct alongside a session.
- Multiple fixes to static linking for Linux, macOS, and Android.
- b1fb8c0 `Sequence::extract_sequence` now returns `Value<T>` instead of `ValueRef<T>`.
- 542f210 Make `Environment` and `ExecutionProvider` `Send + Sync`.
- fbe8cbf (Sorta) handle error messages for non-English locales on Windows.
Other changes
- API documentation is now back on docs.rs!
- Improved error messages in multiple areas
If you have any questions about this release, we're here to help:
Thank you to Brad Neuman, web3nomad, and Julien Cretin for contributing to this release!
Thank you to Thomas, Johannes Laier, Yunho Cho, Phu Tran, Bartek, Noah, Matouš Kučera, Kevin Lacker, and Okabintaro, whose support made this release possible. If you'd like to support `ort` as well, consider supporting us on Open Collective 💖
💜🩷💜🩷
v2.0.0-rc.6
`ort::Error` refactor
`ort::Error` is no longer an enum, but rather an opaque struct with a message and a new `ErrorCode` field.
`ort::Error` still implements `std::error::Error`, so this change shouldn't be too breaking; however, if you were previously `match`ing on `ort::Error`s, you'll have to refactor your code to instead match on the error's code (acquired with the `Error::code()` function).
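A small sketch of the pattern (`Error::code()` comes from this release; the specific `ErrorCode` variant and input name shown here are assumptions):

```rust
// Instead of matching enum variants, branch on the error's code.
match session.run(ort::inputs![input]?) {
	Ok(outputs) => { /* ... */ }
	Err(e) if e.code() == ort::ErrorCode::InvalidArgument => {
		eprintln!("bad input: {e}");
	}
	Err(e) => return Err(e.into())
}
```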
`AllocationDevice` refactor
The `AllocationDevice` type has also been converted from an enum to a struct. Common devices like CUDA or DirectML are accessible via associated constants like `AllocationDevice::CUDA` & `AllocationDevice::DIRECTML`.
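Comparisons work much like the old enum variants did - a sketch, assuming `memory_info` is an existing `MemoryInfo` and `AllocationDevice` supports `==`:

```rust
// Compare against the new associated constants instead of enum variants.
if memory_info.allocation_device() == AllocationDevice::CUDA {
	println!("allocated on a CUDA device");
}
```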
Features
- 60f6eca Update to ONNX Runtime v1.19.2.
- 9f4527c Added `ModelMetadata::custom_keys()` to get a Vec of all custom keys.
- bfa791d Add various `SessionBuilder` options affecting compute & graph optimizations.
- 5e6fc6b Expose the underlying `Allocator` API. You can now allocate & free buffers acquired from a session or operator kernel context.
- 52422ae Added `ValueType::Optional`.
- 2576812 Added the Vitis AI execution provider for new AMD Ryzen AI chips.
- 41ef65a Added the RKNPU execution provider for certain Rockchip NPUs.
- 6b3e7a0 Added `KernelContext::par_for`, allowing operator kernels to use ONNX Runtime's thread pool without needing an extra dependency on a crate like rayon.
Fixes
- edcb219 Make environment initialization thread-safe. This should eliminate intermittent segfaults when running tests concurrently, like seen in #278.
- 3072279 Linux dylibs no longer require version symlinks, fixing #269.
- bc70a0a Fixed unsatisfiable lifetime bounds when creating `Tensor`s from `&CowArray`s.
- 6592b17 Providing more inputs than the model expects no longer segfaults.
- b595048 Shave off dependencies by removing `tracing`'s `attributes` feature - a `--no-default-features` build of `ort` now only builds 9 crates!
- c7ddbdb Removed the `operator-libraries` feature - you can still use `SessionBuilder::with_operator_library`, it's just no longer gated behind the feature!
If you have any questions about this release, we're here to help:
Love `ort`? Consider supporting us on Open Collective 💖
❤️💚💙💛
v2.0.0-rc.5
Possibly breaking
- Pre-built static libraries (i.e. not `cuda` or `tensorrt`) are now linked with `/MD` instead of `/MT` on Windows; i.e. MSVC CRT is no longer statically linked. This should resolve linking issues in some cases (particularly crates using other FFI libraries), but may cause issues for others. I have personally tested this in 2 internal pyke projects that depend on `ort` & many FFI libraries and haven't encountered any issues, but your mileage may vary.
Definitely breaking
- 069ddfd `ort` now depends on `ndarray` 0.16.
- e2c4549 `wasm32-unknown-unknown` support has been removed.
  - Getting `wasm32-unknown-unknown` working in the first place was basically a miracle. Hacking ONNX Runtime to work outside of Emscripten took a lot of effort, but recent changes to Emscripten and ONNX Runtime have made this exponentially more difficult. Given that I am not adequately versed in ONNX Runtime's internals, the nigh-impossibility of debugging weird errors, and the vow I took to write as little C++ as possible ever since I learned Rust, it's no longer feasible for me to work on WASM support for `ort`.
  - If you were using `ort` in WASM, I suggest you use and/or support the development of alternative WASM-supporting ONNX inference crates like `tract` or WONNX.
Features
- ab293f8 Update to ONNX Runtime v1.19.0.
- ecf76f9 Use the URL hash for downloaded model filenames. Models previously downloaded & cached with `commit_from_url` will be redownloaded.
- 9d25514 Add missing configuration keys for some execution providers.
- 733b7fa New callbacks for the simple `Trainer` API, just like HF's `TrainerCallbacks`! This allows you to write custom logging/LR scheduling callbacks. See the updated `train-clm-simple` example for usage details.
Fixes
If you have any questions about this release, we're here to help:
Love `ort`? Consider supporting us on Open Collective 💖
❤️💚💙💛
v2.0.0-rc.4
This release addresses important linking issues with rc3, particularly regarding CUDA on Linux.
cuDNN 9 is no longer required for CUDA 12 builds (but is still the default); set the `ORT_CUDNN_VERSION` environment variable to `8` to use cuDNN 8 with CUDA 12.
If you have any questions about this release, we're here to help:
Love `ort`? Consider supporting us on Open Collective 💖
❤️💚💙💛
v2.0.0-rc.3
Training
`ort` now supports a (currently limited) subset of ONNX Runtime's Training API. You can use the on-device Training API for fine-tuning, online learning, or even full pretraining, on any CPU or GPU.
The `train-clm` example pretrains a language model from scratch. There's also a 'simple' API and related example, which offers a basically one-line training solution akin to 🤗 Transformers' Trainer API:
```rust
trainer.train(
	TrainingArguments::new(dataloader)
		.with_lr(7e-5)
		.with_max_steps(5000)
		.with_ckpt_strategy(CheckpointStrategy::Steps(500))
)?;
```
You can learn more about training with ONNX Runtime here. Please try it out and let us know how we can improve the training experience!
ONNX Runtime v1.18
`ort` now ships with ONNX Runtime v1.18.
The CUDA 12 build requires cuDNN 9.x, so if you're using CUDA 12, you need to update cuDNN. The CUDA 11 build still requires cuDNN 8.x.
IoBinding
`IoBinding`'s previously rather unsound API has been reworked and actually documented.
Output selection & pre-allocation
Sometimes, you don't need to calculate all of the outputs of a session. Other times, you need to pre-allocate a session's outputs to save on slow device copies or expensive re-allocations. Now, you can do both of these things without `IoBinding` through a new API: `OutputSelector`.
```rust
let options = RunOptions::new()?.with_outputs(
	OutputSelector::no_default()
		.with("output")
		.preallocate("output", Tensor::<f32>::new(&Allocator::default(), [1, 3, 224, 224])?)
);

let outputs = model.run_with_options(inputs!["input" => input.view()]?, &options)?;
```
In this example, each call to `run_with_options` that uses the same `options` struct will use the same allocation in memory, saving the cost of re-allocating the output; and any outputs other than `output` aren't even calculated.
Value ergonomics
String tensors are now `Tensor<String>` instead of `DynTensor`. They also no longer require an allocator to be provided to create or extract them. Additionally, `Map`s can also have string keys, and no longer require allocators.
Since value specialization, `IntoTensorElementType` was used to describe only primitive (i.e. f32, i64) elements. This has since been changed to `PrimitiveTensorElementType`, which is a subtrait of `IntoTensorElementType`. If you have type bounds that depended on `IntoTensorElementType`, you probably want to update them to use `PrimitiveTensorElementType` instead.
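For instance, a generic bound might be updated like so (the helper function is purely illustrative, and the crate-root import paths assume the pre-rc.9 flattened exports):

```rust
use ort::{PrimitiveTensorElementType, Tensor};

// Illustrative only: the bound moves from IntoTensorElementType to the
// new, primitive-only PrimitiveTensorElementType subtrait.
fn preprocess<T: PrimitiveTensorElementType + Clone>(_tensor: &Tensor<T>) {
	// ... application-specific work ...
}
```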
Custom operators
Operator kernels now support `i64`, string, `Vec<f32>`, `Vec<i64>`, and `TensorRef` attributes, among most other previously missing C API features.
Additionally, the API for adding an operator to a domain has been changed slightly; it is now `.add::<Operator>()` instead of `.add(Operator)`.
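Roughly, registration changes like this (the domain name and `MyOperator` type are hypothetical, and `OperatorDomain::new` is assumed rather than confirmed by this release):

```rust
// Before: .add(MyOperator)
// After:  .add::<MyOperator>()
let domain = OperatorDomain::new("com.example")?
	.add::<MyOperator>()?;
```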
Other changes
- 80be206 & 8ae23f2 Miscellaneous WASM build fixes.
- 1c0a5e4 Allow downcasting `ValueRef` & `ValueRefMut`.
- ce5aaba Add `EnvironmentBuilder::with_telemetry`.
  - pyke binaries were never compiled with telemetry support; only Microsoft-provided Windows builds of ONNX Runtime had telemetry enabled by default. If you are using Microsoft binaries, this will now allow you to disable telemetry.
- 23fce78 `ExecutionProviderDispatch::error_on_failure` will immediately error out session creation if the registration of an EP fails.
- d59ac43 `RunOptions` is now taken by reference instead of via an `Arc`.
- d59ac43 Add `Session::run_async_with_options`.
- a92dd30 Enable support for SOCKS proxies when downloading binaries.
- 19d66de Add AMD MIGraphX execution provider.
- 882f657 Bundle `libonnxruntime` in library builds where `crate-type` = `rlib`/`staticlib`.
- 860e449 Fix build for `i686-pc-windows-msvc`.
- 1d89f82 Support `pkg-config`.
If you have any questions about this release, we're here to help:
Thank you to Florian Kasischke, cagnolone, Ryo Yamashita, and Julien Cretin for contributing to this release!
Thank you to Johannes Laier, Noah, Yunho Cho, Okabintaro, and Matouš Kučera, whose support made this release possible. If you'd like to support `ort` as well, consider supporting us on Open Collective 💖
❤️💚💙💛
v2.0.0-rc.2
Changes
- f30ba57 Update to ONNX Runtime v1.17.3.
  - New: CUDA 12 binaries. `ort` will automatically detect CUDA 12/11 in your environment and install the correct binary.
  - New: Binaries for ROCm on Linux.
  - Note that WASM is still on v1.17.1.
- b12c43c Support for `wasm32-unknown-unknown` & `wasm32-wasi`.
  - With some minor limitations; see https://ort.pyke.io/setup/webassembly.
  - Thank you to Yunho Cho, whose sponsorship made this possible! If you'd also like to support us, you may do so on Open Collective 💖
- cedeb55 Swap specialized value `upcast` and `downcast` function names to reflect their actual meaning (thanks @/messense for pointing this out!)
- de3bca4 Fix a segfault with custom operators.
- 681da43 Fix compatibility with older versions of `rustc`.
- 63a1818 Accept `ValueRefMut` as a session input.
- 8383879 Add a function to create tensors from a raw device pointer, allowing you to create tensors directly from a CUDA buffer.
- 4af33b1 Re-export `ort-sys` as `ort::sys`.
If you have any questions about this release, we're here to help:
Love `ort`? Consider supporting us on Open Collective 💖
❤️💚💙💛
v2.0.0-rc.1
Value specialization
The `Value` struct has been refactored into multiple strongly-typed structs: `Tensor<T>`, `Map<K, V>`, and `Sequence<T>`, and their type-erased variants: `DynTensor`, `DynMap`, and `DynSequence`.
Values returned by session inference are now `DynValue`s, which behave exactly the same as `Value` in previous versions.
Tensors created from Rust, like via the new `Tensor::new` function, can be directly and infallibly extracted into their underlying data via `extract_tensor` (no `try_`):
```rust
let allocator = Allocator::new(&session, MemoryInfo::new(AllocationDevice::CUDAPinned, 0, AllocatorType::Device, MemoryType::CPUInput)?)?;
let tensor = Tensor::<f32>::new(&allocator, [1, 128, 128, 3])?;

// no need to specify type or handle errors - Tensor<f32> can only extract into an f32 ArrayView
let array = tensor.extract_array();
```
You can still extract tensors, maps, or sequence values normally from a `DynValue` using `try_extract_*`:
```rust
let generated_tokens: ArrayViewD<f32> = outputs["output1"].try_extract_tensor()?;
```
`DynValue` can be `upcast()`ed to the more specialized types, like `DynMap` or `Tensor<T>`:
```rust
let tensor: Tensor<f32> = value.upcast()?;
let map: DynMap = value.upcast()?;
```
Similarly, a strongly-typed value like `Tensor<T>` can be downcast back into a `DynValue` or `DynTensor`:
```rust
let dyn_tensor: DynTensor = tensor.downcast();
let dyn_value: DynValue = tensor.into_dyn();
```
Tensor extraction directly returns an ArrayView
`extract_tensor` (and now `try_extract_tensor`) now return an `ndarray::ArrayView` directly, instead of putting it behind the old `ort::Tensor<T>` type (not to be confused with the new specialized value type). This means you don't have to call `.view()` on the result:
```diff
-let generated_tokens: Tensor<f32> = outputs["output1"].extract_tensor()?;
-let generated_tokens = generated_tokens.view();
+let generated_tokens: ArrayViewD<f32> = outputs["output1"].try_extract_tensor()?;
```
Full support for sequence & map values
You can now construct and extract `Sequence`/`Map` values.
Value views
You can now obtain a view of any `Value` via the new `view()` and `view_mut()` functions, which operate similarly to `ndarray`'s own view system. These views can also now be passed into session inputs; see the sketch below.
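A minimal sketch of the idea (the input name and session are hypothetical, and it assumes the `inputs!` macro accepts the view directly, per the note above):

```rust
// Obtain a read-only view of an existing value and feed it back in as an
// input, without consuming or copying the original value.
let view = value.view();
let outputs = session.run(ort::inputs!["input" => view]?)?;
```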
Mutable tensor extraction
You can extract a mutable `ArrayViewMut` or `&mut [T]` from a mutable reference to a tensor:
```rust
let (raw_shape, raw_data) = tensor.extract_raw_tensor_mut();
```
Device-allocated tensors
You can now create a tensor on device memory with `Tensor::new` & an allocator:
```rust
let allocator = Allocator::new(&session, MemoryInfo::new(AllocationDevice::CUDAPinned, 0, AllocatorType::Device, MemoryType::CPUInput)?)?;
let tensor = Tensor::<f32>::new(&allocator, [1, 128, 128, 3])?;
```
The data will be allocated by the device specified by the allocator. You can then use the new mutable tensor extraction to modify the tensor's data.
What if custom operators were 🚀 blazingly 🔥 fast 🦀?
You can now write custom operator kernels in Rust. Check out the `custom-ops` example.
Custom operator library feature change
Since custom operators can now be written completely in Rust, the old `custom-ops` feature, which enabled loading custom operators from an external dynamic library, has been renamed to `operator-libraries`.
Additionally, `Session::with_custom_ops_lib` has been renamed to `Session::with_operator_library`, and the confusingly named `Session::with_enable_custom_ops` (which does not enable custom operators in general, but rather attempts to load `onnxruntime-extensions`) has been updated to `Session::with_extensions` to reflect its actual behavior.
Asynchronous inference
`Session` introduces a new `run_async` method which returns inference results via a future. It's also cancel-safe, so you can simply cancel inference with something like `tokio::select!` or `tokio::time::timeout`.
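A rough sketch of the timeout pattern (the input name, tensor, and exact `run_async` signature are assumptions; it also assumes a tokio runtime):

```rust
use std::time::Duration;

// Cancel-safe: if the timeout elapses, the inference future is simply dropped.
let outputs = tokio::time::timeout(
	Duration::from_secs(5),
	session.run_async(ort::inputs!["input" => input_tensor]?)?
).await??;
```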
If you have any questions about this release, we're here to help:
Love `ort`? Consider supporting us on Open Collective 💖
❤️💚💙💛
v2.0.0-alpha.4
Features
- af97600 Add support for extracting sequences & maps.
Changes
- 153d7af Remove built-in ONNX Model Zoo structs (note that you can still use `with_model_downloaded`, just now only with URLs).
This is likely one of the last alpha releases before v2.0 becomes stable 🎉
Love `ort`? Consider supporting us on Open Collective 💖
❤️💚💙💛
v2.0.0-alpha.3
Fixes
- 863f1f3 Pin Model Zoo URLs to the old repo structure, new models will be coming soon.
Features
- 32e7fab Add `ort::init_from` on feature `load-dynamic` to set the path to the dylib at runtime.
- 52559e4 Cache downloaded binaries & models across all projects. Please update to save my bandwidth =)
- 534a42a Removed `with_log_level`. Instead, logging level will be controlled entirely by `tracing`.
- a9e146b Implement `TryFrom<(Vec<i64>, Arc<Box<[T]>>)>` for `Value`, making `default-features = false` more ergonomic.
- 152f97f Add `Value::dtype()` to get the dtype of a tensor value.
Changes
- 32e7fab Remove the dependency on `once_cell`.
- acfa782 Remove the `ORT_STRATEGY` environment variable. No need to specify `ORT_STRATEGY=system` anymore, you only need to set `ORT_LIB_LOCATION`.
Love `ort`? Consider supporting us on Open Collective 💖
❤️💚💙💛