-#X is numpy array on cpu, create an OrtValue and place it on cuda device id = 0
-ortvalue = onnxruntime.OrtValue.ortvalue_from_numpy(X, 'cuda', 0)
-ortvalue.device_name() # 'cuda'
-ortvalue.shape() # shape of the numpy array X
-ortvalue.data_type() # 'tensor(float)'
-ortvalue.is_tensor() # 'True'
-np.array_equal(ortvalue.numpy(), X) # 'True'
+
+
+ONNX Runtime loads and runs inference on a model in ONNX graph format, or ORT format (for memory and disk constrained environments).
+The data consumed and produced by the model can be specified and accessed in the way that best matches your scenario.
+
+
+InferenceSession is the main class of ONNX Runtime. It is used to load and run an ONNX model,
+as well as specify environment and application configuration options.
+
+session = onnxruntime.InferenceSession('model.onnx')
-#ortvalue can be provided as part of the input feed to a model
-sess = onnxruntime.InferenceSession('model.onnx')
-res = sess.run(["Y"], {"X": ortvalue})
+outputs = session.run([output names], inputs)
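+
+As a concrete illustration, a minimal end-to-end run might look like the sketch below. The model
+file name and the input shape are placeholders; in practice, query the input and output names from
+the session rather than hard-coding them.
+
+import numpy as np
+import onnxruntime
+
+session = onnxruntime.InferenceSession('model.onnx', providers=['CPUExecutionProvider'])
+input_name = session.get_inputs()[0].name                # name of the model's first input
+output_names = [o.name for o in session.get_outputs()]   # names of all model outputs
+x = np.random.rand(1, 3, 224, 224).astype(np.float32)    # dummy input; shape is an assumption
+outputs = session.run(output_names, {input_name: x})     # list of numpy arrays, one per output
+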
+ONNX and ORT format models consist of a graph of computations, modeled as operators,
+and implemented as optimized operator kernels for different hardware targets.
+ONNX Runtime orchestrates the execution of operator kernels via execution providers.
+An execution provider contains the set of kernels for a specific execution target (CPU, GPU, IoT etc).
+Execution providers are configured using the providers parameter. Kernels from different execution
+providers are chosen in the priority order given in the list of providers. In the example below,
+if there is a kernel in the CUDA execution provider, ONNX Runtime executes that on GPU; if not,
+the kernel is executed on CPU.
+
+session = onnxruntime.InferenceSession(model,
+ providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
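+
+If you want to verify which providers a session actually registered (for example, to confirm that
+the CUDA provider was picked up), get_providers() reports them in priority order. A small sketch,
+reusing the session created above:
+
+print(session.get_providers())   # e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider']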
+
-
-
-
-By default, ONNX Runtime always places input(s) and output(s) on CPU, which
-is not optimal if the input or output is consumed and produced on a device
-other than CPU because it introduces data copy between CPU and the device.
-ONNX Runtime provides a feature, IO Binding, which addresses this issue by
-enabling users to specify which device to place input(s) and output(s) on.
-Here are scenarios to use this feature.
-
-(In the following code snippets, model.onnx is the model to execute,
-X is the input data to feed, and Y is the output data.)
-
-Scenario 1:
+
+The list of available execution providers can be found here: Execution Providers.
+
+Since ONNX Runtime 1.10, you must explicitly specify the execution provider for your target.
+Running on CPU is the only case in which the API allows the providers parameter to be omitted.
+In the examples that follow, the CUDAExecutionProvider and CPUExecutionProvider are used, assuming the application is running on NVIDIA GPUs.
+Replace these with the execution provider specific to your environment.
+
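+One defensive pattern (a sketch, not something the API requires) is to build the providers list
+from onnxruntime.get_available_providers(), falling back to CPU when the CUDA provider is not
+available in the installed package:
+
+import onnxruntime
+
+available = onnxruntime.get_available_providers()
+if 'CUDAExecutionProvider' in available:
+    providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
+else:
+    providers = ['CPUExecutionProvider']
+session = onnxruntime.InferenceSession('model.onnx', providers=providers)
+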
+You can supply other session configurations via the session options parameter. For example, to enable
+profiling on the session:
+
+options = onnxruntime.SessionOptions()
+options.enable_profiling = True
+session = onnxruntime.InferenceSession('model.onnx', sess_options=options, providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
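+
+When profiling is enabled, ONNX Runtime writes a JSON trace file; end_profiling() stops profiling
+and returns the file name. A brief sketch (assuming inference has already been run on the session):
+
+prof_file = session.end_profiling()   # path of the generated JSON profile
+print(prof_file)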
+
+
+
+