Onnx-mlir has runtime utilities to compile and run ONNX models in Python.
These utilities are implemented by the `OnnxMlirCompiler` compiler interface
(include/OnnxMlirCompiler.h) and the `ExecutionSession` class
(src/Runtime/ExecutionSession.hpp).
Both utilities have an associated Python binding generated by the pybind library.
Using pybind, a C/C++ binary can be directly imported by the Python interpreter. For onnx-mlir, there are five such libraries: one to compile onnx-mlir models, two to run the models, and two to compile and run the models.
- The shared library to compile onnx-mlir models is generated by `PyOMCompileSession` (src/Compiler/PyOMCompileSession.hpp) and built as a shared library to `build/Debug/lib/PyCompile.cpython-<target>.so`.
- The shared library to run onnx-mlir models is generated by `PyExecutionSession` (src/Runtime/PyExecutionSession.hpp) and built as a shared library to `build/Debug/lib/PyRuntimeC.cpython-<target>.so`.
- The Python library to run onnx-mlir models (src/Runtime/python/PyRuntime.py).
- The shared library to compile and run onnx-mlir models is generated by `PyOMCompileExecutionSessionC` (src/Runtime/PyOMCompileExecutionSession.hpp) and built as a shared library to `build/Debug/lib/PyCompileAndRuntimeC.cpython-<target>.so`.
- The Python library to compile and run onnx-mlir models (src/Runtime/python/PyCompileAndRuntime.py). This library takes an .onnx file and compile options as inputs; it loads the model, then compiles and runs it.
These modules can be imported normally by the Python interpreter as long as they are in your PYTHONPATH. Alternatively, you can create symbolic links to them in your working directory:
```shell
cd <working directory>
ln -s <path to the shared library to compile onnx-mlir models> .          # e.g. build/Debug/lib/PyCompile.cpython-<target>.so
ln -s <path to the shared library to run onnx-mlir models> .              # e.g. build/Debug/lib/PyRuntimeC.cpython-<target>.so
ln -s <path to the Python library to run onnx-mlir models> .              # e.g. src/Runtime/python/PyRuntime.py
ln -s <path to the shared library to compile and run onnx-mlir models> .  # e.g. build/Debug/lib/PyCompileAndRuntimeC.cpython-<target>.so
ln -s <path to the Python library to compile and run onnx-mlir models> .  # e.g. src/Runtime/python/PyCompileAndRuntime.py
python3
```
An ONNX model is a computation graph and it is often the case that the graph has a single entry point to trigger the computation. Below is an example of doing inference for a model that has a single entry point.
```python
import numpy as np
from PyRuntime import OMExecutionSession

model = 'model.so' # LeNet from ONNX Zoo compiled with onnx-mlir

# Create a session for this model.
session = OMExecutionSession(shared_lib_path=model)
# Input and output signatures of the default entry point.
print("input signature in json", session.input_signature())
print("output signature in json", session.output_signature())
# Do inference using the default entry point.
a = np.full((1, 1, 28, 28), 1, np.dtype(np.float32))
outputs = session.run(input=[a])
for output in outputs:
    print(output.shape)
```
In case a computation graph has multiple entry points, users have to set a specific entry point to do inference. Below is an example of doing inference with multiple entry points.
```python
import numpy as np
from PyRuntime import OMExecutionSession

model = 'multi-entry-points-model.so'

# Create a session for this model.
session = OMExecutionSession(shared_lib_path=model, use_default_entry_point=False) # False to manually set an entry point.

# Query entry points in the model.
entry_points = session.entry_points()

for entry_point in entry_points:
    # Set the entry point to do inference.
    session.set_entry_point(name=entry_point)
    # Input and output signatures of the current entry point.
    print("input signature in json", session.input_signature())
    print("output signature in json", session.output_signature())
    # Do inference using the current entry point.
    a = np.arange(10).astype('float32')
    b = np.arange(10).astype('float32')
    outputs = session.run(input=[a, b])
    for output in outputs:
        print(output.shape)
```
If a model was compiled with `--tag`, the value of `--tag` must be passed to OMExecutionSession.
Using tags is useful when there are multiple sessions for multiple models in the same Python script.
Below is an example of doing multiple inferences using tags.
```python
import numpy as np
from PyRuntime import OMExecutionSession

encoder_model = 'encoder/model.so' # Assumed that the model was compiled using `--tag=encoder`
decoder_model = 'decoder/model.so' # Assumed that the model was compiled using `--tag=decoder`

# Create a session for the encoder model.
encoder_sess = OMExecutionSession(shared_lib_path=encoder_model, tag="encoder")
# Create a session for the decoder model.
decoder_sess = OMExecutionSession(shared_lib_path=decoder_model, tag="decoder")
```
In case two models were NOT compiled with `--tag`, they must be compiled with different .so filenames if they are to be used in the same process. Indeed, when no tag is given, the file name serves as the default tag.
Below is an example of doing multiple inferences without using tags.
```python
import numpy as np
from PyRuntime import OMExecutionSession

encoder_model = 'my_encoder.so'
decoder_model = 'my_decoder.so'

# Create a session for the encoder model.
encoder_sess = OMExecutionSession(shared_lib_path=encoder_model) # tag will be `my_encoder` by default.
# Create a session for the decoder model.
decoder_sess = OMExecutionSession(shared_lib_path=decoder_model) # tag will be `my_decoder` by default.
```
To use functions without tags, e.g. `run_main_graph`, set `tag = "NONE"`.
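For instance (a minimal sketch, assuming a model `model.so` that was compiled without `--tag`):

```python
from PyRuntime import OMExecutionSession

# Reach untagged entry points such as `run_main_graph` by passing tag="NONE".
# The model name 'model.so' is illustrative.
session = OMExecutionSession(shared_lib_path='model.so', tag="NONE")
```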
The complete interface to `OMExecutionSession` can be seen in the sources mentioned previously.
However, using the constructor and run method is enough to perform inferences.
```python
def __init__(self, shared_lib_path: str, tag: str, use_default_entry_point: bool):
    """
    Args:
        shared_lib_path: relative or absolute path to your .so model.
        tag: a string that was passed to `--tag` when compiling the .so model. By default, it is the output file name without its extension, namely, `filename` in `filename.so`.
        use_default_entry_point: use the default entry point that is `run_main_graph_{tag}` or not. Set to True by default.
    """

def run(self, input: List[ndarray]) -> List[ndarray]:
    """
    Args:
        input: A list of NumPy arrays, the inputs of your model.
    Returns:
        A list of NumPy arrays, the outputs of your model.
    """

def input_signature(self) -> str:
    """
    Returns:
        A string containing a JSON representation of the model's input signature.
    """

def output_signature(self) -> str:
    """
    Returns:
        A string containing a JSON representation of the model's output signature.
    """

def entry_points(self) -> List[str]:
    """
    Returns:
        A list of entry point names.
    """

def set_entry_point(self, name: str):
    """
    Args:
        name: an entry point name.
    """
```
An ONNX model can be compiled directly from the command line. The resulting library can then be executed using Python as shown in the previous sections. At times, it might be convenient to also compile a model directly in Python. This section explores the Python methods to do so.
The `OMCompileSession` object takes a file name at construction time. For the compilation, `compile()` takes a `flags` string as input, which will override any default options set from the environment variable.
```python
from PyCompile import OMCompileSession

# Load the onnx model and create the OMCompileSession object.
file = './mnist.onnx'
compiler = OMCompileSession(file)
# Generate the library file; rc == 0 indicates success. Here the option is set to "-O3".
rc = compiler.compile("-O3")
# Get the output file name.
model = compiler.get_compiled_file_name()
if rc:
    print("Failed to compile with error code", rc)
    exit(1)
print("Compiled onnx file", file, "to", model, "with rc", rc)
```
The `PyCompile` module exports the `OMCompileSession` class to drive the
compilation of an ONNX model into an executable model.
Typically, a compiler object is created for a given model by giving it the file name of the ONNX model.
Then, all the compiler options can be set as a single `std::string`
to generate the desired executable.
Finally, the compilation itself is performed by calling the `compile()`
method, where the user passes the options string as the input of this function.
The `compile()` method returns a return code reflecting the status of the compilation:
a zero value indicates success, and nonzero values reflect the error code.
Because different operating systems may have different suffixes for libraries,
the output file name can be retrieved using the `get_compiled_file_name()`
method.
The complete interface to `OnnxMlirCompiler` can be seen in the sources mentioned previously. However, using the constructor and the methods below is enough to compile models.
```python
def __init__(self, file_name: str):
    """
    Constructor for an ONNX model contained in a file.
    Args:
        file_name: relative or absolute path to your ONNX model.
    """

def __init__(self, input_buffer: void *, buffer_size: int):
    """
    Constructor for an ONNX model contained in an input buffer.
    Args:
        input_buffer: buffer containing the protobuf representation of the model.
        buffer_size: byte size of the input buffer.
    """

def compile(self, flags: str):
    """
    Method to compile a model from a file.
    Args:
        flags: all the options users would like to set.
    Returns:
        Zero on success, error code on failure.
    """

def compile_from_array(self, output_base_name: str, target: OnnxMlirTarget):
    """
    Method to compile a model from an array.
    Args:
        output_base_name: base name (relative or absolute, without suffix)
            where the compiled model should be written into.
        target: target for the compiler's output. Typical values are
            OnnxMlirTarget.emit_lib or emit_jni.
    Returns:
        Zero on success, error code on failure.
    """

def get_compiled_file_name(self):
    """
    Method to provide the full (absolute or relative) output compiled file name,
    including its suffix.
    Returns:
        String containing the file name after successful compilation; empty string on failure.
    """

def get_error_message(self):
    """
    Method to provide the compilation error message.
    Returns:
        String containing the error message; empty string on success.
    """
```
```python
import numpy as np
from PyCompileAndRuntime import OMCompileExecutionSession

# Load the onnx model and create the OMCompileExecutionSession object.
inputFileName = './mnist.onnx'
# Set the full name of the compiled model.
sharedLibPath = './mnist.so'
# Set the compile option to "-O3".
session = OMCompileExecutionSession(inputFileName, sharedLibPath, "-O3")

# Print the model's input/output signature, for display.
# Signature functions are for info only; comment them out if they cause problems.
session.print_input_signature()
session.print_output_signature()

# Do inference using the default entry point.
a = np.full((1, 1, 28, 28), 1, np.dtype(np.float32))
outputs = session.run(input=[a])
for output in outputs:
    print(output.shape)
```
The `PyCompileAndRuntime` module provides the `OMCompileExecutionSession` class, which combines compilation and execution. Its constructor takes the `.onnx` input file, compiles the model with the options given by the user, and then runs the model with given inputs.
```python
def __init__(self, input_model_path: str, compiled_file_path: str, flags: str, use_default_entry_point: bool):
    """
    Constructor for an ONNX model contained in a file.
    Args:
        input_model_path: relative or absolute path to your ONNX model.
        compiled_file_path: relative or absolute path to your compiled file.
        flags: all the options users would like to set.
        use_default_entry_point: use the default entry point that is `run_main_graph` or not. Set to True by default.
    """

def get_compiled_result(self):
    """
    Method to provide the result of the compilation.
    Returns:
        Int containing the result. 0 represents successful compilation; others on failure.
    """

def get_compiled_file_name(self):
    """
    Method to provide the full (absolute or relative) output file name, including
    its suffix.
    Returns:
        String containing the file name after successful compilation; empty string on failure.
    """

def get_error_message(self):
    """
    Method to provide the compilation error message.
    Returns:
        String containing the error message; empty string on success.
    """

def entry_points(self) -> List[str]:
    """
    Returns:
        A list of entry point names.
    """

def set_entry_point(self, name: str):
    """
    Args:
        name: an entry point name.
    """

def run(self, input: List[ndarray]) -> List[ndarray]:
    """
    Args:
        input: A list of NumPy arrays, the inputs of your model.
    Returns:
        A list of NumPy arrays, the outputs of your model.
    """

def input_signature(self) -> str:
    """
    Returns:
        A string containing a JSON representation of the model's input signature.
    """

def output_signature(self) -> str:
    """
    Returns:
        A string containing a JSON representation of the model's output signature.
    """
```