# Using PyRuntime

onnx-mlir has a runtime utility to run ONNX models that have been compiled into a shared library with `onnx-mlir --EmitLib`. The runtime is implemented in C++ by the `ExecutionSession` class (`src/Runtime/ExecutionSession.hpp`) and has an associated Python binding generated with the pybind library.
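The compile step can be scripted from Python. This is a minimal sketch. The helper names `emitlib_command` and `compile_model` are hypothetical. It also assumes that the library is written next to the input, with `.onnx` replaced by `.so`, so check that against your onnx-mlir version.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def emitlib_command(onnx_model: str) -> List[str]:
    """Build the command line that compiles an ONNX model into a shared library."""
    return ["onnx-mlir", "--EmitLib", onnx_model]


def compile_model(onnx_model: str) -> Path:
    """Run onnx-mlir on the model and return the expected path of the .so file."""
    if shutil.which("onnx-mlir") is None:
        raise FileNotFoundError("onnx-mlir was not found on PATH")
    # check=True raises CalledProcessError if compilation fails.
    subprocess.run(emitlib_command(onnx_model), check=True)
    # Assumption: the output is placed next to the input, e.g. model.onnx -> model.so.
    return Path(onnx_model).with_suffix(".so")
```

For example, `compile_model("model.onnx")` would be expected to produce the `model.so` file that the inference examples load.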

## PyRuntime Module

Using pybind, a C/C++ binary can be imported directly by the Python interpreter. For onnx-mlir, such a binary is generated from `PyExecutionSession` (`src/Runtime/PyExecutionSession.hpp`) and built as a shared library at `build/Debug/lib/PyRuntime.cpython-<target>.so`.
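In a typical CPython build, the `.cpython-<target>.so` part of that file name matches the extension-module suffix that the interpreter reports as `EXT_SUFFIX`. The sketch below computes the expected file name and checks whether the module exists. It assumes the `build/Debug/lib` layout above, and the helper names are hypothetical.

```python
import sysconfig
from pathlib import Path
from typing import Optional


def pyruntime_filename() -> str:
    """Expected file name of the PyRuntime extension for this interpreter."""
    # e.g. ".cpython-310-x86_64-linux-gnu.so" for CPython 3.10 on x86-64 Linux
    return "PyRuntime" + sysconfig.get_config_var("EXT_SUFFIX")


def find_pyruntime(lib_dir: str = "build/Debug/lib") -> Optional[Path]:
    """Return the path to PyRuntime in lib_dir, or None if it is not there."""
    candidate = Path(lib_dir) / pyruntime_filename()
    return candidate if candidate.is_file() else None
```

If the module is built for a different Python version than the one you run, the names will not match and the import will fail.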

## Configuring and using PyRuntime

### Configuration

The module can be imported normally by the Python interpreter as long as its directory is in your `PYTHONPATH`. Alternatively, you can create a symbolic link to it in your working directory:

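```shell
cd <working directory>
# Link the PyRuntime shared library into the working directory.
ln -s <path to PyRuntime>
# Start an interpreter; `import PyRuntime` now finds the link.
python3
```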
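If setting `PYTHONPATH` or creating a link is inconvenient, you can get the same effect inside a script by prepending the library directory to `sys.path` before importing. This is a minimal sketch. `import_pyruntime` is a hypothetical helper, and `lib_dir` should point at the directory that contains the `PyRuntime.cpython-<target>.so` file.

```python
import importlib
import sys
from pathlib import Path
from types import ModuleType
from typing import Optional


def import_pyruntime(lib_dir: str) -> Optional[ModuleType]:
    """Prepend lib_dir to sys.path and import PyRuntime, or return None if absent."""
    path = str(Path(lib_dir).resolve())
    if path not in sys.path:
        # Same effect as adding lib_dir to PYTHONPATH, but only for this process.
        sys.path.insert(0, path)
    # Drop cached directory listings so newly built files are noticed.
    importlib.invalidate_caches()
    try:
        return importlib.import_module("PyRuntime")
    except ImportError:
        return None
```

A script could then call `PyRuntime = import_pyruntime("build/Debug/lib")` and use `PyRuntime.ExecutionSession` as in the examples.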

### Running the PyRuntime interface

An ONNX model is a computation graph, and the graph often has a single entry point that triggers the computation. Below is an example of running inference on a model with a single entry point.

```python
import numpy as np
from PyRuntime import ExecutionSession

model = 'model.so'  # LeNet from the ONNX Model Zoo, compiled with onnx-mlir

# Create a session for this model.
session = ExecutionSession(shared_lib_path=model)
# Input and output signatures of the default entry point.
print("input signature in json", session.input_signature())
print("output signature in json", session.output_signature())
# Do inference using the default entry point.
# One 1x1x28x28 float32 input tensor, filled with ones.
a = np.full((1, 1, 28, 28), 1.0, dtype=np.float32)
outputs = session.run(input=[a])

for output in outputs:
    print(output.shape)
```
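The signatures are returned as JSON strings, so the standard `json` module can inspect them, for instance to build correctly shaped inputs. The sketch below assumes the string is a JSON list with one object per tensor, each with `type` and `dims` keys and with `-1` marking a dynamic dimension. Compare this with what `session.input_signature()` actually prints for your model. The `TensorSpec` type, the `tensor_specs` helper and the sample string are hypothetical.

```python
import json
from typing import List, NamedTuple


class TensorSpec(NamedTuple):
    type: str          # element type string, e.g. "f32" (assumed format)
    dims: List[int]    # -1 is assumed to denote a dynamic dimension


def tensor_specs(signature_json: str) -> List[TensorSpec]:
    """Parse a signature string into one TensorSpec per tensor."""
    return [TensorSpec(t["type"], list(t["dims"])) for t in json.loads(signature_json)]


# Hypothetical signature string in the assumed format.
sample = '[ { "type" : "f32" , "dims" : [1 , 1 , 28 , 28] , "name" : "image" } ]'
specs = tensor_specs(sample)
print(specs)
```

With a real session, `tensor_specs(session.input_signature())[0].dims` would give the shape to pass to `np.zeros`, provided no dimension is dynamic.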

If a computation graph has multiple entry points, you must select a specific entry point before running inference. Below is an example of running inference with multiple entry points.

```python
import numpy as np
from PyRuntime import ExecutionSession

model = 'multi-entry-points-model.so'

# Create a session for this model.
# use_default_entry_point=False means an entry point must be set manually.
session = ExecutionSession(shared_lib_path=model, use_default_entry_point=False)

# Query entry points in the model.
entry_points = session.entry_points()

for entry_point in entry_points:
    # Set the entry point to do inference.
    session.set_entry_point(name=entry_point)
    # Input and output signatures of the current entry point.
    print("input signature in json", session.input_signature())
    print("output signature in json", session.output_signature())
    # Do inference using the current entry point.
    # This example assumes every entry point takes two 10-element float32 inputs.
    a = np.arange(10).astype('float32')
    b = np.arange(10).astype('float32')
    outputs = session.run(input=[a, b])
    for output in outputs:
        print(output.shape)
```
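You can factor the loop above into a helper that collects the outputs of every entry point. In this sketch, `run_all_entry_points` is a hypothetical function that relies only on the `entry_points`, `set_entry_point` and `run` methods. `FakeSession`, which uses the made-up entry point names `run_add` and `run_sub`, stands in for a real `ExecutionSession` so the logic can run without a compiled model.

```python
from typing import Any, Callable, Dict, List


def run_all_entry_points(session: Any,
                         make_inputs: Callable[[str], List[Any]]) -> Dict[str, List[Any]]:
    """Run every entry point of session and map each name to its outputs."""
    results: Dict[str, List[Any]] = {}
    for name in session.entry_points():
        session.set_entry_point(name=name)
        # make_inputs lets each entry point receive inputs of its own shape.
        results[name] = session.run(input=make_inputs(name))
    return results


class FakeSession:
    """Hypothetical stand-in with the same three methods as ExecutionSession."""

    def __init__(self) -> None:
        self._current = None

    def entry_points(self) -> List[str]:
        return ["run_add", "run_sub"]

    def set_entry_point(self, name: str) -> None:
        self._current = name

    def run(self, input: List[List[float]]) -> List[List[float]]:
        a, b = input
        if self._current == "run_add":
            return [[x + y for x, y in zip(a, b)]]
        return [[x - y for x, y in zip(a, b)]]


results = run_all_entry_points(FakeSession(), lambda name: [[1.0, 2.0], [3.0, 4.0]])
print(results)
```

Passing a callable instead of a fixed list means entry points that expect different input shapes can share one driver.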

## PyRuntime model API

The complete interface to `ExecutionSession` can be found in the sources mentioned above. For most inference tasks, however, the constructor and the `run` method are all you need.

```python
from typing import List

from numpy import ndarray


class ExecutionSession:
    # Signatures only; the methods are implemented in C++ (PyExecutionSession).

    def __init__(self, shared_lib_path: str, use_default_entry_point: bool = True):
        """
        Args:
            shared_lib_path: relative or absolute path to your .so model.
            use_default_entry_point: whether to use the default entry point,
                `run_main_graph`. Defaults to True.
        """

    def run(self, input: List[ndarray]) -> List[ndarray]:
        """
        Args:
            input: A list of NumPy arrays, the inputs of your model.

        Returns:
            A list of NumPy arrays, the outputs of your model.
        """

    def input_signature(self) -> str:
        """
        Returns:
            A string containing a JSON representation of the model's input signature.
        """

    def output_signature(self) -> str:
        """
        Returns:
            A string containing a JSON representation of the model's output signature.
        """

    def entry_points(self) -> List[str]:
        """
        Returns:
            A list of entry point names.
        """

    def set_entry_point(self, name: str) -> None:
        """
        Args:
            name: an entry point name.
        """
```