Dates are in YYYY-MM-DD format.
- Added a `providers` parameter to `SessionFromOnnx` to specify execution providers for ONNX-Runtime and a corresponding `--providers` argument to CLI tools.
- `CompareFunc.simple()` will now correctly display the minimum required tolerances when using `elemwise` mode. Note that in elementwise comparison mode, each element of the output is compared against both tolerances, and only counted as a mismatch if both are exceeded. Hence, the minimum required tolerances apply if only one type of tolerance is being used. When both absolute and relative tolerances are set, the requirements may be lower.
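The elementwise rule described above can be illustrated with a small pure-Python sketch. This is not Polygraphy's implementation; `count_mismatches` is a hypothetical helper, and computing the relative error against the second argument is an assumption made for illustration.

```python
def count_mismatches(out0, out1, atol, rtol):
    """Count elements whose absolute AND relative errors both exceed tolerance."""
    mismatches = 0
    for a, b in zip(out0, out1):
        abs_err = abs(a - b)
        # Assumption: relative error is measured against the second output.
        # Guard against division by zero when that value is 0.
        rel_err = abs_err / abs(b) if b != 0 else float("inf")
        # An element is a mismatch only if it exceeds both tolerances.
        if abs_err > atol and rel_err > rtol:
            mismatches += 1
    return mismatches


reference = [1.0, 100.0, 0.001]
candidate = [1.05, 100.5, 0.002]

abs_only = count_mismatches(candidate, reference, atol=0.01, rtol=0.0)
rel_only = count_mismatches(candidate, reference, atol=0.0, rtol=0.01)
both = count_mismatches(candidate, reference, atol=0.01, rtol=0.01)
print(abs_only, rel_only, both)
```

With these values, fewer elements are flagged when both tolerances are set than when either is used alone, which is why the reported minimum required tolerances are an upper bound in the combined case.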
- Added a `memory_pool_limits` parameter to `CreateConfig`.
- Added a `--pool-limit`/`--memory-pool-limit` argument to command-line tools.
- Changed the default base calibrator class to `IInt8EntropyCalibrator2` since it works across both GPU and DLA. To preserve the old behavior, specify `--calibration-base-class=IInt8MinMaxCalibrator` on the command-line or specify the `BaseClass` argument in `Calibrator` in the Python API.
- Deprecated `--workspace` command-line option and `max_workspace_size` parameter in `CreateConfig`. Use `--pool-limit` and `memory_pool_limits` respectively instead.
- Removed deprecated module `polygraphy.util.serde`. Use `polygraphy.json` instead.
- Removed `--tactic-replay` command-line option. Use `--load-tactics`/`--save-tactics` instead.
- Added support for `MeanVarianceNormalization` to `PluginRefRunner`.
- Added a `profiling_verbosity` parameter to `CreateConfig()`.
- Added support for displaying layer-level engine information in `inspect model` for newer versions of TensorRT.
- Added a new `add()` API to `RunResults` to make it easier to create custom output data. Added a new example to demonstrate how to use this API.
- Deprecated `--mode` option in `inspect model`; a new `--show` option has been introduced which can be used to individually control what is displayed.
- Command-line tools will now use `IInt8EntropyCalibrator2` for calibration if DLA and int8 mode are enabled since the default does not work with DLA.
- Removed several deprecated submodules of `polygraphy.common`: `constants`, `cuda`, `exception`, `func`. These can now be found under the top-level `polygraphy` module instead.
- Improved the help messages of various subtools, including `run`, `debug build`, and `debug reduce`.
- Added a default value for `--artifacts-dir` in `debug` subtools.
- Fixed a bug in `surgeon insert` where data types of graph output tensors would not be preserved.
- Fixed broken links in various READMEs.
- Added an `OnnxFromBytes` loader that can deserialize ONNX models.
- Added an `obey_precision_constraints` argument to `CreateConfig` and corresponding `--obey-precision-constraints` CLI argument.
- Deprecated `strict_types` option in `CreateConfig` and corresponding `--strict-types` CLI argument.
- Added various examples, a CLI User Guide and directory for how-to guides.
- Added an experimental `template trt-config` tool to generate template scripts that create TensorRT builder configurations.
- Added `--hide-fail-output` to make `debug` subtools suppress output from failed iterations.
- Added experimental support for DLA.
- Added a `data to-input` tool that can combine inputs/outputs created by `--save-inputs`/`--save-outputs`. The resulting file is compatible with `--load-inputs`.
- Updated `debug` subtools to show captured output on failed iterations.
- The logger will now emit all `CRITICAL` messages to `stderr` instead of `stdout`.
- Renamed `CompareFunc.basic_compare_func` to `CompareFunc.simple`. The old name is preserved as an alias.
- The `--good` and `--bad` arguments in `diff-tactics` can now also accept single files instead of directories.
- Fixed a bug where `debug reduce` would crash when ONNX models included `Constant` nodes whose outputs needed to be marked as model outputs.
- Added support for `K`, `M`, and `G` suffixes to CLI arguments that expect a number of bytes (e.g. `--workspace`). These correspond to `KiB`, `MiB`, and `GiB` respectively. For example, `--workspace=16M` is equivalent to `--workspace=16777216`.
- Added a `copy_outputs_to_host` parameter in `TrtRunner.infer()`, which, when set to `False`, will cause the runner to return `DeviceView`s instead of NumPy arrays for inference outputs. This allows us to avoid a device-to-host and host-to-device copy if we want outputs to remain on the device.
- Added a `view()` method to `DeviceArray`s to create read-only `DeviceView`s over their data.
- Added a `PluginRefRunner` which provides CPU reference implementations for TensorRT plugins and a corresponding `--pluginref` runner option in `polygraphy run`.
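The `K`/`M`/`G` byte suffixes mentioned above map to powers of 1024. A minimal sketch of that conversion follows; `parse_num_bytes` is a hypothetical name used only for illustration, and the real CLI parsing may differ in details such as error handling.

```python
# Hypothetical helper, not part of Polygraphy's public API.
_SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}  # KiB, MiB, GiB


def parse_num_bytes(text):
    """Convert strings such as '16M' or '16777216' to a number of bytes."""
    text = text.strip()
    if text and text[-1] in _SUFFIXES:
        # Multiply the numeric part by the binary multiplier for the suffix.
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    return int(text)


print(parse_num_bytes("16M"))
```

Under these assumptions, `parse_num_bytes("16M")` and `parse_num_bytes("16777216")` yield the same value, matching the equivalence stated in the entry.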
- Marked old shape syntax (`<name>,dim0xdim1x...xdimN,<dtype>`) as deprecated since it leads to ambiguity when parsing shapes including named dynamic dimensions. For example, compare `--input-shapes input0,xxyxz` and `--input-shapes input0:[x,y,z]`. For now, the old syntax continues to work for shapes without named dimensions, but it will be removed in a future version of Polygraphy. The newer syntax, which was originally introduced in Polygraphy 0.25.0, uses the list syntax already present in other parts of Polygraphy. For example, `--val-range [0,1]` in `run` and `--attrs axes=[0,1]` in `surgeon insert` use the same syntax.
- Made several performance improvements in the Polygraphy CUDA wrapper.
- Added a loud warning when the deprecated `--int-min`/`--int-max` or `--float-min`/`--float-max` options are used. These are superseded by `--val-range` which allows you to specify data ranges on a per-input basis.
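The ambiguity of the old shape syntax with named dimensions can be seen in a short sketch. Both parsers below are hypothetical simplifications written for illustration (dtype handling is omitted); they are not the parsers Polygraphy uses.

```python
def parse_old_shape(arg):
    """Naive parser for the old '<name>,dim0xdim1x...xdimN' syntax."""
    name, dims = arg.split(",", 1)
    # Splitting on 'x' cannot tell a separator from a dimension named 'x'.
    return name, dims.split("x")


def parse_new_shape(arg):
    """Naive parser for the newer '<name>:[dim0,...,dimN]' syntax."""
    name, _, dims = arg.partition(":")
    dims = dims.strip("[]")
    if not dims:
        return name, []
    # Numeric dimensions become ints; named dynamic dimensions stay strings.
    return name, [int(d) if d.lstrip("-").isdigit() else d for d in dims.split(",")]


print(parse_old_shape("input0,xxyxz"))
print(parse_new_shape("input0:[x,y,z]"))
```

The old form loses the dimension named `x` entirely, while the list form keeps all three names intact.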
- Removed various deprecated aliases: `ModifyOnnx`, `SessionFromOnnxBytes`, `ModifyNetwork`, `ModifyGraph`.
- Removed the `to-json` tool which was used to convert Pickled data generated by Polygraphy 0.26.1 and older to JSON. Polygraphy 0.27.0 and later only support reading and writing data in JSON format.
- Removed deprecated legacy submodule `polygraphy.util.misc` which was just an alias for `polygraphy.util`.
- Improved the quality of several examples and added information on how to load serialized TensorRT engines as well as how to use custom input data.
- Added `inspect capability` subtool that will partition an ONNX graph into supported and unsupported subgraphs for usage within TensorRT.
- Added Windows support to the CUDA wrapper in `polygraphy/cuda/cuda.py`.
- `SaveOnnx` will now create parent directories if they do not already exist.
- Fixed a bug where `ExtractSubgraph` would modify the original graph instead of creating a new graph.
- Fixed various typos, added more details to some tool READMEs.
- Added `polygraphy.config` as a top-level import so that it no longer needs to be imported separately (i.e. `from polygraphy import config`).
- Fixed a bug where `surgeon sanitize` would not re-run shape inference after overriding model input shapes, causing constant folding to be sub-optimal.
- CLI tools will no longer print long stack traces on user error.
- Fixed a bug where `surgeon` subtools would not work with ONNX models without an `.onnx` extension.
- Fixed a bug where `surgeon insert` would not correctly run shape inference if the inserted node replaced the graph outputs.
- Fixed a bug where `POLYGRAPHY_AUTOINSTALL_DEPS` would not work correctly for nested modules, e.g. `mod.lazy_import("onnx.shape_inference")`.
- Added an `--ignore-fail-code` option to `debug` subtools to ignore certain types of failures.
- Added a highly experimental `OnnxLikeFromNetwork` loader that can generate a file using the ONNX format based on a TensorRT network. The resulting model is not valid ONNX, but is useful for visualization.
- Added an `onnx-like-trt-network` type in `convert` to generate ONNX-like models from TensorRT networks.
- Added support for custom installation commands during dependency autoinstall. This can be configured using `config.INSTALL_CMD` or by setting the `POLYGRAPHY_INSTALL_CMD` environment variable.
- Added support for loading external data in `InferShapes`.
- Added a `--no-per-pass-shape-inference` argument to `surgeon sanitize` to disable shape inference between constant-folding passes.
- Added a `--external-data-size-threshold` CLI option for saving external data for ONNX models.
- Added a `--no-save-all-tensors-to-one-file` CLI option to avoid saving ONNX external data to a single file.
- Improved logic for auto-permuting tensors in `basic_compare_func`. The new logic can handle an arbitrary number of dimensions. For example, if two tensors with shapes `(1, 3, 45, 45, 45)` and `(1, 45, 45, 45, 3)` are being compared, `basic_compare_func` will now guess that the latter should be transposed using a permutation of `(0, 4, 1, 2, 3)` to match the former.
- Improved display of `Profile` in logging messages.
- Updated NumPy array encoding to use `base64`. In some cases, this can reduce file sizes by a factor of 4.
- Updated `debug precision` default direction to `forward` as this typically leads to better results.
- Added a `--no-strict-types` flag to `debug precision` in case strict types needs to be disabled for any reason.
- `FoldConstants` will no longer run shape inference if shape folding is disabled.
- `InferShapes` will now automatically write large models to the disk to work around the 2 GiB protobuf size limitation. The threshold can be configured using the `save_to_disk_threshold_bytes` parameter.
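One way to arrive at the permutation in the auto-permute example above is a greedy match on dimension sizes. The sketch below is an assumption about how such a guess could work, not the logic `basic_compare_func` actually uses; `guess_permutation` and `permuted_shape` are hypothetical names.

```python
def guess_permutation(target_shape, source_shape):
    """Guess axes such that transposing `source_shape` yields `target_shape`.

    Greedy sketch: for each target dimension, take the first unused source
    axis with the same size. Returns None if the shapes cannot match.
    """
    if sorted(target_shape) != sorted(source_shape):
        return None
    used = set()
    perm = []
    for dim in target_shape:
        for idx, src_dim in enumerate(source_shape):
            if src_dim == dim and idx not in used:
                used.add(idx)
                perm.append(idx)
                break
    return tuple(perm)


def permuted_shape(shape, perm):
    """Shape that a transpose with axes `perm` would produce."""
    return tuple(shape[i] for i in perm)


perm = guess_permutation((1, 3, 45, 45, 45), (1, 45, 45, 45, 3))
print(perm, permuted_shape((1, 45, 45, 45, 3), perm))
```

When several axes share a size, as with the three axes of size 45 here, the guess is not unique; the greedy choice simply preserves their original order.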
- Fixed a bug in `inspect model` where engine output bindings would all be printed on one line.
- Fixed a bug where using `set_profile` in the `TrtRunner` would sometimes cause input shape checks in `infer` to fail even when shapes were valid.
- Fixed a bug in `inspect model` where engine output bindings would display the wrong shapes for profiles after the first.
- Fixed a bug where `debug precision` would incorrectly mark constant layer outputs and non-execution tensors to run in higher precision.
- Fixed a bug where `debug precision` would crash if engine building failed. It now continues to the next iteration, counting the previous one as a failure.
- Fixed a bug where `InferShapes` would require `--external-data-dir` to be set even if the external data were in the same directory as the model.
- Fixed a bug where `--data-loader-script` would not provide data in the `run` tool if int8 calibration were enabled in TensorRT.
- Added a `--log-file` option to CLI tools to store logging output to a file.
- Added an `--iteration-info` argument to `debug` subtools so that `--check` commands can get information about the current iteration.
- Added an experimental `debug repeat` subtool, which is more generic than the existing `debug` subtools.
- Swapping NumPy arrays to the disk is now disabled by default. It can be re-enabled by setting the `POLYGRAPHY_ARRAY_SWAP_THRESHOLD_MB` environment variable.
- Added support for per-output `check_error_stat`, which allows different metrics to be checked for different outputs.
- Moved JSON utilities into a separate `polygraphy.json` submodule. For backwards compatibility, they remain accessible via `polygraphy.util` as well.
- The `max` value for `check_error_stat` in `basic_compare_func` now only checks the maximum absolute/relative tolerances. The previous behavior of checking the values element-wise is preserved in the `elemwise` option, which is now the default.
- Fixed a bug where data loader would not cast value ranges provided for `bool` input types, which could lead to generating out-of-bound values.
- Fixed a bug where NumPy arrays smaller than 8 MiB would be serialized to disk unnecessarily.
- Added a `check_error_stat` option in `basic_compare_func` and corresponding `--check-error-stat` CLI option to control which statistic (e.g. mean, median, max) of the error is used to determine whether outputs matched.
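To show why the choice of error statistic matters, here is a rough sketch using only the standard library. It assumes a single tolerance and a flat list of per-element errors; `outputs_match` is a hypothetical function, not the comparison code in `basic_compare_func`.

```python
import statistics

# Hypothetical mapping from statistic name to the function that computes it.
_STAT_FUNCS = {"mean": statistics.mean, "median": statistics.median, "max": max}


def outputs_match(errors, tolerance, check_error_stat="max"):
    """Return True if the chosen statistic of `errors` is within `tolerance`."""
    stat = _STAT_FUNCS[check_error_stat](errors)
    return stat <= tolerance


errors = [0.001, 0.002, 0.5]  # one outlier among otherwise small errors
for name in ("mean", "median", "max"):
    print(name, outputs_match(errors, tolerance=0.01, check_error_stat=name))
```

A single outlier fails the `max` and `mean` checks in this example, while the `median` check passes, so the statistic should be chosen according to how much weight outliers deserve.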
- A histogram of output/error values will now be displayed at INFO severity on comparison failures. Otherwise, it is displayed at VERBOSE severity.
- Fixed a bug where histogram display would wrap to subsequent lines.
- Added more information about absolute/relative difference in `basic_compare_func`. For example, it will now print a histogram of the distribution of the outputs and errors.
- Added mean absolute/relative error to `OutputCompareResult`, which is returned by `Comparator.compare_accuracy`. This makes it easier to programmatically access this information.
- Several improvements to the quality of error messages and warnings.
- Fixed a bug where `basic_compare_func` and `DataLoader` would issue warnings when default tolerances/value ranges were used.
- Fixed a bug where command-line tools would fail if a `--timing-cache` argument was provided but the file did not exist.
- `basic_compare_func` will now issue warnings if `atol`/`rtol` contain invalid keys.
- `DataLoader` will now issue warnings if `val_range` contains invalid keys.
- Added a `tactic_sources` parameter in `CreateConfig` to control TensorRT's tactic sources.
- Added a `--tactic-sources` argument to CLI tools.
- Added a `DeviceView` class in the `cuda` submodule to represent views of GPU memory. `DeviceArray` is now a subclass of `DeviceView`.
- Added support for accepting `DeviceView`s or device pointers in the `Calibrator`. This means that you can now run calibration using data already on the GPU.
- Added support for `DeviceView`s in `TrtRunner.infer()`. Note that `DeviceView`s cannot be used for input shape-tensors, which must be allocated on the host.
- Added support for using `trt.IInt8Calibrator` as the `BaseClass` of `Calibrator`.
- Exposed some lower level functions like `malloc`, `free`, and `memcpy` in the Polygraphy CUDA wrapper.
- Added a `set_profile()` method to `TrtRunner` to control the active optimization profile.
- Added `-q`/`--quiet` option to CLI tools. This can be used to suppress logging output without eliminating all output like `--silent` does.
- Added a `to_trt()` method to `Profile` to convert it to a TensorRT `IOptimizationProfile`.
- Added `--force-fallback-shape-inference` option to `debug reduce`.
- Added `--fail-regex` option to `debug reduce` to distinguish among different types of failures based on command output.
- Changed `TRT_LOGGER` to `get_trt_logger()` to make it work properly with lazy imports.
- Further improved lazy imports such that no modules are required in order to import Polygraphy modules. Using functionality from Polygraphy modules still requires dependencies.
- Various submodules have been restructured. The old import structure is preserved for backwards compatibility.
- Added `Profile.fill_defaults()` which makes it possible to automatically fill a TensorRT optimization profile with sane default values.
- It is now possible to provide TensorRT optimization profile shapes for a subset of the network inputs. In such cases, the rest of the profile will be populated automatically with `Profile.fill_defaults()`.
- `surgeon extract` will no longer run shape inference unless it is required, e.g. if `auto` is specified for one of the shape/data type arguments.
- ONNX shape inference will now be skipped when `--force-fallback-shape-inference` is enabled in `surgeon extract/sanitize`.
- `debug reduce` will now freeze intermediate shapes in the model if `--model-input-shapes` is provided.
- `IterationResult`s now store `LazyNumpyArray` rather than `np.ndarray`. The public interface for `IterationResult` will automatically pack or unpack `np.ndarray`s into/from `LazyNumpyArray`, so the change is completely transparent. This can significantly reduce memory usage for tools like `debug reduce` and `run`.
- Attempting to load a non-existent file will now cause a friendly error message to be displayed rather than crashing.
- `surgeon sanitize` will no longer override shapes other than those specified in `--override-input-shapes`.
- Removed optional `symbol` parameter from `lazy_import`.
- For security reasons, all serialization/deserialization code in Polygraphy has been updated to use JSON instead of `pickle`. Use the included `to-json` tool to convert data serialized with `pickle` to JSON format.
- Split `TacticReplayer` into separate `TacticRecorder` and `TacticReplayer` classes. This provides more fine-grained control over whether to record or replay tactics.
- Deprecated `--tactic-replay` in favor of `--save-tactics` and `--load-tactics`.
- Changed `check_finite` parameter in `Comparator.validate()` to `check_inf`, since it checks whether values are non-finite rather than the opposite.
- Polygraphy will now validate command-line arguments so that code-injection is not possible.
- `debug diff-tactics` will now work correctly when replay files are in nested directories.
- Added `--force-fallback-shape-inference` option to `surgeon sanitize` in case ONNX shape inference doesn't work well enough to allow for folding.
- Added a `--calibration-base-class` option to allow changing base class for the TensorRT int8 calibrator.
- `FoldConstants` will no longer fail if a constant folding pass fails. Set `error_ok=False` to disable this behavior.
- Added support for saving/loading ONNX models with externally stored weights.
- Added support for automatically installing dependencies as they are needed. This behavior can be enabled by setting the `POLYGRAPHY_AUTOINSTALL_DEPS` environment variable to `1`. When auto-install is enabled, Polygraphy can also automatically upgrade existing packages if a newer version is requested.
- Added `error_ok` option in `InferShapes`, which can be set to `False` to make the loader raise an error when shape inference fails.
- `val_range` in the DataLoader now falls back to the default range if no range is specified for an input.
- `atol` and `rtol` in `CompareFunc.basic_compare_func` now fall back to the default tolerance values if no tolerance is specified for an output.
- Folding shapes is now optional in `FoldConstants`.
- `surgeon sanitize` now includes a `--no-fold-shapes` option to disable shape folding.
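The fallback behavior for per-input ranges and per-output tolerances can be pictured as a dictionary lookup with two levels of default. The sketch below is illustrative only: `lookup` is a hypothetical helper, and using an empty-string key to mark the user-supplied default is an assumption of this example.

```python
DEFAULT_KEY = ""  # Assumption: an empty-string key marks the user's default entry.


def lookup(per_name_values, name, builtin_default):
    """Resolve a value for `name`, falling back to defaults when it is absent."""
    if name in per_name_values:
        return per_name_values[name]
    if DEFAULT_KEY in per_name_values:
        # A user-provided default applies to every name not listed explicitly.
        return per_name_values[DEFAULT_KEY]
    return builtin_default


atol = {"out0": 1e-3, DEFAULT_KEY: 1e-5}
print(lookup(atol, "out0", builtin_default=1e-8))
print(lookup(atol, "out1", builtin_default=1e-8))
```

Here `out0` uses its own tolerance, while `out1`, which has no entry, falls back to the default instead of raising an error.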
- Fixed a bug in `surgeon insert` where input tensors would be disconnected from all their consumers. Previously, in a model with branches, if one entire branch was replaced by `surgeon insert`, the other branch would be invalidated. This is no longer the case.
- `run` will now attempt to avoid introducing a dependency on the `onnx` Python package when using an ONNX model when `--trt` is the only specified runner.
- When `--force-fallback-shape-inference` is set in `surgeon extract`, it will now correctly ignore shapes already inferred in the model.
- ONNX loaders will no longer make a copy of the model unnecessarily. If a copy is desired, the `copy` parameter can be set to `True` for loaders that may modify the model.
- `InferShapes`/`infer_shapes` will now work with ONNX models larger than 2 GiB if a path to the model is provided instead of an `onnx.ModelProto`.
- Fixed a bug where `FoldConstants` would not count nodes within subgraphs.
- Removed `OnnxTfRunner` and associated CLI options.
- Added `--partitioning` flag to `surgeon sanitize` to control how ONNX-GraphSurgeon partitions the graph during constant folding.
- Added `--cleanup` flag to `surgeon sanitize` to remove dead layers in ONNX models.
- `ExtractSubgraph` loader will now fall back to using shapes/dtypes defined in the model when none are specified.
- `surgeon sanitize` no longer runs inference when the `--override-input-shapes` option is set. Instead, intermediate shapes are cleared.
- `surgeon extract` will no longer override shapes or data types already set in the model when running fallback shape inference.
- Added support for list attributes in `surgeon insert`.
- Added `val_range` parameter to data loader, which is more generic than `int_range`/`float_range`, which are now deprecated.
- Added support for per-input data ranges to `val_range` parameter.
- Added `--val-range` CLI option to set input ranges on the command-line.
- Added `:` as a valid separator for various options and `[dim0,...,dimN]` as valid syntax for shapes. For example, you can now optionally use `--inputs input0:[3,4]:int64 input1:[4,64,64]:float32` instead of `--inputs input0,3x4,int64 input1,4x64x64,float32`. The new and old styles cannot be mixed.
- Added support for specifying per-output top-k values to CLI tools.
- Added `--trt-config-script` argument, which allows CLI tools to accept scripts that define functions that create TensorRT builder configurations.
- Added `--data-loader-script` argument, which allows CLI tools to accept scripts that define data loaders.
- Added a new example for the `convert` CLI tool, which shows how to use a custom data loader for int8 calibration on the command-line.
- Fixed a bug where `debug reduce` would remove branches even if they were required to reproduce failures.
- Added support for string input types in `OnnxrtRunner`.
- Added `reduce` subtool to `debug` which can reduce failing ONNX models to the smallest possible failing subgraph.
- ONNX loaders will no longer modify the original model provided, but instead make a shallow copy.
- Added an example to `dev/` showing how to write new command-line tools.
- Verbose TensorRT network logging will no longer fail to show attributes for layers on older versions of TensorRT.
- `convert` can now automatically determine the output model type based on the file extension.
- Added immediately evaluated functional variants for all loaders exported by Polygraphy. The variants use the same name as the loaders, except `snake_case` instead of `PascalCase`. See the example for details.
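The relationship between a lazily evaluated loader and its immediately evaluated variant can be sketched without any Polygraphy imports. Everything below is hypothetical: `LoadText` stands in for a real loader, and `make_functional_variant` is only one possible way to derive the `snake_case` function.

```python
import re


class LoadText:
    """Hypothetical lazily evaluated loader: work happens only when called."""

    def __init__(self, text):
        self.text = text

    def __call__(self):
        return self.text.upper()


def pascal_to_snake(name):
    """Convert a PascalCase identifier to snake_case."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def make_functional_variant(loader_cls):
    """Build a function that constructs the loader and calls it right away."""

    def variant(*args, **kwargs):
        return loader_cls(*args, **kwargs)()

    variant.__name__ = pascal_to_snake(loader_cls.__name__)
    return variant


load_text = make_functional_variant(LoadText)

lazy = LoadText("engine")    # nothing evaluated yet
print(lazy())                # evaluated on call
print(load_text("engine"))   # constructed and evaluated in one step
```

The lazy form is convenient when a loader is passed to another component that decides when to run it; the functional form is shorter when the result is needed immediately.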
- Polygraphy no longer has `numpy` as an install requirement. Note however that most, but not all, APIs and CLI tools in Polygraphy still do require `numpy`.
- Removed `func.invoke()` since immediately evaluated functions now supersede it.
- Fixed a bug where some `debug` subtools would write engines to the wrong path.
- Added `FoldConstants` loader for ONNX models.
- Added `ExtractSubgraph` loader for ONNX models.
- Moved `--fp-to-fp16` option to `convert`.
- Added `ConvertToFp16` as a separate loader for ONNX models.
- Added `InferShapes` loader for ONNX models.
- `surgeon sanitize` will now run shape inference by default.
- `Modify<X>` loaders have been renamed to `Modify<X>Outputs` to better reflect their purpose.
- `surgeon sanitize` can now run multiple passes of constant folding to deal with nodes that may not be folded after the first pass (for example, `Shape` nodes in cases where ONNX shape inference does not complete).
- Added an experimental `debug` subtool, which includes `build` and `diff-tactics` (formerly part of `flaky`) and `precision` (formerly a separate tool).
- `flaky diff-tactics` will now only show layers that have potentially bad tactics. To view an entire tactic replay, use `inspect tactics`.
- `flaky repeat` will now only log command `stderr` output with `ERROR` severity if the command failed. Otherwise, `stderr` output is logged with `WARNING` severity.
- `TacticReplayer` can now accept a `TacticReplayData` instance directly.
- `TacticReplayData` can now be constructed manually instead of relying on TensorRT types.
- `flaky` and `precision` tools have been removed and replaced by the `debug` subtool, which includes the functionality of both.
- Added a `POLYGRAPHY_INTERNAL_CORRECTNESS_CHECKS` environment variable to enable internal correctness checks at runtime. By default, these checks are disabled. A failure in such a check typically indicates a bug in Polygraphy.
- Added context managers for CUDA helper classes. This helps ensure they are correctly freed.
- Added `sparse_weights` parameter to `CreateConfig`, which enables TensorRT optimizations related to sparse weights.
- Added a `--sparse-weights` option to various CLI tools.
- Added checks for cases where paths provided to `BytesFromPath` did not exist.
- Added `__enter__`/`__exit__` to `Calibrator` so that device buffers can be reliably freed after calibration using a context manager.
- Added `fp_to_fp16` parameter to `ModifyOutputs` which will use `onnxmltools` to convert float tensors in the model to 16-bit floats.
- Added `--fp-to-fp16` CLI argument to various tools.
- Added support for `float`, `int`, and `str` attributes to `surgeon insert`.
- Added `InvokeFromScript` loader, which can import and invoke a function from a Python script.
- Added support for loading TensorRT networks from Python scripts to various CLI tools. CLI tools can now accept a Python script in place of a model file. The script should define a `load_network` function that takes no arguments and returns a TensorRT builder, network, and optionally parser. See the example for details.
- Added an experimental `template` tool that can generate template files.
  - Added a `trt-network` subtool that can generate a template script for defining TensorRT networks using the network API.
- Added a `SaveBytes` loader to facilitate writing bytes to a file between loaders.
- Added an experimental `flaky` tool that can help debug flaky failures.
  - Added `repeat` subtool, which will run a command repeatedly and sort artifacts into `good` and `bad` directories.
  - Added `diff-tactics` subtool, which compares known-good and known-bad tactic replay files to determine which tactics may be the source of error.
- `EngineFromNetwork` and `CreateConfig` no longer use the global timing cache by default.
- Changed `--timing-cache` default in CLI tools to `None`.
- Changed `timing_cache` parameter to `load_timing_cache` and `save_timing_cache` in `CreateConfig` and `EngineFromNetwork` respectively.
- Runners will now raise errors in `infer` if the provided input data types or shapes do not match expected types and shapes. This behavior can be disabled by setting `check_inputs=False`.
- Changed `--toposort` default to off in `surgeon` tools as ONNX models are typically topologically sorted.
- The logger will now log messages with `WARNING` or greater severity to `sys.stderr` instead of `sys.stdout`.
- Removed `CNTKRunner` and `--cntk` CLI option.
- Removed experimental `--network-api` flag in CLI tools. This is superseded by the `template trt-network` subtool.
- Fixed memory leaks in `EngineFromNetwork`, `EngineFromBytes`, and `TrtRunner`.
- Added support for timing caches in `EngineFromNetwork` and `CreateConfig`. The former can generate caches, while the latter can load them, resulting in much faster engine builds. By default, Polygraphy will use a global timing cache in the temporary directory.
- Added a `--timing-cache` option to various CLI tools.
- Added an `EngineBytesFromNetwork` TensorRT loader to provide serialized engines directly.
- Added a `BytesFromEngine` TensorRT loader to provide a means of in-memory engine serialization.
- Added an experimental `convert` subtool, which can convert models to various other formats.
- Added an `algorithm_selector` parameter to `CreateConfig` to allow the user to override TensorRT's tactic choices.
- Added a `TacticReplayer` algorithm selector to allow for recording and replaying tactics in the TensorRT builder. This makes it possible to make the TensorRT builder behave deterministically.
- Added an experimental `--tactic-replay` option to various CLI tools to make it possible to record to and replay from tactic replay files.
- Added an experimental `inspect` subtool, `tactics`, which can display tactic replay files in a human readable format.
- The `surgeon sanitize` subtool can now also modify model outputs.
- `surgeon insert` will now preserve graph input and output names.
- Fixed a bug where the CUDA wrapper could not allocate buffers larger than 3 GiB.
- `TrtRunner` can now optionally accept a `context` directly instead of an `engine`.
- `basic_compare_func` will now show mismatched indices in addition to mismatched values.
- Added an experimental `surgeon` subtool, `insert`, which can insert new nodes into an ONNX model.
- Added an experimental `surgeon` subtool, `sanitize`, which can remove unused nodes and fold constants in an ONNX model.
- Added `--load-inputs` and `--save-inputs` to provide a mechanism to supply custom input data on the command line.
- Added `func.invoke()`, a function that calls a provided callable. This can be useful to make it more obvious that a loader is being immediately evaluated. For example: `EngineFromNetwork(...)()` vs. `func.invoke(EngineFromNetwork(...))`.
- Added per-output tolerance support in `basic_compare_func`.
- Added per-output tolerance support to the `--atol` and `--rtol` command-line options.
- Renamed `inspect results` to `inspect data` since it can now also be used to inspect input data, not just results.
- `Comparator.compare_accuracy` now supports comparing a single runner against itself.
- Removed experimental surgeon subtools `prepare` and `operate` as they were difficult to maintain and not very useful.
- Fixed a memory leak due to `IBuilderConfig` not being properly freed in the `EngineFromNetwork` loader.
- Fixed memory leaks on exceptions in TensorRT loaders.
- Fixed a bug in `inspect model` where `dim_param`s in ONNX models would show up as `-1`.
- Shape values in `TensorMetadata` can now be strings to indicate dynamic dimensions.
- `TRT_LOGGER` is now exported under `polygraphy.backend.trt`.
- Fixed a bug in `surgeon extract` where ONNX models using `dim_param` would be rejected.
- Added missing copyright headers.
- Added an `--input-shapes` alias for the `--inputs` option in `run` to better reflect its purpose.
- `inspect model` will no longer show `dtype`/`shape` as `None` if the information is not present in the model. Instead, these are now omitted.
- Fixed a bug where boolean outputs would cause a crash in `basic_compare_func`.
- Fixed a bug where `TrtRunner` would use the wrong shapes for empty tensor outputs.
- Fixed a bug where the `Calibrator` would not re-check the cache when `reset()`
- Added `-v`/`--version` flag to `polygraphy`.
- Cleaned up unnecessary logging output, and fixed formatting.
- Added new modes to `inspect model`, to control whether to show weights in the model.
- Added `-s`/`--show-values` option to `inspect results` to display output values.
- Added an experimental `--top-k` flag to `run`, which will apply a Top-K before comparing outputs.
- Added `exclude_outputs` to `ModifyOutputs` and `ModifyNetworkOutputs`.
- Added an experimental `--onnx-exclude-outputs` and `--trt-exclude-outputs` to selectively unmark outputs.
- Fixed a bug in `inspect model` for ONNX models containing nodes with Tensor attributes.
- Fixed a bug where `DeviceArray.copy_from` would segfault in rare cases.
- General cleanup and addition of missing docstrings.
- Fixed a bug where `DataLoader` would use a shape provided by the user even for static shapes in the model.
- Fixed a bug where `DataLoader` would incorrectly report certain tensors as shape tensors.
- Fixed a bug where the `DataLoaderCache` would stop checking the cache after the first miss.
- Added an `extend` decorator, which makes it easier to extend existing loaders.
- Added more API examples.
- `Comparator.compare_accuracy` will now display an accuracy summary after processing all iterations.
- Added a `CreateNetwork` loader to create new TensorRT networks.
- Added experimental `--network-api` option that works with `--gen` to allow manually defining a TensorRT network.
- `Calibrator` can now accept a file-like object for `cache` instead of just a file path.
- Fixed various errors in API documentation.
- `EngineFromBytes` will now call `trt.init_libnvinfer_plugins` before attempting to deserialize the engine.
- Added HTML docs for the Python API.
- Fixed a bug where the data loader would not support cases where `int_min` == `int_max` when bounding input data.
- Fixed a bug where OnnxrtRunner would report incorrect metadata for ONNX models using `dim_param` for dynamic dimensions.
- `CreateConfig` now accepts a `strict_types` argument.
- Added a new `polygraphy` binary, which includes several tools.
- Added an experimental new tool: `precision`, which can be used to figure out what layers to run in higher precision in TensorRT to achieve the desired accuracy.
  - Added `bisect` subtool that does binary search.
  - Added `linear` subtool that does a linear search.
  - Added `worst-first` subtool that marks the layers that introduce the most error first.
- Added a new tool: `inspect` to inspect supported files.
  - Added `model` which displays information about models.
  - Added `results` which displays information about saved `RunResults`.
- Added back `subprocess_polling_interval` to `Comparator.run()`, as this is still required in certain rare cases.
- Optimization passes are now optional in `OnnxFromTfGraph`, and can be disabled by setting `optimize=False` in the constructor.
- Runners now include an `is_active` property, which indicates whether the runner is currently activated.
- Added an experimental new tool: `surgeon`, which can be used to modify ONNX models more easily than using ONNX-GraphSurgeon.
  - Added `prepare` and `operate` which can be used to modify an ONNX model using a JSON configuration.
  - Added `extract` which can extract ONNX subgraphs with a single command.
- Added `--onnx-outputs` and `--trt-outputs` to set outputs in the corresponding loaders.
- Added a passthrough loader, `LoadPlugins`, that can wrap any other loader, and load plugins.
- `EngineFromNetwork` will no longer free the builder, network and parser if they are provided directly (as opposed to via a loader).
- `TrtRunner` will no longer free the engine if it is provided directly (as opposed to via a loader).
- All file saving arguments now take file paths instead of directories. This makes it easier to know exactly where each file is being written.
- `compare_func` in `Comparator.compare_accuracy` now accepts a function that returns anything convertible to a boolean, rather than requiring a boolean.
- `basic_compare_func` will now return information about required tolerances after `Comparator.compare_accuracy`.
- `Calibrator` can now be configured to inherit from a different TensorRT calibrator base class.
- ONNX GraphSurgeon is no longer required to mark outputs in ONNX models.
- `TrtLegacyRunner` no longer depends on `pycuda`
- `TrtRunner` will now only reset context shapes if the shapes changed. This should improve performance.
- `DataLoader` now takes `int_range` and `float_range` parameters, so min/max can be provided more concisely.
- All `Loaders` and `Runner` were renamed to better reflect their purpose, and to improve readability.
- Renamed `warm_up_runs` to `warm_up`.
- `Calibrator`'s `data_loader` parameter now accepts any generator or iterable instead of requiring a special type.
- `Comparator.run`'s `data_loader` parameter now accepts any generator or iterable instead of requiring a special type.
- The included `DataLoader` can now be used as an iterable, and its iteration length can be controlled via the `iterations` parameter.
- Renamed `--input-shape` to `--inputs`
- Renamed `--min-shape`/`--opt-shape`/`--max-shape` to `--trt-min-shapes`/`--trt-opt-shapes`/`--trt-max-shapes`
- `DataLoader` now accepts an `input_metadata` parameter which can be used to override shapes and data types.
- Split off `layerwise` and `outputs` functionality into separate `Modify` loaders.
- Split off artifact saving functionality into separate `Save` loaders.
- Renamed `--read` options to `--load`, and `--write` to `--save`
- Renamed `--read-outputs`/`--write-outputs` to `--load-results`/`--save-results`
- `Calibrator` no longer requires `input_metadata` to be set if the data loader does not need it
- `TfRunner` now uses a `CreateConfig` loader to supply configuration.
- `TfRunner` and `OnnxrtRunner` now take a `BuildSession`, so that custom sessions can be used.
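
The `data_loader` entries above say that any generator or iterable of feed dictionaries is accepted. The following is a minimal, library-free sketch of that contract, not Polygraphy's implementation: `make_data_loader`, `run_until_exhausted`, the parameter defaults, and the lambda standing in for a runner are all hypothetical, and plain lists stand in for NumPy arrays.

```python
import random


def make_data_loader(element_counts, iterations=2, seed=1, int_range=(1, 25)):
    """Yield `iterations` feed dicts mapping input names to lists of ints.

    All parameter names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed, so repeated runs see the same data
    low, high = int_range
    for _ in range(iterations):
        yield {
            name: [rng.randint(low, high) for _ in range(count)]
            for name, count in element_counts.items()
        }


def run_until_exhausted(data_loader, infer):
    """Call `infer` once per feed dict until the iterable is exhausted."""
    return [infer(feed_dict) for feed_dict in data_loader]


results = run_until_exhausted(
    make_data_loader({"x": 3}, iterations=2),
    infer=lambda feed_dict: {"y": sum(feed_dict["x"])},
)
print(len(results))  # one result per feed dict
```

Because the consumer only relies on iteration, a list of dictionaries or a custom class with `__iter__` would work equally well in this sketch.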
- Removed iteration arguments from `Comparator.run()` and `Calibrator`. Instead these now iterate the provided data loader until it runs out of data.
- Removed `--load-engine` option from `polygraphy`. Engines can now be provided as models directly, e.g. `polygraphy run example.engine --trt`
- `polygraphy_exec` and `polygraphy_gen` were removed. They are superseded by the `run` subtool of `polygraphy`.
- `--layerwise` and `layerwise` options have been removed. Layerwise behavior is now possible with `outputs=constants.MARK_ALL` or `--<framework>-outputs mark all`
- Fixed bugs in `Comparator.validate` that would cause it not to correctly display non-finite values.
- `Calibrator` will now warn if a cache exists but is empty
- `DataLoader` will now use a fixed seed value unless otherwise specified. This ensures consistent run-to-run behavior.
- The default `find_output_func` will no longer compare outputs whose names don't match if there is another output that does match.
- Fixed a bug where custom names provided to runners would still be suffixed with a timestamp.
- Fixed a bug where regular TensorRT calibrators could not be used with `CreateConfig`
- The missing subtool warning will no longer be displayed if that subtool is not being used.
- `basic_compare_func` now accepts a `find_output_func` parameter, allowing users to control which outputs are compared between results.
- The `--load-outputs` argument can now accept multiple different files. Outputs from each of these will be read in order.
- Added an implicit batch ONNX network loader for the legacy TensorRT runner. This will not work with recent versions of the parser.
- Added `RunResults` class which replaces the `OrderedDict` that `Comparator.run` previously returned (structure is unchanged).
- `layerwise` mode will no longer mark constants as outputs.
- The default `compare_func` in `Comparator.compare_accuracy` will now always iterate over the output names in the first `IterationResult` and attempt to find them in the second. The order of the `IterationResult`s provided to this function can be modified either by setting `comparisons` in `Comparator.compare_accuracy`, or changing the order of runners in `Comparator.run`
- Improves `polygraphy_gen` output formatting
- Renamed `RunResult` to `IterationResult` to better reflect its purpose.
to better reflect its purpose. - Default runner names now include timestamps to disambiguate when saving and loading multiple runners.
- `graphsurgeon` is no longer a dependency of Polygraphy
- Logger settings in `polygraphy_exec`/`polygraphy_gen` are now set prior to any logging output.
- Comparator will no longer attempt to decompress all `bytes` objects sent over the queue when using subprocesses
- Added `OnnxExtWeightsNetworkLoader` to support loading ONNX models with externally stored weights into TensorRT.
- Added a `TensorMetadata` class to replace dictionaries that were used across Polygraphy.
- Added `CaffeNetworkLoader` for the `TrtLegacyRunner`
- `polygraphy_exec` and `polygraphy_gen` will no longer use subprocesses by default. To revert to the old behavior, the `--use-subprocess` flag must now be explicitly provided.
- `SerializedEngineLoader` now accepts a `buffer_loader`, so that a function that loads a serialized engine may be provided instead of the serialized engine itself.
- Default opset for `OnnxFromTfGraph` has been updated to `11`
- `polygraphy_exec` and `polygraphy_gen` now correctly handle cases where no model file is provided
- Added a `PolygraphyException` class to serve as a base class for exceptions raised by Polygraphy.
- `ConfigLoader` now accepts a list of `Profile`s to support multiple optimization profiles.
- Changed the format of CLI shapes arguments to enable specifying multiple profiles.
- Moves `outputs` argument from TfRunner to the tensorflow loaders.
- Polygraphy now includes a thin `ctypes` wrapper around the CUDA runtime library, accessible in `util/cuda.py`
- `TrtRunner` no longer depends on `pycuda`, and instead uses the included CUDA wrapper.
- Loader parameters may now be loaders themselves, or the result of invoking a loader.
- Improves the quality of Comparator messages when there are mismatches
- `basic_compare_func` will now preserve output ordering in the results.
- Makes `EngineFromNetwork` compatible with TensorRT 7.0
- Restructures ONNX Runner, and adds layerwise functionality (using ONNX-GraphSurgeon).
- Added `--timestamp` and `--line-info` options to `polygraphy_exec` to enable logging of timestamp and line numbers respectively.
- Added `--no-letter` option to disable severity letter prefixes in log messages
- Added `register_callback` to Logger, which registers a callback that will be called whenever the severity changes.
- Added `Logger.verbosity()` which returns a context manager that can be used to temporarily change logging severity.
- Added new variants to `--model-type` in `polygraphy_exec`: `keras`, `ckpt`, renamed `tf` to `frozen`
- Added `ConfigLoader` which can be passed to `EngineFromNetwork` to customize the build configuration prior to building.
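
The `register_callback` and `Logger.verbosity()` entries above describe a common pattern: severity changes notify observers, and a context manager restores the previous severity on exit. The sketch below illustrates that pattern with a stand-in class; `MiniLogger`, its numeric severity constants, and its exact callback timing are assumptions for illustration and may differ from Polygraphy's logger.

```python
from contextlib import contextmanager


class MiniLogger:
    """Hypothetical stand-in logger; severity values are arbitrary."""

    INFO = 20
    VERBOSE = 10

    def __init__(self):
        self._severity = self.INFO
        self._callbacks = []

    @property
    def severity(self):
        return self._severity

    @severity.setter
    def severity(self, value):
        self._severity = value
        # Notify every registered callback about the new severity.
        for callback in self._callbacks:
            callback(value)

    def register_callback(self, callback):
        self._callbacks.append(callback)

    @contextmanager
    def verbosity(self, severity):
        previous = self._severity
        self.severity = severity
        try:
            yield
        finally:
            # Restore the old severity even if the body raised.
            self.severity = previous


logger = MiniLogger()
seen = []
logger.register_callback(seen.append)

with logger.verbosity(MiniLogger.VERBOSE):
    inside = logger.severity  # temporarily changed severity

print(inside, logger.severity, seen)
```

Routing the restore through the same setter means callbacks observe both the temporary change and the return to the previous severity.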
- The logger no longer displays timestamps and line numbers. These can be enabled by setting the `timestamp`/`line_info` properties respectively to `True`.
- Logger now relies on the `colored` module to provide colored output
- `polygraphy_exec` now runs runners in the order in which they were specified.
- Greatly shortens import paths by removing `_runner` suffixes and shortening framework names (e.g. `tensorflow_runner` -> `tf`)
- `runners` submodule has been renamed to `backend`
- `TrtRunner` has been renamed to `TrtLegacyRunner`
- `TrtRunnerV2` has been renamed to `TrtRunner`
- `polygraphy_gen` is now at parity with `polygraphy_exec`
- Removed `--tftrt` as a separate runner in `polygraphy_exec` - instead it is now an option for the `--tf` runner.
- Removed `--tftrt-gpu-memory-fraction` and renamed `--tf-gpu-memory-fraction` to `--gpu-memory-fraction` in `polygraphy_exec`
- Removed `--tfonnx`, and instead adds this functionality in `--onnxrt` when using a TensorFlow model in `polygraphy_exec`
- Removed `Experimental` argument section in `polygraphy_exec`. All functionality has now been integrated into non-experimental arguments.
- Removed `preprocess_network` argument from `EngineFromNetwork`. This functionality can be achieved by wrapping the network loaders instead.
- `Comparator.run` will now forcefully terminate the subprocess if it does not exit on its own.
- Added TF32 support to legacy TrtLegacyRunner.
- Various improvements to automatic shape matching for cases where shapes between runners do not match exactly.
- Changed `BaseRunner` so that runners can now implement `activate()`/`deactivate()` instead of `__enter__()`/`__exit__()`
- `polygraphy_exec` now defaults to running just a single iteration of inference.
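
The `BaseRunner` entry above moves resource handling out of `__enter__()`/`__exit__()` and into `activate()`/`deactivate()`. One way a base class can support this is to implement the context-manager protocol once and delegate to the two hooks. The sketch below shows that idea only; `SketchBaseRunner`, `EchoRunner`, and the `is_active` attribute are illustrative names, and the real `BaseRunner` may be structured differently.

```python
class SketchBaseRunner:
    """Hypothetical base class: subclasses override activate()/deactivate()."""

    def __init__(self):
        self.is_active = False

    def activate(self):
        """Acquire resources; the default does nothing."""

    def deactivate(self):
        """Release resources; the default does nothing."""

    def __enter__(self):
        self.activate()
        self.is_active = True
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.deactivate()
        self.is_active = False
        return False  # never swallow exceptions raised in the with-body


class EchoRunner(SketchBaseRunner):
    """Toy runner that returns its inputs unchanged."""

    def activate(self):
        self.session = "open"  # stands in for a framework session

    def deactivate(self):
        self.session = None

    def infer(self, feed_dict):
        return dict(feed_dict)


with EchoRunner() as runner:
    active_inside = runner.is_active
    outputs = runner.infer({"x": [1, 2]})

print(active_inside, runner.is_active, outputs)
```

With this arrangement a runner author writes only the two hooks, while `with` blocks keep working for callers.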
- The `--accuracy` flag has been removed from `polygraphy_exec`, as this is now the default behavior.
- TensorRT runners now use the same builder to build the network and engine, instead of using a separate builder for each.
- Fixes a bug in `try_match_shape`
- Added a `tf32` parameter as well as `--tf32` flag for TensorRT.
- Added support for `dim_param` in ONNX.
- `fp16_mode` and `int8_mode` parameters have been renamed to `fp16` and `int8` respectively.
- `polygraphy_exec` will now use the runtime shapes specified rather than always using `OPT` shapes from the TensorRT profile.
- Improves shape matching logic in `DataLoaderCache`
- Added a `start_index` and `end_index` to `Comparator.run` to make it easy to skip over inputs from the data loader.
- Added `CompareFunc` to provide built-in comparison functions.
- Added `PostprocessFunc` to provide built-in post-processing functions.
- `Comparator.compare_accuracy` now returns an `AccuracyResult` object, which contains much more information about the results of the comparisons.
- Added `percentage()` function to `AccuracyResult` to provide an easy way to figure out the percentage of passed iterations.
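
The `start_index`/`end_index` entry above describes skipping over inputs from the data loader. A rough sketch of that kind of windowing, using only the standard library, is shown below. The helper name `select_iterations` and the half-open `[start_index, end_index)` interpretation are assumptions, not confirmed Polygraphy behavior.

```python
from itertools import islice


def select_iterations(data_loader, start_index=0, end_index=1):
    """Return only the feed dicts in the assumed half-open index window."""
    return islice(data_loader, start_index, end_index)


# A generator of ten feed dicts; only indices 2, 3 and 4 are consumed.
feeds = ({"x": index} for index in range(10))
picked = list(select_iterations(feeds, start_index=2, end_index=5))
print(picked)
```

`islice` works on any iterable, so the same helper applies to lists, generators, and custom loader objects.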
- Replaces `RunInfo` with `IterationResult`. The latter only stores information about a single iteration for a single runner.
- `compare_func` in `Comparator.compare_accuracy` is now a `Callable(IterationResult, IterationResult) -> Dict[str, bool]`
- `warm_up_runs` now defaults to `0`, and `end_index` to `1`
- Ordering of outputs in a single iteration is now preserved in `CompareFunc.basic_compare_func`
- `use_subprocess` now defaults to `False` in `Comparator.run()` (still defaults to `True` in `polygraphy_exec`).
- `Calibrator` now takes a `start_index` and `end_index` argument instead of `max_items`.
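
The `compare_func` entry above gives its shape as `Callable(IterationResult, IterationResult) -> Dict[str, bool]`. The sketch below shows a function of that shape, with plain dictionaries of float lists standing in for `IterationResult`. The tolerance parameters, the `sketch_compare_func` and `passed_fraction` names, and the rule that an iteration passes only when every compared output matches are assumptions made for illustration.

```python
import math


def sketch_compare_func(result0, result1, atol=1e-5, rtol=1e-5):
    """Map each output name found in both results to a pass/fail flag."""
    matches = {}
    for name, out0 in result0.items():
        if name not in result1:
            continue  # nothing to compare this output against
        out1 = result1[name]
        matches[name] = len(out0) == len(out1) and all(
            math.isclose(a, b, rel_tol=rtol, abs_tol=atol)
            for a, b in zip(out0, out1)
        )
    return matches


def passed_fraction(per_iteration_matches):
    """Fraction of iterations in which every compared output matched."""
    passed = sum(
        1 for matches in per_iteration_matches if matches and all(matches.values())
    )
    return passed / len(per_iteration_matches)


iterations = [
    ({"y": [1.0, 2.0]}, {"y": [1.0, 2.0]}),  # identical outputs
    ({"y": [1.0]}, {"y": [1.5]}),            # clearly different outputs
]
all_matches = [sketch_compare_func(first, second) for first, second in iterations]
print(all_matches, passed_fraction(all_matches))
```

Returning a per-output dictionary rather than a single boolean lets a caller report which outputs diverged.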
- Removed `Comparator.compare` function since `Comparator.compare_accuracy` includes all of its functionality.
- `iterations` in `Comparator.run` has been removed and replaced by `start_index` and `end_index`
- Removed `subprocess_polling_interval` argument, as `Comparator` can now properly detect when the subprocess terminates.
- `Comparator.run()` will no longer hang if there is a segfault in the subprocess.
- Added `--int-min`, `--int-max`, `--float-min`, and `--float-max` arguments to `polygraphy_exec`
- Added `--explicit-precision` option to `polygraphy_exec` to enable QAT models in TRT.
- Added empty tensor support. Empty tensors are tensors whose shapes contain one or more 0s.
- When `--load-outputs` or `--save-outputs` is specified to `polygraphy_exec`, `seed` will default to `1` to ensure consistent inputs across runs.
- Added a `--calibration-cache` option to `polygraphy_exec` to enable supplying a calibration cache
- Added a `--no-color` option to disable color logging.
- Added `GraphOptimizerLoader` for freezing TensorFlow graphs and `--freeze-graph` option to `polygraphy_exec`.
- Added `--load-outputs` and `--save-outputs` to `polygraphy_exec` for comparing across executions.
- Added `KerasLoader` for loading models stored in `hdf5` format.
- Added constant folding pass to `GraphOptimizerLoader` for TensorFlow graphs.
- Updates `Calibrator` so that it will now use the opt dimension of a profile for networks with dynamic shapes.
- Updates Legacy TensorRT runner to use `Loaders` for easier UFF debugging.
- `Calibrator` will no longer allocate buffers if a calibration cache was provided.
- Added generation of ONNX code to `polygraphy_gen`
- Added default implementations of some `BaseRunner` methods.
- Added `last_inference_time()` to `BaseRunner` so that `infer()` now only needs to return outputs.
- Added `Calibrator` for int8 calibration, along with additional parameters to `EngineFromNetwork`
- Better warnings for user-defined implementations of various APIs.
- `DataLoaderCache` will now warn loudly when a set of inputs needs to be regenerated.
- Cleans up `Comparator` `run()` function.
- Moves most `save_*` options into loaders rather than runners.
- Changed `BaseDataLoader.next()` to take index as an argument. This way, inputs can be reliably repeated across runners.
- Moves all `layerwise` parameters into loaders rather than runners.
- `Loader`s are now interchangeable with Python `Callable`s
- `DataLoader`s are now interchangeable with Python `Callable`s
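
Treating loaders as ordinary Python callables, as the entries above describe, means a consumer can accept a ready value, a lambda, or another loader and resolve all three the same way. The sketch below illustrates that idea. The `resolve` helper and the toy `UpperCase` loader are hypothetical and are not part of Polygraphy.

```python
def resolve(loader_or_value):
    """Invoke the argument if it is callable; otherwise return it unchanged."""
    if callable(loader_or_value):
        return loader_or_value()
    return loader_or_value


class UpperCase:
    """Toy loader: wraps a string, or anything callable that yields one."""

    def __init__(self, text):
        self._text = text  # may be a value, a lambda, or another loader

    def __call__(self):
        return resolve(self._text).upper()


from_value = UpperCase("model")()
from_lambda = UpperCase(lambda: "model")()
from_loader = UpperCase(UpperCase("model"))()
print(from_value, from_lambda, from_loader)
```

Deferring the call until the wrapped object is needed is what allows loaders to be chained without eagerly building every intermediate result.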
- `DataLoader` no longer generates all `True` values for boolean types.
- Various bug fixes in `polygraphy_gen`
- `DataLoaderCache` is now sent over the queue when runners are run in subprocesses. This resolves an issue where the cache was not being updated correctly.
- `Comparator` now updates runners correctly when using a subprocess.
- Added `--no-fold-constant` option to prevent `OnnxFromTfGraph` from doing constant folding in the TensorFlow graph.
- Added experimental `polygraphy_gen` script that enables generation of template Python scripts for running Polygraphy.
- Bug fix for cases where TensorFlow nodes with no outputs are recognized as graph outputs by `GraphSurgeon`.
- Added `name` parameter to `CheckpointLoader` in case the checkpoint does not include a `checkpoint` file.
- `TFTRTLoader` now accepts any kind of TensorFlow Graph loader
- Bug fix in `TrtRunner` `Buffers` so that no-op reshapes (no reallocation) are handled correctly.
- Added `check_inf`, `check_nan`, and `fail_fast` options to `Comparator.validate()`
- Cleans up `Buffers` implementation for `TrtRunner` - eliminates an unnecessary copy that was happening on the host input.
- Improved logic for matching output names in `util.find_in_dict()`
- `TrtRunner` will no longer call `context`'s shape setting functions on non-dynamic inputs.
- Bug fix for volume computation for scalars.
- Updates `DataLoader` to handle scalars correctly, adds several tests.
- Added various utility functions as static members of `TrtRunner`, e.g. `create_network` function to simplify TensorRT's network flags.
- `EngineFromNetwork` will now mark network outputs when `layerwise=True`
- Added support for `bool` outputs in `Comparator`
- Replaces `OnnxEngineLoader` with `OnnxNetworkLoader` and `EngineFromNetwork`. This allows for more flexibility in building engines from TensorRT networks.
- Added `allow_growth` option to TfRunner to work around `CUDNN_STATUS_INTERNAL_ERROR`. When `allow_growth` is enabled, the error disappears.
- `DataLoaderCache` will now attempt to permute inputs in cases where shapes do not match exactly (e.g. NCHW vs NHWC inputs).
- Fixes a bug in `polygraphy_exec` which caused it to ignore user-defined profiles.
- Added support for many more ONNX data types.
- Added support for `int8` and explicit precision mode in `TrtRunner`
- Added `preprocess_network` parameter to `OnnxEngineLoader` so that the network can be modified before it is used for building.
- `TrtRunner` will now attempt to generate sane default shapes in cases with dynamic shapes where no profiles are provided.
- `DataLoader` no longer overrides static shapes in the model, but issues a warning if an override is requested.
- `DataLoader` now accepts shape tensor inputs in its `default_shapes` parameter.
- Added timestamps to logging output.
- `Comparator` can now catch segfaults in runners properly.
- Added options for `DataLoader` to be able to specify input bounds
- Added smarter matching for input metadata in the `DataLoaderCache`
- Default `subprocess_polling_interval` is now 30 seconds.
- `Comparator` now attempts to partially match output names when no exact matches are found.
- Added `subprocess_timeout` parameter to `Comparator.run` to prevent hangs when a subprocess does not terminate.
- Added `subprocess_polling_interval` parameter to allow the process to be polled so that failing processes can be terminated before the full `subprocess_timeout`.
- If ONNX checker fails due to the IR version of the model being too new, Polygraphy now ignores the error and continues.
- `OnnxEngineLoader` now accepts an `onnx_loader` for better flexibility in loading models.
- `polygraphy_exec` now supports running TF models in TRT via the tf2onnx converter.
- Legacy `TrtLegacyRunner` now only supports UFF models.
- Added `BaseModelLoader` that can be used to load models. This allows for reuse of existing runners with different import paths. For example, `OnnxrtRunner` can be used with `OnnxFromTfGraph` in order to run a TensorFlow frozen graph via ONNX Runtime.
- Implements `ModelLoader`s for `TfRunner`, including a frozen model loader, checkpoint loader, and TF-TRT loader.
- `OnnxFromTfGraph` now accepts a TensorFlow ModelLoader to support a wider variety of input formats.
- Updates legacy `TrtLegacyRunner` to use `get_input_metadata` API, so it is usable for UFF models.
- Comparator will now look at the union of all outputs from all runners when checking for common outputs.
- `TrtRunner` will no longer mark layers within the loop body as network outputs in `layerwise` mode.
- `DataLoaderCache` now falls back to reusing inputs based on order if names do not match exactly.
- `DataLoader` now accepts a `default_shapes` parameter to override dynamic shapes.
- Added `get_input_metadata` API to BaseRunner. Overhauls runners so they no longer need to handle dynamic input shapes individually.
- Added `DataLoader` class which can be used to feed data to the Comparator.
- Added `DataLoaderCache` so that the data loader does not have to load inputs multiple times for each runner.
- `Comparator.compare_accuracy` now fails if no outputs were compared.
- Removed support for implicit batch ONNX models in `TrtLegacyRunner`. You should use `TrtRunner` for ONNX models instead.
- Removed `python2` support.
- Bug fixes for TensorFlow Graphs
- Bug fixes for `polygraphy_exec` when using legacy `TrtLegacyRunner`
- Bug fixes for `TrtRunner` for cases with multiple outputs
- Added support for compression during communication between the runner subprocesses and the main `Comparator` process. This is because `Pipe`s and `Queue`s can only send objects smaller than 2GB.
- Added timeouts to reduce the possibility of hangs in runners.
- Added `--fail-fast` option to `polygraphy_exec` and corresponding `fail_fast` option to `Comparator.compare()`. Useful for determining the first layer at which two models diverge.
- Added `TrtRunner` that can be used to run TRT networks with dynamic shapes. Currently only supports ONNX.
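
The compression entry above is about shrinking objects before they cross a `Pipe` or `Queue`. A minimal sketch of that general technique with the standard library follows. The `pack`/`unpack` helpers and the choice of `pickle` plus `zlib` are assumptions for illustration, since the changelog does not say which serializer or codec Polygraphy uses, and an in-process `queue.Queue` stands in for the inter-process channel.

```python
import pickle
import queue
import zlib


def pack(obj, level=6):
    """Serialize and compress an object before sending it over a channel."""
    return zlib.compress(pickle.dumps(obj), level)


def unpack(payload):
    """Reverse of pack(): decompress, then deserialize."""
    return pickle.loads(zlib.decompress(payload))


channel = queue.Queue()  # stand-in for a multiprocessing Queue or Pipe
outputs = {"logits": [0.0] * 10000}  # highly repetitive, so it compresses well

channel.put(pack(outputs))
received = unpack(channel.get())
print(len(pickle.dumps(outputs)), len(pack(outputs)))
```

Note that `pickle` should only be used to deserialize data from a trusted sender, which is the case for a subprocess the parent process launched itself.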
- Runners no longer need to specify inputs up front - they can now be specified after `__enter__` is called. This greatly simplifies much of the logic in several runners.
- `RunInfo` no longer contains data about the inputs used.
- `TFOnnxrtRunner` now accepts an opset option when converting graphs to ONNX.
- All runner files are now suffixed with `_runner` to disambiguate them from system packages.
- Fixes an issue that prevented EXTRA_VERBOSE logging output from TRT from being displayed.
- Added a `--uff-order` option in case the automatically determined order is wrong.
- Added an experimental `--build-only` option to `polygraphy_exec`
- Comparator will now attempt to permute outputs with mismatched shapes when `check_shapes` is disabled.
- Lowers the default GPU memory fraction, as TensorFlow has OOM issues when it is set too high.
- Added `TFOnnxrtRunner` and `--tfonnx` option to `polygraphy_exec`
- Added `OnnxrtRunner` and moves `TFOnnxrtRunner` into `onnx_runner.py`.
- Added `--save-onnx` option for `OnnxrtRunner`
- Changed `--onnx` `polygraphy_exec` option to `onnxtf` to disambiguate from `--onnxrt`
- Added `CNTKRunner` and `--cntk` option to `polygraphy_exec`
- Changed default shape value to 1. This is the value that is set when no input dimension is specified.
- Added support for loading TF checkpoints.
- Added support for overriding automatically determined outputs in the TF and TF-TRT runners. Added `--tf-outputs` argument to `polygraphy_exec`
- Fixes input shape mismatches between ONNX-RT and TF.
- Added `--plugins` option to `polygraphy_exec` for loading TRT plugins.
- Added a function in comparator to perform output validation, and a corresponding flag in `polygraphy_exec`.
- Runners now use OrderedDict for outputs, meaning that the ordering of the outputs will match the order of the layers in the network in most cases.
- Improved TensorFlow output tensor deduction by excluding certain ops that cannot behave like outputs in TensorFlow.
- Version information is now logged at INFO logging severity.
- Removed prepare_inputs/prepare_outputs functions. Instead, runners now do timing on their own in the infer function.
- Changed runner inputs to use dictionaries that map input names to their numpy buffers.
- `polygraphy_exec` will no longer fail if the extension for the model file is unrecognized.
- Added `fp16_mode` option to TfRunner for TF-TRT.
- Added an option to limit TensorFlow GPU memory usage
- Added an option to specify minimum segment size to TF-TRT.
- Added an option to write out engine(s) from the TF-TRT graph.
- `polygraphy_exec` now exits when unknown arguments are encountered
- Improves timestamps to be human-readable instead of using seconds from epoch.
- Added support for dynamic ops in TF-TRT
- Added an option to write out tensorboard visualizations.
- Added an option for enabling XLA in the TensorFlow runner.
- Added nicer error messages on failed TF-TRT imports
- If a TensorFlow graph specifies a dynamic shape, Polygraphy now automatically populates it with concrete values.
- Added argument groups and moves some unstable arguments to Experimental section.
- Polygraphy will now refuse to write artifacts to the disk if a file already exists wherever it can detect such cases.
- `polygraphy_exec` now emits warnings when unknown command line parameters are used.
- Added capability to write out TensorFlow timelines.
- Changed --save* options to accept directory names instead, and the resulting files are timestamped and named based on the runner name.
- Changed command-line parameters to use dashes instead of underscore.
- Modifies TrtLegacyRunner to pass along input order to UFF, instead of permuting the order to CHW.
- Comparator now prints runner output in the same order in which they were specified.
- Added per-inference-inputs command-line arguments for running multiple comparisons.
- Seed is now displayed correctly during Comparator.run().
- User-friendly Comparator output - now suggests command-line flags to get what you were looking for.
- Added layerwise comparison support for TrtLegacyRunner and TfRunner.
- Renamed to TRT Polygraphy.
- Overhauled README.md
- Modified project structure - created runners, comparator, and logger submodules.
- polygraphy_exec now uses batch size specified by model if none is specified by the user.
- Added framework dependencies to setup.py
- TrtLegacyRunner now displays ONNX parsing errors and exits early on parsing failures.
- Initial integration