Apache TVM v0.8 Release Note

Overview
Accepted RFCs
Features and Improvements

Overview

Apache TVM v0.8 brings several major new experimental features, including:

PaddlePaddle frontend
TVMScript: round-trippable python-based syntax for TIR
TorchScript integration
TensorIR scheduling language
TensorRT and CUTLASS integration via BYOC
Int4 TensorCore support in AutoTVM
MicroTVM Project API, with Zephyr and Arduino support
AOT executor
Robust Windows support
Affine analysis infra: iter-affine-map
Improved Vulkan backend
CUDA graph support in TVM runtime

In addition, the community has been working together to refactor and evolve the existing infrastructure, including but not limited to:

Relay compilation engine
Relay pattern language
CI and build process
Refactoring documentation and tutorials
Stabilizing AutoScheduler
Stabilizing TVMC command line driver interface
Stabilizing target system
Frontend coverage, quantization, dynamic shape, training

Full changelog: https://gist.github.com/junrushao1994/c669905dbc41edc2e691316df49d8562.

Accepted RFCs

The community has adopted a formal RFC process. Below is a list of the formal RFCs accepted by the community since then:

[RFC-0005] Meta schedule (AutoTIR)
[RFC-0006] Automatic mixed-precision pass and support
[RFC-0007] Parametrized unit tests
[RFC-0008] MicroTVM Project API
[RFC-0009] Unified static memory planner
[RFC-0010] Target-registered compiler flow customisation
[RFC-0011] Arm® Ethos-U integration
[RFC-0014] Pipeline executor
[RFC-0015] Use CMSIS-NN with TVM
[RFC-0019] Add PaddlePaddle frontend
[RFC-0020] Extend metadata in project option
[RFC-0022] TIR non-scalar constants
[RFC-0023] Adding annotation field to tir.allocate nodes
[RFC-0025] PyTorchTVM
[RFC-0027] Formalize TVM documentation organization
[RFC-0028] Command line composition from internal registry
[RFC-0029] Migrating target attributes to IRModule
[RFC-0030] Command line configuration files
[RFC-0031] C Device API
[RFC-0036] TVMScript namespace
[RFC-0041] Update TVMScript block syntax

Features and Improvements

TE, TIR, TVMScript

TVMScript parser and printer #7630 #9115 #9286
Schedulable TIR (S-TIR) infrastructure, analysis and lowering passes #7553 #7765 #7847 #8114 #8121 #7873 #7923 #7962 #7848 #8044 #7806
S-TIR schedule primitives: compute-inline, reverse-compute-inline, fuse, split, rfactor, storage-align, vectorize, unroll, bind, reorder, cache-read, cache-write, compute-at, reverse-compute-at, decompose-reduction #8170 #8467 #8544 #8693 #8716 #8767 #8863 #8943 #9041
While loop in TIR #7425 #9004
Metaprogramming in S-TIR via specialize #8354
Support for return values in TIR #7084 #7932
Storage scope support in PointerType #8017 #8366 #8463
Creation of S-TIR via TE compute #7987

AutoTVM, AutoScheduler, Meta Schedule

PopenPoolExecutor replaces Python's native multiprocessing library, providing better multiprocessing support and enabling auto-tuning in Jupyter notebooks for AutoTVM and AutoScheduler #6959 #8492 #8913 #8820 #8851
AutoScheduler improvement and stabilization: task scheduler, layout rewrite, early stopping, dispatching #6945 #6750 #6987 #7156 #8862 #8995 #7571 #7376 #7377 #7344 #7185
AutoScheduler support for sparse workloads #7313 #7635 #8065
AutoScheduler support for Vulkan, ROCm, Mali #7626 #7038 #7132
AutoTVM support for int4 TensorCore #7831 #8402
Meta Schedule core infrastructure, builder, runner and database #8615 #8623 #8642 #8817 #9079 #9132 #9154 #9053 #9059 #9044 #9111 #9061 #9153

Operator Coverage

Operators for int8 vision transformers on GPU #7814
Optimizing NMS and ROI-related kernels on GPU #7257 #7172 #7136 #7796 #7463 #6516 #7440 #7666 #8174
Support and optimize sparse operators #8605 #7477 #7435 #6889 #6580 #8437
Sort-related operators and optimization #9184 #7669 #8672 #7611 #7195 #7056 #6978
Support for einsum operator #6370
Matmul, dense operators and their optimization #8921 #8527 #8234 #8250 #6616 #8229 #8401 #7404 #8669
Convolution and pooling operators and their optimization #8620 #8936 #8584 #7075 #7142 #7515 #6999 #6899 #6840 #6137 #6802 #6445 #6711 #6714 #8167 #8222 #8275 #8276 #8422 #8430 #6687 #7928 #8897
Scatter and gather operators and their optimization #8479 #7600 #7044 #7464 #7233 #6533 #6856 #6854 #7927 #8105
Prefix scan, cumsum and cumprod #7722 #7303 #7314 #7334 #7123 #6868
Dynamic shape and shape functions #7414 #6979 #6912 #6898 #6373 #8068 #7490 #7487
Miscellaneous improvements. Operators, including: reshape, resize, pad, PRNG, transpose, where, softmax, concat, nll_loss, space_to_batch_nd, batch_to_space_nd, slice_like; libraries, including thrust, cuDNN, cuBLAS, MIOpen; improved schedules for generic reduction and softmax. #8592 #7375 #7287 #7184 #7131 #7086 #7083 #8030 #6851 #6477 #8346 #6759 #8028 #8056 #8369 #7468 #7458 #7194 #8138 #8543

Training

Relay AutoDiff #7677 #8318
TE AutoDiff #7321
Gradient operators #7685 #7340 #6767 #8307 #7357 #6827

Relay

Pattern language and mixed-mode visitor: matching more IR constructs, fuzzy matching; converting more passes to be non-recursive. #8843 #7754 #7355 #7332 #7282 #7151 #7120 #6958 #7507 #8325 #8774 #7817 #7374 #6695 #6704
Improved or added passes, including ExtractOperators, SimplifyExpr, DynamicToStatic, DefuseOps, ConvertLayout, FoldConstant; added a set of utilities that allow a model to run efficiently on TensorCores #9253 #9245 #8996 #7827 #9034 #7807 #8755 #7731 #7368 #7603 #7656 #7423 #7354 #6946 #6748 #6720 #6776 #7835 #7895 #8205
TECompiler and refactoring of compilation workflow #9103 #8974 #8886 #8802 #8501 #8526 #8486 #8597 #7518 #7552 #8914 #9130
Quantization and automatic mixed precision #8883 #8810 #8644 #7613 #8069 #8341 #8126 #8460
Parser, printer and diagnostics #7347 #6274 #6692 #8352 #8000

MicroTVM, AOT, Graph Executor and VM

Pipeline Executor #8702 #9108
CUDA graph integration in graph executor #7616
Add set_output_zero_copy to graph executor #8497
VM: memory allocation improvement, shape function improvement and misc #7746 #7451 #7413 #7210 #8040 #6938 #8661 #7676 #8285
AOT compilation and execution #8697 #7785 #8014 #8023 #8096 #8075
Project API infrastructure: #8380 #8963 #8708 #8019
MicroTVM, Zephyr, Arduino RVM, AutoTVM support #9320 #8941 #7804 #7786 #7449 #7891 #7915 #8055 #8037 #8386 #8519 #8748 #8154 #8945 #8624 #8701 #7723 #8715 #7225 #6964 #7813 #7528
The pure C runtime (CRT) #7398 #7333 #7095 #7225
Model library format #8270 #8072 #7938

Arithmetic Analysis

Tighter bounds and more simplification on cast #6771 #7045
Introducing iterator (quasi-) affine map detection #6667 #7752 #7759
Inverse of iterator affine map #8384 #8427
Subspace division in iterator affine map #7760
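As a rough illustration of what an iterator affine map and its inverse express, the toy below fuses two loop iterators i in [0, M) and j in [0, N) into one index f = i * N + j, then recovers (i, j) with floor division and modulo. This is a plain-Python sketch under that simple two-loop assumption; the helper names are hypothetical and it is not TVM's arithmetic-analysis API, which handles far more general (quasi-)affine forms.

```python
# Toy illustration of an iterator affine map (loop fusion) and its inverse.
# Hypothetical helper names; this is NOT TVM's tvm.arith API.
from typing import Tuple


def fuse(i: int, j: int, extent_j: int) -> int:
    """Affine map (i, j) -> i * extent_j + j, i.e. the fused loop index."""
    return i * extent_j + j


def inverse_fuse(f: int, extent_j: int) -> Tuple[int, int]:
    """Inverse map: recover (i, j) from the fused index via floordiv/floormod."""
    return f // extent_j, f % extent_j


def is_bijective(extent_i: int, extent_j: int) -> bool:
    """Check that fusion covers [0, extent_i * extent_j) exactly once."""
    images = [fuse(i, j, extent_j) for i in range(extent_i) for j in range(extent_j)]
    return sorted(images) == list(range(extent_i * extent_j))


if __name__ == "__main__":
    M, N = 3, 4
    # Round-trip every fused index back through the inverse map.
    for f in range(M * N):
        i, j = inverse_fuse(f, N)
        assert fuse(i, j, N) == f
    print(is_bijective(M, N))
```

The round trip only works because the map is bijective on the iteration domain; checking that property is the kind of question affine-map detection answers before a transformation like this is considered safe.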

Frontends

PaddlePaddle initial support #8645 #9124 #9126 #9295 #9370 #9236 #9283
ONNX support, including better handling of control flow, coverage of more operators, better dynamic shape support, more tests. #9265 #9178 #9146 #8894 #8966 #8967 #7818 #9000 #9001 #9066 #9028 #9002 #8985 #9019 #9017 #8972 #7802 #7800 #7781 #8919 #9054 #8906 #8933 #8959 #8907 #7771 #8923 #8924 #7755 #7720 #8773 #8872 #7655 #8741 #7633 #8781 #8866 #8867 #7522 #7519 #7489 #7438 #7429 #7364 #7300 #7259 #7243 #7237 #7208 #7189 #7115 #7109 #7089 #7036 #7031 #6839 #6351 #7842 #7844 #6646 #6647 #6681 #6700 #7883 #6726 #6730 #7899 #7900 #7906 #7934 #7956 #8007 #8011 #8084 #8099 #8189 #8191 #8304 #8321 #8337 #8356 #8385 #8502 #8426 #8440 #8456 #8475 #7391 #7394 #8621 #8322 #8323 #8435 #8436 #8455 #7353 #7215
TensorFlow and TFLite, including more operators, better TensorArray support and quantization #9404 #9256 #8689 #7789 #7736 #8763 #8647 #8648 #8558 #8780 #8538 #7659 #7639 #7531 #7520 #7502 #7496 #7473 #7452 #7442 #7441 #7400 #7320 #7293 #7267 #7159 #7148 #7114 #7113 #7093 #7074 #7048 #7030 #6998 #6984 #6970 #6949 #6933 #6918 #6901 #6885 #6849 #5767 #6589 #6670 #6674 #6675 #7866 #6685 #7885 #6729 #7901 #6774 #6783 #6799 #7951 #8024 #8051 #8060 #8074 #8142 #8179 #8251 #8277 #8335 #8364 #8375 #8431 #8454 #6818 #8483 #9099 #9165
PyTorch: more operators including activations, inplace operators, RNNs, NMS #9371 #9204 #9185 #9135 #9133 #9015 #8839 #8718 #8699 #8692 #7712 #8753 #7694 #8583 #7675 #7646 #7606 #7592 #7569 #7544 #7549 #7535 #7517 #7465 #7397 #7371 #7348 #7346 #7325 #7231 #7174 #7154 #7137 #7134 #7133 #7128 #7088 #7023 #6900 #6602 #7845 #6659 #6740 #6782 #6784 #7958 #8192 #8397 #8398 #8403 #8447 #6829
MXNet support: more operators and NLP model coverage in GluonNLP #7568 #7409 #7209 #7191 #7062 #6561 #6699
Misc: CoreML, Keras, DarkNet, etc. #7667 #6676 #6651 #6963 #7949 #7035 #7446 #8562 #8599

Codegen Backends and Runtime

LLVM backend: recover LLVM support on Windows; support target feature strings in function attributes; atomic support in NVPTX, ROCm; compatibility with LLVM 12+ #9305 #9223 #9138 #8860 #8958 #6763 #6698 #6717 #6738 #8293 #6907 #7051
ROCm 3.9 bitcode files search #6865
Vulkan and SPIR-V refactoring and major improvements in codegen and runtime. A critical bug fix in SPIR-V codegen allows the Vulkan backend to produce correct outputs on more hardware and drivers. Added support for querying device-specific hardware parameters and capabilities, dynamic shapes, irregular ops such as sorting and NMS, UBO, fp16, and vectorization. We can now run complicated models like MaskRCNN on Vulkan end to end. #8904 #7833 #7717 #7681 #8746 #8813 #7609 #8882 #7607 #7591 #7574 #7572 #6662 #7969 #8013 #8048 #8098 #8102 #8107 #8127 #8151 #8196 #8320 #8588 #8332 #8333 #8348 #8528
Metal language version upgrade (MTLLanguageVersion2_3), better codegen support, int64 support, various bug fixes #7830 #7819 #7714 #7118 #7116 #7105 #7980 #8054 #8175 #8202 #8206 #8313
OpenCL, VTA, Verilator: refactored code generator, better error messages, various bug fixes #7834 #7777 #7761 #7100 #6125 #6126 #6191 #8256 #8257 #8731 #8756 #8973
CUDA: enable __launch_bounds__, dynamic shared memory, TensorCore, BF16, half2, NVCC version upgrade #9341 #8678 #7561 #7273 #7146 #7147 #7099 #7065 #7033 #7014 #7907 #7964 #9087 #8135 #8137 #8457 #8466 #8571
ARM: CMSIS-NN, Ethos-N #8653 #7628 #8951 #7506 #7443 #7858 #6982 #8795 #8806 #8833 #9147 #9159 #9160 #9162 #9163 #9167 #9209 #9386 #9387
Hexagon: build, compilation, model launcher, more target options and better runtime #7784 #6718 #8821 #8822 #9033 #8823 #8859 #8865 #8915 #8954 #9024 #9025 #8960 #8986 #9010 #9011 #9189 #9220 #9355 #9356
WASM: updated support for the latest emcc; added an FFI test #6751

BYOC Integration with Vendor Libraries: TensorRT, ACL, VitisAI

TensorRT initial integration, stabilization, int8 calibration, dynamism support #6395 #7702 #7595 #7581 #7412 #7372 #9047 #8073 #8808 #6905 #7967 #8005 #8172 #8461 #8506 #8607 #7205 #7026 #7016 #7011 #6955 #6872 #7253 #6805 #9324
Arm Compute Library (ACL) integration #7649 #7206 #6532 #7121 #6724 #8149 #7251 #9396
Verilator integration #7406 #7351 #7286 #8094
VitisAI integration #6343 #7350
BYOC infrastructure enhancement: improving control flow, AnnotateTarget, custom codegen #6641 #6655 #6697 #6786 #7977 #8464

TVMC

MacOS support #8396
AutoScheduler support #7070
Support cross compiler options #7922
Python scripting #7823 #7698
More flexible input specification #7366 #7788
More options, --disable-pass and --config #7816 #8253
Allow passing optional arguments to importers #7674
Model library format (MLF) support #8086 #8331
More backend and library support: metal, ACL, Vulkan, OpenCL, ROCm, Vitis AI #8282 #7508 #8359 #6831 #8896 #7577
Support for the new target system #7651 #7654 #6788 #7304 #6855

Rust Binding

Rust bindings installable via Cargo #7503 #6678 #8631 #8665
Initial support for diagnostic interface #6656
Fixes for using Python APIs from Rust #7085
Improve NDArray, GraphRt, Relay, IRModule, Array, Attrs bindings #6563 #6741 #7138 #8353 #7082
Improve error handling, error messages and fix memory leaks #8289 #6815 #8714 #8725

Misc

Enhanced CPP-RPC implementation: allow a user-supplied work dir, support for a CPP-RPC server on Apple platforms, support for adb-shell-style CPP-RPC #7670 #8224 #8223 #7766 #7013
Use PopenWorker to handle the RPC system #7889 #7757 #7961
Fold target host into target #7462 #7791 #7534 #8835
Target-based intrinsic lowering and legalization #7936 #7809
Add target tags for all existing CUDA GPU models #7410
Linear Congruential Random Engine #8642
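For readers unfamiliar with the technique, a linear congruential generator advances a single integer state with x_next = (a * x + c) mod m. The sketch below assumes minstd_rand-style constants (a = 48271, c = 0, m = 2^31 - 1) and a simple seed normalization; those choices, and the class name, are assumptions for illustration, and TVM's actual engine may differ in constants, seeding, and interface.

```python
# Minimal linear congruential generator (LCG) sketch.
# Assumption: minstd_rand-style constants (multiplier 48271, increment 0,
# modulus 2**31 - 1). Not TVM's implementation; class name is hypothetical.


class LinearCongruentialEngine:
    MULTIPLIER = 48271
    INCREMENT = 0
    MODULUS = 2**31 - 1  # 2147483647, a prime

    def __init__(self, seed: int = 1) -> None:
        # With INCREMENT == 0 a zero state would stay zero forever, so map it to 1.
        state = seed % self.MODULUS
        self.state = state if state != 0 else 1

    def __call__(self) -> int:
        # x_{n+1} = (a * x_n + c) mod m; the new state is also the output.
        self.state = (self.MULTIPLIER * self.state + self.INCREMENT) % self.MODULUS
        return self.state


if __name__ == "__main__":
    rng = LinearCongruentialEngine(seed=1)
    print(rng(), rng())  # 48271 182605794
```

A general property of LCGs (not a claim about TVM's motivation) is that the entire generator state is one integer, so it is cheap to copy, fork, or record, and two engines built from the same seed replay the same sequence.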
