Force a gc between sphinx-gallery items to reclaim GPU memory. #8722

Merged on Aug 11, 2021 (1 commit)

@mbs-octoml mbs-octoml commented Aug 11, 2021

GPU memory is only released once the PackedFunc for evaluating the model is
garbage-collected by Python. In CI we're seeing intermittent 'CUDA: Out of memory'
failures while processing the tutorials, and tracing showed that no gc was happening
between items. I'm not confident this will solve the problem, but it's worth a try.
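
A minimal sketch of what forcing a collection between gallery items can look like, assuming the hook is wired in through sphinx-gallery's `reset_modules` option, which accepts callables taking `(gallery_conf, fname)`. The name `force_gc` and the config fragment are illustrative and may not match this PR's actual diff.

```python
import gc


def force_gc(gallery_conf, fname):
    """Resetter that sphinx-gallery can call between example scripts.

    Runs a full collection so unreachable objects (such as a PackedFunc
    that still pins GPU memory) are finalized before the next item starts.
    Returns the number of unreachable objects found, which is handy for logging.
    """
    return gc.collect()


# Fragment for the Sphinx conf.py (assumed location); other gallery options omitted.
sphinx_gallery_conf = {
    "reset_modules": (force_gc,),
}
```

Returning the count from `gc.collect()` is optional; sphinx-gallery ignores the return value, but it makes it easy to log how much garbage each tutorial left behind.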

@areusch areusch left a comment

thanks @mbs-octoml ! the other idea i had was explicitly deleting the module from sys.modules, if it's true that sphinx-gallery is importing it. seems like it does something very weird, though, and it's likely the top-level ref is removed: https://github.com/sphinx-gallery/sphinx-gallery/blob/master/sphinx_gallery/gen_rst.py#L501
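
For reference, dropping a module from `sys.modules` looks roughly like the sketch below. The module name `tutorial_example` is a hypothetical stand-in, and the approach only helps if nothing else still holds a reference to the module or its globals.

```python
import gc
import sys
import types
import weakref

# Stand-in for a tutorial script that was imported as a module (hypothetical name).
mod = types.ModuleType("tutorial_example")
mod.big_buffer = bytearray(1024)
sys.modules["tutorial_example"] = mod

# Weak reference lets us observe whether the module object is really gone.
mod_ref = weakref.ref(mod)

# Drop the import-system reference and our own local reference, then collect.
sys.modules.pop("tutorial_example", None)
del mod
gc.collect()

module_released = mod_ref() is None
```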

@mbs-octoml

> the other idea i had was explicitly deleting the module from sys.modules, if it's true that sphinx-gallery is importing it. seems like it does something very weird, though, and it's likely the top-level ref is removed: https://github.com/sphinx-gallery/sphinx-gallery/blob/master/sphinx_gallery/gen_rst.py#L501

Yuk. At least on my runs it's not outright holding onto everything and gc indeed does do the right thing. So I thought trying the obvious dumb thing first would be worthwhile before we get deeper into the labyrinth.

@tqchen tqchen merged commit 2bc0ece into apache:main Aug 11, 2021
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Aug 11, 2021
  https://discuss.tvm.apache.org/t/rfc-relay-tecompiler-rewrite-existing-compile-engine-to-match-updated-compiler-flow/9233
This gets about halfway there for the Interpreter:

* Remove direct access to TECompiler from interpreter, and instead call
  tec::LowerTEExpr when 'preparing' a module and expression for evaluation.
* Make clear there's no phase distinction between create_interpreter and
  evaluate on the Python side -- both must be prepared together as a single IRModule.
* But in return make sure the result of evaluate on the Python side is a packed func
  ready to directly apply 'simple' arguments to an already interpreted closure.
* The interpreter builds and caches primitive TIR functions (and their corresponding
  dynamic shape functions) as packed funcs as they are encountered.
* Cleanup uses of interpreter for constant folding on the C++ side.

Future work:
* Fold LoweredModule into IRModule so tec::LowerTEExpr is just another pass.
* Get rid of the implicit caching of lowered functions in TECompiler.
* Make calling convention from Relay to TIR explicit, and remove all the function
  attribute hackery currently needed so the interpreter can correctly invoke lowered
  functions as it encounters them.
* Make TECompiler private. Though could do this now it will make migrating the VM and
  AOT uses of CompilerEngine harder.

Force a gc between sphinx-gallery items to reclaim GPU memory. (apache#8722)

GPU memory is only released once the PackedFunc for evaling the model is gced
by Python. In CI we're noticing intermittent 'CUDA: Out of memory' failures
while processing the tutorials, and tracing showed there was no gc happening
between items. Not confident this will solve the problem but worth a try.
@mbs-octoml mbs-octoml deleted the mbs-sphinx-gc branch August 11, 2021 22:04
tqchen commented Aug 12, 2021

Some follow-up context:

It appears that sphinx-gallery does recycle global variables; I also ran a quick experiment to confirm that.

So the memory contention might be due to global state retained by the packages we use (likely TensorFlow or PyTorch, which may keep their own internal memory pools).

Adding "tensorflow" and "torch" to reset_modules might be the right solution.
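
A hedged sketch of that follow-up idea. As far as I know, the string entries sphinx-gallery accepts in `reset_modules` name its own built-in resetters (for example `"matplotlib"`), so the framework cleanup is written here as a custom callable instead. `release_framework_caches` is a hypothetical name; `torch.cuda.empty_cache()` and `tf.keras.backend.clear_session()` are real calls, but whether they return enough memory for the TVM CI is untested.

```python
import gc
import sys


def release_framework_caches(gallery_conf, fname):
    """Ask already-imported frameworks to drop cached GPU memory."""
    released = []
    gc.collect()  # drop dead Python objects first so their buffers become free

    torch = sys.modules.get("torch")  # never import a framework just to reset it
    if torch is not None:
        torch.cuda.empty_cache()  # return unused cached blocks to the driver
        released.append("torch")

    tf = sys.modules.get("tensorflow")
    if tf is not None:
        tf.keras.backend.clear_session()  # reset Keras global state
        released.append("tensorflow")

    return released


# Fragment for the Sphinx conf.py (assumed location); other gallery options omitted.
sphinx_gallery_conf = {
    "reset_modules": (release_framework_caches,),
}
```

Looking the frameworks up in `sys.modules` keeps the resetter cheap for tutorials that never touched TensorFlow or PyTorch.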

mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Aug 13, 2021
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Aug 16, 2021
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Aug 17, 2021
jroesch pushed a commit that referenced this pull request Aug 17, 2021
…rewrite. (#8597)

* This continues the work outlined in the RFC
  https://discuss.tvm.apache.org/t/rfc-relay-tecompiler-rewrite-existing-compile-engine-to-match-updated-compiler-flow/9233
This gets about halfway there for the Interpreter:

* Remove direct access to TECompiler from interpreter, and instead call
  tec::LowerTEExpr when 'preparing' a module and expression for evaluation.
* Make clear there's no phase distinction between create_interpreter and
  evaluate on the Python side -- both must be prepared together as a single IRModule.
* But in return make sure the result of evaluate on the Python side is a packed func
  ready to directly apply 'simple' arguments to an already interpreted closure.
* The interpreter builds and caches primitive TIR functions (and their corresponding
  dynamic shape functions) as packed funcs as they are encountered.
* Cleanup uses of interpreter for constant folding on the C++ side.

Future work:
* Fold LoweredModule into IRModule so tec::LowerTEExpr is just another pass.
* Get rid of the implicit caching of lowered functions in TECompiler.
* Make calling convention from Relay to TIR explicit, and remove all the function
  attribute hackery currently needed so the interpreter can correctly invoke lowered
  functions as it encounters them.
* Make TECompiler private. Though could do this now it will make migrating the VM and
  AOT uses of CompilerEngine harder.

Force a gc between sphinx-gallery items to reclaim GPU memory. (#8722)

GPU memory is only released once the PackedFunc for evaling the model is gced
by Python. In CI we're noticing intermittent 'CUDA: Out of memory' failures
while processing the tutorials, and tracing showed there was no gc happening
between items. Not confident this will solve the problem but worth a try.

* Get rid of logs spam.
chiwwang added a commit to chiwwang/tvm that referenced this pull request Aug 29, 2021
commit 2545e9caecadd66c72fbb6734c30d100e823b0fb
Author: Josh Fromm <[email protected]>
Date:   Sat Aug 28 12:59:20 2021 -0700

    [Frontend][Onnx] Simplify onnx input since name accesses are not reliable. (#8867)

    * Simplify onnx input since name accesses are no longer supported.

    * move Celu importer.

commit 0961b65cbf0d6e1c5f51e0e88dd17886d6111522
Author: Jiawei Liu <[email protected]>
Date:   Sat Aug 28 04:28:07 2021 -0500

    [Tutorial][Executor] Fix the usage of executors in tutorials (#8586)

    * fix: executor usage for keras tutorial

    * fix: executor usage for onnx tutorial

    * [Tutorial][Executor] Fix executors in tutorials

commit 5ab527a71f7eb1d352db1408b225c79a21945c94
Author: AndrewZhaoLuo <[email protected]>
Date:   Sat Aug 28 02:24:16 2021 -0700

    [Autoscheduler] Configurable workload keys (#8862)

    * change workload keys

    * remove binary string comparison

    * append the tuple not every integer

    * clean up

    * lint

    * dump workload keys to dags

    * fix things

    * change some strings

    * misc fixes, add tests

    * jostle ci

commit 7214f5239dbb8da4585d4d10fbc8c65c8f155b12
Author: Siyuan Feng <[email protected]>
Date:   Sat Aug 28 17:23:43 2021 +0800

    [TIR] Fix opaque access in buffer locator pass and match_buffer in region detector (#8855)

    * init

    * fix

    * Update src/tir/transforms/plan_update_buffer_allocation_location.cc

    Co-authored-by: Ruihang Lai <[email protected]>

    * Update src/tir/transforms/plan_update_buffer_allocation_location.cc

    Co-authored-by: Ruihang Lai <[email protected]>

    * address

    Co-authored-by: Junru Shao <[email protected]>
    Co-authored-by: Ruihang Lai <[email protected]>

commit 1df6c273f0fb1242d0b399614616635cef38bc15
Author: Yizhi Liu <[email protected]>
Date:   Fri Aug 27 19:33:58 2021 -0700

    [Community] @mdw-octoml -> Reviewer (#8868)

commit f188a4fb11971c9bfce9b059fd2b9dacdbe1a0d1
Author: Christopher Sidebottom <[email protected]>
Date:   Fri Aug 27 23:29:42 2021 +0100

    Remove AOT Executor header from Arduino project (#8857)

commit 3c86eec10ff8ced914db2af5873dfa91b76e5523
Author: Lunderberg <[email protected]>
Date:   Fri Aug 27 16:43:49 2021 -0500

    [UnitTests] Expose TVM pytest helpers as plugin (#8532)

    * [UnitTests] Expose TVM pytest helpers as plugin

    Previously, pytest helper utilities such as automatic parametrization
    of `target`/`dev`, or `tvm.testing.parameter` were only available for
    tests within the `${TVM_HOME}/tests` directory.  This PR extracts the
    helper utilities into an importable plugin, which can be used in
    external tests (e.g. one-off debugging).

    * [UnitTests] Refactor the plugin-specific logic out into plugin.py.

    * [UnitTests] Moved marker definition out to global variable.

commit 55bb8b60b707d5fc25c3828adf6086aa01bcc039
Author: Tristan Konolige <[email protected]>
Date:   Fri Aug 27 14:39:03 2021 -0700

    [Graph Executor, VM] Add end to end benchmarking of models (#8858)

    Add benchmarking that includes overhead of transferring inputs and
    outputs to and from the device. This should give an accurate measurement
    of the runtime a user would see when using the model. This is
    accomplished by adding functions that run from inputs to return values
    into the graph executor and the VM.

commit cf19c889214ca9a1b8c420baff35aa10986b3d9c
Author: Krzysztof Parzyszek <[email protected]>
Date:   Fri Aug 27 05:22:47 2021 -0500

    [Hexagon] Change declaration order of unique_ptr objects to fix crash (#8859)

    A crash occurs when automatically deleting an instance of
    CodeGenHexagon because the LLVMContext object has already been
    freed. Objects of both types are created using unique_ptr, but
    the object managed by the LLVMContext unique_ptr is passed to
    CodeGenHexagon object (not as a unique_ptr).

    This crash is fixed by moving the declaration of the LLVMContext
    object before the CodeGenHexagon object. I'm not sure if this
    is the best way to fix this, but it does fix the crash. Also,
    in other files, the LLVMContext object is always created first.

    Co-authored-by: Cahoon, Brendon <[email protected]>

commit 3306857d80bfc76cdd10d7a40028f52b7ca696aa
Author: Swift.Sun <[email protected]>
Date:   Fri Aug 27 17:28:50 2021 +0800

    [Runtime] add set_output_zero_copy (#8497)

    * Update graph_executor.h

    * Update graph_executor.cc

    * modify zero copy UT add set input zero copy

    * modify C style

    * add runtime test

    * realy build  generatr the json

    Co-authored-by: hwstaff <[email protected]>

commit e774fed67c2d12e6cfc29a013f029d4b55c28e2a
Author: Jason <[email protected]>
Date:   Fri Aug 27 15:34:05 2021 +0800

    Add a PaddlePaddle Frontend (#8645)

    * fix some problems for matmul

    * fix some problems for matmul

    * add alpha parameter for matmul

    * remove unnecessary condition

    * add TranslatedLayer which support model loaded by jit.load

    * add mul operator support

    * Add padding mode support for conv/pool2d

    * support 4 two-tuples

    * add paddle test case

    * add paddle conv2d  case

    * update test_forward.py

    * fix paddle convert_matmul

    * add paddle multiply and matmul op test case

    * add test case and fix bug

    * delete import pandas

    * add paddlepaddle tests

    * modify the variable name of convert_reshape

    * formatting

    * formatting

    * use black to format python code

    * pylint check

    * Remove fluid api

    * black format

    Co-authored-by: root <[email protected]>
    Co-authored-by: wjj19950828 <[email protected]>
    Co-authored-by: heliqi <[email protected]>
    Co-authored-by: Junru Shao <[email protected]>

commit 9d168822f2950083a59be243cb35ad51888dbc5d
Author: Christopher Sidebottom <[email protected]>
Date:   Fri Aug 27 06:04:09 2021 +0100

    Change AOT from ExprVisitor to MixedModeVisitor (#8856)

    This should allow better scalability for AOT when targeting larger networks.

commit b4b194dbb0db1f152740bbb84cab96721482e2cf
Author: Valery Chernov <[email protected]>
Date:   Fri Aug 27 05:48:25 2021 +0300

    extend repeat_interleave op for relay.Expr (#8839)

    Co-authored-by: Valery Chernov <[email protected]>

commit 227bf7ffafec7a2cff543a8a22f3741f45970b8d
Author: Tantalus13A98B5F <[email protected]>
Date:   Thu Aug 26 21:18:34 2021 -0400

    [TOPI] [Relay] Sparse Conv2d Implementation for 3x3 kernels (#8605)

    * [topi] add spconv2d_3x3 nhwc

    * [relay] sparse_conv2d: add kernel_size attr

    * [relay] add strategy for spconv2d_3x3 nhwc

    * [relay] pass to convert spconv2d with const args

    * [relay] convert sparse conv2d pass fixes

    * use array for sparse conv2d attr

    * fixup 1x1 tests; new 3x3 tests

commit f4f525dab86af653636bce95ce3609288fbaa587
Author: masahi <[email protected]>
Date:   Fri Aug 27 07:16:54 2021 +0900

    [AMP] Disallow fp16 conversion for summation-like ops (#8810)

    * [AMP] Disallow fp16 conversion for summation-like ops

    * test only structural equality

commit 3d81489a2656214e93c6ea983e82c55b310cd28b
Author: Mehrdad Hessar <[email protected]>
Date:   Thu Aug 26 23:24:51 2021 +0200

    move rust lint script (#8726)

commit 423958fd8fdf1a2bd8d45d604135054953c5c73b
Author: Chenfan <[email protected]>
Date:   Fri Aug 27 03:33:37 2021 +0800

    [FIX] Bug fix for a floormod rewrite simplify rule (#8852)

    * Update rewrite_simplify.cc

    * Update test_arith_rewrite_simplify.py

    * Update test_arith_rewrite_simplify.py

    * Update test_arith_rewrite_simplify.py

commit 04bdd32281c4ae50d086e4469fd6a9ee6f0c93b6
Author: Anton Sorokin <[email protected]>
Date:   Thu Aug 26 10:21:18 2021 -0700

    VTA cmake change to include Verilator header for building tsim library (#8797)

    * VTA cmake file require Verilator include for tsim target. VTA module.cc uses svOpenArrayHandle to send wide data through DPI

    * Refactor Verilator check conditions

    * Build TSIM only for CPU target. CPU target don't use -Werror to compile with Verilator. Jenkinsfile to have tvm_multilib_tsim defined for CPU build target.

    * remove build/libvta_tsim.so from non tsim targeting builds

    * Revert to enable TSIM build i386. Revert to -Werror in CPU config. Remove verilator CPP objects from cmake config for tsim and put them as include into vta module.cc to avoid Verilator compilation warnings

commit 4fd1bf4e512aafc0bea0b809789cd27f8dd944d4
Author: Mehrdad Hessar <[email protected]>
Date:   Thu Aug 26 19:08:15 2021 +0200

    update gpu and cpu (#8853)

commit d263c6d4300170cc6cf7f58b923edcb23b5a7791
Author: Elen Kalda <[email protected]>
Date:   Thu Aug 26 18:06:23 2021 +0100

    [Pattern matching] Add an option to rewrite the graph only once (#8843)

    * [Pattern matching] Add an option to rewrite the graph only once

    If the graph returned from the callback consists of the original
    pattern, the rewriter will run in the loop, which is not always desired.
    So this patch proposes an option to run the rewriter only once.

    Change-Id: I85cf0a055b8961d52394f21c1e4d7aad0a7e1d06

    * Make rewrite_once default to false

    Change-Id: Idf6f01f254c403158883681e75c2a5978efbd2d0

commit 3f777d555f1b1a125b0f7f83291d1d8693ffa6be
Author: Krzysztof Parzyszek <[email protected]>
Date:   Thu Aug 26 09:08:17 2021 -0500

    [Hexagon] Rework tvm.target.hexagon() interface (#8823)

    * [Hexagon] Rework tvm.target.hexagon() interface

    Make the tvm.target.hexagon() function take most options as keyword
    parameters. This will allow adding additional parameters without changing
    the interface.

    No changes are required to existing code, except for changing positional
    parameters following the CPU version to keyword parameters, and updating
    the names of the keyword parameters:
      sim_args  -> sim_options,
      llvm_args -> llvm_options,
    although the old names will be accepted for the time being.

    * formatting

    * change ' to "

    * Rename 'args' to 'config' for clarity

    * Use 'strip' instead of 'replace'

    * Restart build

commit bca57cb1e74fe946c2db3d24fe5042b74da9fea7
Author: AndrewZhaoLuo <[email protected]>
Date:   Thu Aug 26 04:23:28 2021 -0700

    [AMP] Bump up tolerance on flaky test (#8850)

    * bumpy up tol

    * bumped tolerance up even more

    * jostle ci

commit 98a3476bfc7428f592ad0fd6b8c863b5fd5ec1f9
Author: Mehrdad Hessar <[email protected]>
Date:   Thu Aug 26 13:22:43 2021 +0200

    [microtvm][Zephyr] Increase timeout to fix flaky tests (#8846)

    * increase timeout

    * trigger

commit 349157641b17882fcf944409fba79c7300978a77
Author: Ashutosh Parkhi <[email protected]>
Date:   Thu Aug 26 11:16:45 2021 +0100

    Support for CMSIS-NN in Corstone300 Makefile (#8831)

    Change-Id: Ifc2305db4e11d1d15d45407287f8f0bea469100a

commit 46f8b61bd3f9f3351104a0bb9934afe3bafa2c28
Author: Anastasia Stulova <[email protected]>
Date:   Thu Aug 26 10:03:06 2021 +0100

    [Relay][TOPI] Support of depthwise conv2d NHWC for Mali/Bifrost. (#8584)

    * [Relay][TOPI] Support of depthwise conv2d NHWC for Mali/Bifrost.

    Added initial tunable autotvm templates for depthwise conv2d with
    NHWC layout for Mali and Bifrost.

    * [Relay][TOPI] Misc fixes for depthwise conv2d Mali/Bifrost.

    - Fix assert for Bifrost.
    - Set reasonable default axis splits to avoid using tophub for NHWC.
    - Fixed typo: arm cpu -> Mali.

    * [Relay][TOPI] Fixed formatting in depthwise conv2d Mali/Bifrost.

commit d80528db0becfc471acd1e7cda122f8283117627
Author: Christopher Sidebottom <[email protected]>
Date:   Thu Aug 26 09:54:01 2021 +0100

    Apply CPPLint to CRT Tests (#8844)

    This one was a bit trickier as there was more usage of dynamic arrays and less safe casts. I've tried to minimise the changes to just those required to passing linting.

commit f1ca91d4e401096d04e962c982d62b1f2669c9f5
Author: Tristan Konolige <[email protected]>
Date:   Wed Aug 25 18:25:29 2021 -0700

    [GRAPH EXECUTOR,VM] Add benchmarking function to graph executor and vm (#8807)

    * [GRAPH EXECUTOR,VM] Add benchmarking function to graph executor and vm

    This new benchmarking function is just a convenience function for
    calling time_evaluator on the underlying module. Hopefully this should
    make it easier for users to get good benchmarks of their code.

    * formatting

    * import order

    * more test, more comments, more precision

    * fix tests

    * add seconds descriptions to doc

commit 0648fffc9b6fddd27dc04a91ebac9cccd780b3b3
Author: Jiawei Liu <[email protected]>
Date:   Wed Aug 25 17:34:10 2021 -0500

    [BUG] ToBasicBlockNormalForm immutability (#8778)

    * ToBasicBlockNormalForm immutability

    * better comment on ToBasicBlock

    * refine comment of ToBasicBlockForm

commit 4a9b5b5cf597418c8bdbf2e0fcb8ac8cf24f0d07
Author: Mehrdad Hessar <[email protected]>
Date:   Wed Aug 25 22:58:40 2021 +0200

    Update CI Lint Image Version (#8841)

    * Update CI Lint Image Version

    * trigger

commit 5a6b75dcac8c2be4e3e9751844794bd74762eb51
Author: Michalis Papadimitriou <[email protected]>
Date:   Wed Aug 25 21:41:10 2021 +0300

    [Pre-commit] Add pre-commit configuration to perform minimal checks locally (#8382)

    * [Pre-commit] Add pre-commit hook configuration file

    * [Pre-commit] Add header to configuration file

    * [Pre-commit] Add basic configuration instructions

    * [Pre-commit] Extend pre-commit pipelines with C++ linting

    * [pre-commit] Add example usage comment for pre-commit hooks

    * [CI] Add in docker linting script mypy step

    * [CI] Use lint docker image for pre-commit checks

    * [CI][pre-commit] Minor cleanups on docker runners of pre-commit lints

commit a31ebf75350946f09b2965576ba276e526f8d4ac
Author: Krzysztof Parzyszek <[email protected]>
Date:   Wed Aug 25 11:46:19 2021 -0500

    [Hexagon] Reuse Hexagon SDK analysis across cmake files (#8822)

    * [Hexagon] Reuse Hexagon SDK analysis across cmake files

    Different versions of the Hexagon SDK may have different directory
    structures. Extract the directory identification code into a separate
    cmake module. Use that module in Hexagon.cmake and in the cmake file
    for the FastRPC libraries.

    * Don't modify CMAKE_SHARED_LINKER_FLAGS, instead set target properties

    * Add quotes around ${...}

    * Add USE_HEXAGON_ARCH variable to cmake configuration

    * Restart build

commit 977bdbdf772f6149554d43ae9073ce58c8e36a38
Author: Christopher Sidebottom <[email protected]>
Date:   Wed Aug 25 16:54:40 2021 +0100

    Force CMake targets in top-level Makefile to run (#8840)

    This is a bug I introduced in https://github.com/apache/tvm/pull/8809, because the built binary is now named `build/cpptest` when `make` checks that artifact it finds it exists already and skips running `make -C build cpptest`. This ensures all nested `make` calls are forced to run from the top-level `Makefile`.

commit b8193646fa9f97fc3476b5275d8ce8b0270408a3
Author: Valery Chernov <[email protected]>
Date:   Wed Aug 25 10:48:14 2021 +0300

    [Frontend] [Torch] [ONNX] GRU layer (#8781)

    * GRU cell was implemented in common.py. GRU was supported on pytorch frontend side

    * update GRU in common.py and onnx frontend

    * fix issue related to GRU accuracy in pytorch and ONNX frontend

    * small fixes and remove excess

    * common GRU was additionally updated. tuned pytorch GRU was strongly accelerated

    * GRU cell in ONNX frontend was used from common.py. previous implementation was removed

    * small fixes in comments

    * fixes after review. GRU test was implemented for pytorch frontend

    * tests for RNN layers were unified for pytorch frontend

    Co-authored-by: Valery Chernov <[email protected]>

commit 02b57a6be3044b25ed997774760bcec649768696
Author: Matt Welsh (OctoML) <[email protected]>
Date:   Tue Aug 24 20:02:45 2021 -0700

    Update CONTRIBUTORS.md (#8837)

    TVM is no longer in the Apache Incubator; moving mentors to the end of the doc.

commit 7ae8f898a768956d6c27ac9054ab52e8ad9ead73
Author: Cody Yu <[email protected]>
Date:   Tue Aug 24 16:24:20 2021 -0700

    [Community] @Lunderberg -> Reviewer (#8834)

commit 2859c20391a2eb377f9ae28d4980193b5d1685a1
Author: Xiyou Zhou <[email protected]>
Date:   Tue Aug 24 15:52:03 2021 -0700

    [M3a][Meta Schedule] Add Sampling Primitive SampleCategorical. (#8817)

    Co-authored-by: Junru Shao <[email protected]>
    Co-authored-by: Wuwei Lin <[email protected]>
    Co-authored-by: Ruihang Lai <[email protected]>
    Co-authored-by: Hongyi Jin <[email protected]>
    Co-authored-by: Siyuan Feng <[email protected]>
    Co-authored-by: Bohan Hou <[email protected]>

commit 44a1d1f81d1d651e914750c58299079f205cbd5c
Author: Christopher Sidebottom <[email protected]>
Date:   Tue Aug 24 23:20:03 2021 +0100

    Better reflect allocator names in CRT tests (#8828)

    When the AOT executor was introduced, the Stack Allocator was associated
    with it by test name whereas the Page Allocator was left as just
    memory_test.cc. This cleans that up a bit to clarify which tests which
    allocator.

commit 5ada91bbf42f7a7351198994e654c6423d8eddb6
Author: Matt Welsh (OctoML) <[email protected]>
Date:   Tue Aug 24 14:59:02 2021 -0700

    Add link to docs and tutorials in the README. (#8832)

    Most project pages on GitHub have a README.md file with a clear link to installation or tutorial material for new users.
    While there is a link to Documentation, it's not that obvious, and adding a more explicit "getting started" link may be
    helpful for new TVM users trying to navigate the project.

commit fe0bd12f776842b157d6ba994ae85ff3d15ee884
Author: Krzysztof Parzyszek <[email protected]>
Date:   Tue Aug 24 16:30:16 2021 -0500

    [Hexagon] Remove uses of LLVM from simulator runtime (#8821)

    * [Hexagon] Remove uses of LLVM from simulator runtime

    The TVM runtime is not linked with LLVM libraries, so using LLVM
    in it carries a risk of referencing undefined symbols. This may
    work for objects defined in header files, but it then relies on
    LLVM keeping them there.

    Replace uses of LLVM utilities in the Hexagon simulator runtime,
    with simple alternatives.

    * clang-format

    * Use dmlc::optional instead of implementing one from scratch

    Make detail::Optional be derived from dmlc::optional, and add some bits
    to make it behave more like the C++17's std::optional. The goal is to
    replace detail::Optional with std::optional, once the project switches
    to C++17.

commit b77a7d4fc672a4af9858e17c92c6c731d8657249
Author: Christopher Sidebottom <[email protected]>
Date:   Tue Aug 24 21:45:52 2021 +0100

    Apply CPPLint to C++ Unit Tests (#8827)

    This change enables `cpplint` for the tests in `tests/cpp` and corrects any current linting errors. I had to use `NOLINT` in some of the PackedFunc tests due to a bug (see: https://github.com/cpplint/cpplint/issues/131) in CPPLint where `int(int)` is picked up as a cast rather than a nameless argument.

commit 2c6dccb6694ae8d1bd8d8ed752cf1bc8f1fccf1f
Author: Christopher Sidebottom <[email protected]>
Date:   Tue Aug 24 18:20:02 2021 +0100

    Correct function signatures for CreateXPass functions in docs (#8829)

commit 596ff5cff576d8ded1251bb936eb7198551cc481
Author: Yuanjing Shi <[email protected]>
Date:   Tue Aug 24 09:40:30 2021 -0700

    [AutoTVM] Use PopenPool in XGBoostCostModel (#8820)

    * replaced multiprocessing.Pool with PopenPoolExecutor

    * add initializer func

    * static init func

    * address comments

    * linting

    * fix tests

    * address comments

commit 64a7eb674623c658342ec95283323a0d5ae4ca73
Author: Chenfan <[email protected]>
Date:   Tue Aug 24 21:33:36 2021 +0800

    [FLAKY] A small bug fix on the CmakeLists (#8826)

commit eabae30059cf914bbe7fa2ad46d381ff662267b9
Author: Jared Roesch <[email protected]>
Date:   Mon Aug 23 21:11:24 2021 -0700

    [Rust] Fix memory leak #2 (#8725)

    * Add C++ API for computing type key from type index

    * Try and isolate leak

    * Rewrite the bindings to fix the ArgValue lifetime issue

    There are still quite a few issues left to resolve in this patch, but I believe the runtime
    changes stabilize memory consumption as long as the parameters are only set once. ByteArray
    also has some totally broken unsafe code which I am unsure of how it was introduced.

    * Finish handling tvm-rt issues due to ArgValue lifetime

    This patch further refactors the bindings to better handle the
    lifetime issues introduced by detecting the argument memory leak.

    * WIP memory leak

    * There is an issue using the TVMCb function, which is breaking refcount

    * Fix fallout from the lifetime refactor

    * Another tweak

    * Follow up work from the memory leak, attempt to clean up ByteArray

    * Add some todos for future work

    * Fix doc string

    * Clean up the changes

    * Format

commit e883dcba2e2529d4dcf23169a7c72494b0b5b60b
Author: Christopher Sidebottom <[email protected]>
Date:   Tue Aug 24 04:59:45 2021 +0100

    Run AOT tests against reference system (#8744)

    * Run AOT tests against reference system

    This introduces an alternative way of running AOT tests using the reference system added in https://github.com/apache/tvm/pull/8514. This gives us additional assurance that the AOT output runs successfully on embedded platforms in our core test suite.

    I've also changed calculate_workspace_sizes to debug_workspace_sizes and default to False in most cases as it only needs to be True for a few cases to check the output with the debug flag - this was discovered trying to allocate 16MB in an embedded test :scream_cat:

    Co-authored-by: Grant Watson <[email protected]>

    * Skip AOT reference system tests in i386 container

    * Add comment clarifying the reference system runner

    Co-authored-by: Grant Watson <[email protected]>

commit 4524567775a1e963e656bc740d36166460d45b66
Author: Gustavo Romero <[email protected]>
Date:   Tue Aug 24 00:56:50 2021 -0300

    Remove unnecessary memset in TVMMutableFuncRegistry initialization (#8818)

    Remove unnecessary memset() call in TVMMutableFuncRegistry_Create()
    when initializing a TVMMutableFuncRegistry struct. All struct members
    (registry.names, registry.funcs, and max_functions) are already
    initialized properly before returning, hence some CPU cycles might be
    saved (usually 12 bytes in a 32-bit platform and 24 bytes in a 64-bit
    platform must be written with 0 by memset()).

    Signed-off-by: Gustavo Romero <[email protected]>

commit dfe21c5f208496560a405123ef27ca7e747bb7c4
Author: Gustavo Romero <[email protected]>
Date:   Tue Aug 24 00:56:35 2021 -0300

    Remove unused allocated memory in crt initialization (#8819)

    Currently TVMInitializeRuntime() allocates 250 bytes dynamically to back
    buffer 'func_registry_memory' which is never used. That is not much in
    general but besides being twice the current necessary amount for the
    runtime (allocated to back 'registry_backing_memory' buffer) that amount
    can be important to be saved on memory-constrained devices (microTVM).

    This commit removes the 'func_registry_memory' buffer which is allocated
    dynamically in TVMInitializeRuntime() since it occupies 250 bytes and is
    never used.

    Signed-off-by: Gustavo Romero <[email protected]>

commit b1396a0f159298a286091b018090369cd8dcff53
Author: Tristan Konolige <[email protected]>
Date:   Mon Aug 23 20:56:12 2021 -0700

    [FIX] Remove leftover instances of USE_GRAPH_EXECUTOR_DEBUG (#8796)

    * [FIX] Remove leftover instances of USE_GRAPH_EXECUTOR_DEBUG

    single flag, USE_PROFILER. This PR cleans up the last few remaining uses
    of USE_GRAPH_EXECUTOR_DEBUG.

    * formatting

    * Update CMakeLists.txt

    Co-authored-by: Cody Yu <[email protected]>

commit d1f19c470c16a1ca87c67fd93f30dd59e16bbec1
Author: Lily Orth-Smith <[email protected]>
Date:   Mon Aug 23 20:12:07 2021 -0700

    Add LowerTEPass, and convert calls to LowerTE to application of LowerTEPass (#8802)

    * Initial commit

    Initial stab at IRModule -> LoweredModule conversion func, notes

    Add external_mods and main_func_info to conversion funcs

    Test lowered module to ir module

    fix problem with conversion funcs + print stmts

    Add LowerTE pass

    Add LowerTEPass

    Add LowerTEPass to graph_executor_codegen.cc

    Use LowerTEPass instead of LowerTE in graph_executor_codegen.cc

    Code cleanup

    Add docs, more cleanup

    Formatting

    * Fix bad rebase

    * Address 1st round of comments

    * Use tir kTarget instead of relay one

    * Change target string to Target obj

    * removing target string causing issues

    * Fix typos

    * Revert target str -> target obj changes

    * Don't use Update : IRModule because it is broken

    * Fix check

    * flaky test?

    * lint

commit 356879d4c35eba5d561665003e38a553cd410ff3
Author: Christopher Sidebottom <[email protected]>
Date:   Tue Aug 24 01:08:18 2021 +0100

    Use CTest for C++ tests (#8809)

    By using the `gtest_discover_tests` CMake macro the CPP and CRT tests can be configured to build binaries with a single test runner each. Once CTest has information about tests it can be used in IDE extensions such as [CMake Test Explorer](https://marketplace.visualstudio.com/items?itemName=fredericbonnet.cmake-test-adapter).

    `ctest` can also run tests in parallel using the `-j` flag, which could be interesting in future.

commit 607bb9492a3fc05451ba58d0bc538f0c3ef7d7e1
Author: Lunderberg <[email protected]>
Date:   Mon Aug 23 17:11:30 2021 -0500

    [Vulkan] Remove target.h #include (#8813)

    Was added in #8127, should have been removed in #8171 along with the
    rest of the references outside of libtvm_runtime.so.  This didn't
    impact the Vulkan+g++ builds, because no symbols were accessed outside
    of the runtime library.  However, it broke the Vulkan+Windows builds,
    which expected symbols due to the `__declspec(dllexport)` definition of
    `TVM_DLL` on MSVC (see #8805).  This wasn't caught by the CI build on
    Windows, because it doesn't perform the Vulkan build.

commit aafc2d5efd2ab2ea914da6c291e82a18f5ad6e14
Author: Christopher Sidebottom <[email protected]>
Date:   Mon Aug 23 20:53:42 2021 +0100

    Remove duplicated PackedFunc C++ test (#8812)

    I came across this file whilst looking at the C++ tests and realised it's a
    duplicate of the PackedFunc tests which doesn't get invoked.

    ```
    $ diff -u tests/cpp/contrib/bnns.cc tests/cpp/packed_func_test.cc
    --- tests/cpp/contrib/bnns.cc   2021-07-30 12:59:33.830443830 +0000
    +++ tests/cpp/packed_func_test.cc       2021-08-23 12:47:43.193708421 +0000
    @@ -17,6 +17,13 @@
      * under the License.
      */

    +#include <dmlc/logging.h>
    +#include <gtest/gtest.h>
    +#include <tvm/runtime/packed_func.h>
    +#include <tvm/runtime/registry.h>
    +#include <tvm/tir/expr.h>
    +#include <tvm/tir/transform.h>
    +
     TEST(PackedFunc, Basic) {
       using namespace tvm;
       using namespace tvm::tir;
    ```

commit 10fca9c620f93d912a2c1ac28c1859896c20d436
Author: Gustavo Romero <[email protected]>
Date:   Mon Aug 23 06:42:52 2021 -0300

    [microTVM] Fix platform name for qemu_x86 in Zephyr AOT tests (#8762)

    Currently two Zephyr AOT tests (test_tflite and test_qemu_make_fail) are
    not running when qemu_x86 platform is selected because the platform name
    is wrongly listed as 'host' in the match list for not skipping these
    tests. This commit fixes it.

    Signed-off-by: Gustavo Romero <[email protected]>

commit 0a1eae87fd551a5fb67c739eabb45226fa870af1
Author: Hongyi Jin <[email protected]>
Date:   Mon Aug 23 13:30:08 2021 +0800

    [TensorIR][M2a] Reorder (#8767)

    This PR is part of the TensorIR upstreaming effort (#7527), which adds a schedule primitive: reorder.

    Co-authored-by: Siyuan Feng <[email protected]>
    Co-authored-by: Bohan Hou <[email protected]>
    Co-authored-by: Ruihang Lai <[email protected]>
    Co-authored-by: Wuwei Lin <[email protected]>
    Co-authored-by: Junru Shao <[email protected]>

commit d6d6367b0105bc5f155a963ea11e59c7913a2cf0
Author: Gustavo Romero <[email protected]>
Date:   Sun Aug 22 02:13:08 2021 -0300

    Fix typos (#8787)

    Fix a couple of typos in comments about the IR/AST node reflection code
    and a typo in a comment about the main member of the TVMModule struct.

    Signed-off-by: Gustavo Romero <[email protected]>

commit fc9f5823e826a32f4ce5f620c39a2d9b9ed01d80
Author: wrongtest <[email protected]>
Date:   Sun Aug 22 06:10:18 2021 +0800

    [TIR] Support fold constants in specialize process (#8803)

    * support fold constants in specialize

    * replace Substitute() with VisitExpr() in specializer.

commit 4b9881ec50008bc14fc1ae7805413544cf962011
Author: Yuan-Chuan-YUE <[email protected]>
Date:   Sun Aug 22 05:42:21 2021 +0800

    [CODEGEN][OpenCL]: fix tir.erf codegen to opencl directly (#8756)

    * register tir.erf to lower opencl directly

    * add opencl codegen unit test

    * change erf opencl codegen unit test for checking there is erf in the source not erff

commit c6f62aafc91e2600ed7772597fd4238c924c2a1b
Author: Chris Sullivan <[email protected]>
Date:   Fri Aug 20 16:08:07 2021 -0700

    [Texture support][Part 1] TIR lowering and OpenCL support (#7686)

    * Add support for kTexture storage rank.

    * Add scaffolding for texture_flatten pass.

    * Add scaffolding for texture allocation.

    * Implement 2d texture flattening to builtin tir.text2d_alloca.

    * Lower BufferStore/Load to builtin texture store/load.

    * Add vectorizable attribute to texture load and store.

    * Support auto-vectorization on the innermost (RGBA) axis.

    * Add read/write_imagef opencl codegen for builtin texture load/store.

    * Add TextureType support.

    * Add InferTextureAccess pass to deduce __read_only
    and __write_only access qualifiers for texture vars.
    Also refactor use of restrict keyword to be var dependent.

    * Implement texture allocation as external function in TIR lowering.

    * Remove commented lines.

    * Add nd->2d texture flattening.

    * Bug fixes in opencl codegen (row<>col, access quals.)

    * Improve texture codegen by explicitly allocating local vector
    for the texture load. Also support indexing individual elements
    of the RGBA vector.

    * Remove automatic vectorization
    code as it is no longer needed.

    * Improve SSA local use when storing texture read to scalar buffer.

    * Define texture flattening convention such
    that the outer Nd-1 axes are stored as rows,
    and the last axis is stored as columns.
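
    The flattening convention in this bullet can be read as the sketch
    below. It is an illustration only, not the code added by this PR:
    flatten_shape and flatten_index are hypothetical helper names, the
    outer axes are assumed to be linearized in row-major order, and the
    RGBA packing discussed elsewhere in this log is ignored.

```python
from math import prod


def flatten_shape(shape):
    """Map an N-d shape to (rows, cols).

    Hypothetical helper: the outer N-1 axes collapse into rows and the
    last axis becomes the columns.
    """
    return prod(shape[:-1]), shape[-1]


def flatten_index(shape, index):
    """Map an N-d index to (row, col), assuming row-major outer axes."""
    row = 0
    for extent, i in zip(shape[:-1], index[:-1]):
        row = row * extent + i
    return row, index[-1]


rows_cols = flatten_shape((2, 3, 4))
row_col = flatten_index((2, 3, 4), (1, 2, 3))
```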

    * Add tir lowering and opencl codegen support for float16 textures.

    * Disable SSA when texture load is immediately casted.

    * Allow RGBA extent to be of length 1.

    * Add pass to forward externally allocated textures
    in place of textures realized from cache_read. Fix
    to better follow indexing spec.

    * Add buffer_common.h to house buffer offset simplification routines.

    * More refactor and clean up in texture lowering.

    * Add IsTextureType to tir and allow buffer
    var type annotation to be TextureType in addition
    to PointerType.

    * Bug fix in texture access qualifier inference pass

    * Step toward handling external texture buffer forwarding
    when external buffer is not stored directly to cache_read realized buffer.
    For example when it is conditionally stored via an IfThenElse node when
    padding is used.

    * [Part 2/3] Support texture:weight lowering convention for externally provided
    texture buffers. Need to propagate this to allocated textures when
    cache_read(texture) is used for weights.

    * Bug fix in texture access qualifier inference pass

    * Tighten constraint on external buffer forwarding --
    cache_read(texture) cancellation -- to avoid incorrect
    programs. Currently only forward through if_then_else node
    and direct external loads. For if_then_else, still need
    proper analysis of structural equality between buffers
    and access patterns to determine if an external buffer
    can replace the texture buffer realized via cache_read.

    * Use texture lowering convention from texture runtime util.

    * Use updated texture lowering utilities

    * Use inherited visitor overloads in texture flattener.

    * Add check in codegen for float/half until
    read/write_image codegen supports other types.

    * Rename tir texture builtins

    * Remove codegen and tir runtime dependence on TVMBackendAlloc/FreeTexture.

    * Dispatch texture allocas via target specialized tir.tvm_call_packed

    * Remove kTexture scope and use kGlobal with texture tag.

    * Remove TextureType.

    * Remove TextureType from OpenCL codegen.

    * Remove TextureType from TIR lowering.

    * Remove dependency on MergeMulMod.

    * Revert "Add buffer_common.h to house buffer offset simplification routines."

    This reverts commit 027628259229aaee051dbf1dfbed4e63ef820544.

    * Prune include list

    * Add more documentation to texture flattening.

    * Add TextureFlatten transform to refactored tvm lower API.

    * Apply clang formatting.

    * Blacken python APIs.

    * Apply cpplint changes.

    * Attempt to extract storage scope from pointer scope.

    * Remove ExternalBufferForwarding (cache_read cancellation) for now.

    * Apply MyPy.

    * Clang format

    * Only visit RealizeBuffer body for texture storage.

    * Fix bad merge.

    * Utilize OpenCL preprocessor to switch between
    sampler-less and codegen provided sampler for
    texture reads depending on whether the opencl
    runtime is 2.0 compliant.

    * Add texture codegen test example.

    * Refactor tests to use pytest parameterization.

    Blacken tests.

    * Respond to CRs.

commit 18a2ee16036a4d70d7cfc572fe9807e6e2a70eda
Author: Euntaik <[email protected]>
Date:   Sat Aug 21 07:29:43 2021 +0900

    [Frontend][TFLite] Implement fake quant (#8780)

    * [Frontend][TFLite] Implement fake quant

    * remove unused variable

    * fix linting errors

    * add more tests

    * use pytest parametrize instead of a separate function

commit d722c109ee8ede4ef88f53a414385f72756928be
Author: Yuanjing Shi <[email protected]>
Date:   Fri Aug 20 07:39:10 2021 -0700

    [CONTRIB] Allow customized initializer in PopenPool (#8789)

commit e691c7f83892d7242d0992c78cec1e2f8953a9e3
Author: Siyuan Feng <[email protected]>
Date:   Fri Aug 20 15:49:31 2021 +0800

    [TIR] Fix buffer scope in structural equal (#8768)

    * fix buffer scope in structural equal

    * make global equal to empty

commit 7f237dd0de9880881a1a6d696e093224821a116c
Author: Andrey Malyshev <[email protected]>
Date:   Fri Aug 20 08:59:08 2021 +0300

    Extend tune_relay_x86 tutorial to measure default and kernel level tune (#8794)

commit 36ea17a0d15ce44fdd5d758de2717d75eac4a97e
Author: Lunderberg <[email protected]>
Date:   Thu Aug 19 20:33:43 2021 -0500

    [Docker][Vulkan] Allow Vulkan GPU access in docker container. (#8784)

    - The environment variable NVIDIA_DRIVER_CAPABILITIES must include
      "graphics" in order to expose Vulkan drivers to the container.  This
      is added both to Dockerfile.ci_gpu for future image builds, and to
      docker/bash.sh for compatibility with current images.

    - The configuration files needed by the vulkan launcher and glvnd must
      be exposed to the container.  These are only included in
      `docker/bash.sh`, as they may vary by host and so cannot be baked
      into the image.

commit 0d3e2335654fcdee731ceac57852d00482472903
Author: Lunderberg <[email protected]>
Date:   Thu Aug 19 20:32:38 2021 -0500

    [UnitTest][Flaky] Increased tolerance on onnx test_forward::test_aten (#8798)

    Default tolerance had about 2-3% failure rate (8/300 iterations), and
    caused failures on unrelated PRs
    (e.g. https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-8784/1/pipeline#step-485-log-1156).
    New limit of `atol=5e-7` chosen to be above the maximum delta of
    3.5e-7 observed in 300 iterations.
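
    The numbers above can be checked with a short sketch. The helper
    within_atol is hypothetical and stands in for whatever comparison the
    test actually uses; only the figures 8, 300, 3.5e-7 and 5e-7 come
    from this message.

```python
def within_atol(actual, expected, atol):
    """Absolute-tolerance check (hypothetical stand-in for the test's comparison)."""
    return abs(actual - expected) <= atol


# Figures quoted in the commit message.
failures, iterations = 8, 300
failure_rate = failures / iterations  # fraction of runs that failed

max_observed_delta = 3.5e-7
new_atol = 5e-7

# The worst observed delta passes under the new limit, with some headroom.
passes_with_new_limit = within_atol(1.0 + max_observed_delta, 1.0, new_atol)
headroom = new_atol - max_observed_delta
```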

commit 9697bfd6847a78145ffe66640a96bc157ef365b5
Author: Leandro Nunes <[email protected]>
Date:   Fri Aug 20 02:32:25 2021 +0100

    Add synr==0.3.0 dependency for Docker images and Python dependency. (#8801)

    - PR #8776 removed `synr` as a dependency to be installed in the Docker
      images, so the images needed manual intervention before we could
      run tests.

    - This PR restores synr (with the current constraint as observed in
      tests/scripts/task_ci_setup.sh) as part of the Docker image.

commit 1936609200913985f32eae42feb271b65132bd42
Author: Anastasia Stulova <[email protected]>
Date:   Thu Aug 19 17:42:09 2021 +0100

    [Android][RPC] Fix Vulkan runtime support. (#8791)

    Update Android RPC app to reflect the new
    Vulkan source code tree structure.

commit e1a0ea1eb5fe733f73aa098cb67a881ed1a6786d
Author: Gustavo Romero <[email protected]>
Date:   Thu Aug 19 10:09:01 2021 -0300

    [microTVM][RVM] Fix base-box-tool command in README.md (#8613)

    This commit fixes the platform argument order for base-box-tool.py
    'test' command in the documentation about the RVM. Currently the example
    in documentation places <platform> before option
    [--test-device-serial=<serial>], whilst the correct order is after all
    the options, so trying to use the 'test' command arguments in the order
    as suggested by the documentation will not work.

    This commit also fixes a typo (inovke -> invoke).

    Finally it tweaks a bit the text format: lines with maximum 80 columns,
    a better diagram format for the dir structure, and a better format for
    the bash commands. A link is added too for easy access to the
    "microTVM Reference VM tutorial" found in tutorials/micro directory. A
    couple of command examples were also added to the documentation.

    Signed-off-by: Gustavo Romero <[email protected]>

commit 6b7597b56374f2f5e7fd45afdf60b43353b06c03
Author: Leandro Nunes <[email protected]>
Date:   Thu Aug 19 07:48:11 2021 +0100

    [CI] Rev ci-cpu to v0.76 (#8786)

    - This includes changes up to commit 1a95f9bd0

commit 41879b2552364f094492470a77a3ec0866b30eae
Author: Chenfan <[email protected]>
Date:   Thu Aug 19 11:15:34 2021 +0800

    [FIX] Bug fix for batch_matmul parameters mismatch (#8785)

commit 87674f9ef76dbc89bac0af52a4e33155f5d4a8f8
Author: Gavin Uberti <[email protected]>
Date:   Thu Aug 19 00:03:26 2021 +0100

    [microTVM] Project API Arduino support (#8708)

    * ProjectAPI Arduino support

    * Compile and run integration tests

    * Add support for other Arduino boards

    * Unit tests for project generation

    * AOT support

    * Arduino RPC server

    * Incorporate ProjectAPI changes

    Add Arduino tests to CI

    * Copyright notices

    * Fix Andrew's PR comments

    * Additional PR comments

    PR comments and Python 3.6 support

    Linting fix

    Re-add test onnx file

    Test Arduino cli bug workaround

    Support new hardware targets

    Temporary fix for tests

    Formatting issue

    Spelling fix

    Add test case for exact FQBN matching

    * Add unit tests from apps directory to task_python_microtvm.sh

commit 70727435a0827cd6cefe0b52f4407e08b799e39a
Author: Christopher Sidebottom <[email protected]>
Date:   Wed Aug 18 22:57:37 2021 +0100

    Remove old AOT Executor code (#8758)

    * Remove old AOT Executor code

    This removes the old AOT execution functions that relied on the model descriptor which was removed in https://github.com/apache/tvm/pull/8280.

    * Remove rogue tvm_model_t from demo app

    * Remove aot_executor from demo CRT libs

commit 3f881ab15fd207490af3f04deb0813429e31e1f4
Author: kueitang <[email protected]>
Date:   Thu Aug 19 04:49:16 2021 +0800

    Expose FTVMInferCorrectLayout Python interface (#8755)

    Co-authored-by: kueitang <[email protected]>

commit 5b2e5044dc239f7e2ff0481efeb6a8441fd20c52
Author: Jaehun Ryu <[email protected]>
Date:   Thu Aug 19 00:03:51 2021 +0900

    Restore License (#8779)

commit ab019676a5964fcfaa065877c4c98719ecaabb8a
Author: Andrew Reusch <[email protected]>
Date:   Wed Aug 18 07:38:14 2021 -0700

    Rev ci-qemu to v0.08 (#8776)

    * Remove synr from pip-installed package list

     * synr is installed by task_ci_setup

    * rev ci-qemu to 0.08

commit e7748aac40bd4c263882323393ea8896837614a9
Author: Haichen Shen <[email protected]>
Date:   Tue Aug 17 23:16:54 2021 -0700

    [Relay] Extract dataflow matcher data structure into header (#8774)

    * extract dataflow matcher data structure into a header file

    * lint

    * lint

commit 843d246a8ad8dbf778b39a716c053de4768fa62b
Author: anwang2009 <[email protected]>
Date:   Tue Aug 17 23:14:13 2021 -0700

    Add onnx opset v13 support for softmax, logsoftmax (#8625)

    * add more support for softmax ops

    * noop

    * noop

commit da4869e331b6d9a94a3aef103a14b53a80704968
Author: Ruihang Lai <[email protected]>
Date:   Wed Aug 18 13:48:35 2021 +0800

    [TensorIR][M2a] Parallel, Vectorize, Bind & Unroll (#8716)

    Co-authored-by: Junru Shao <[email protected]>
    Co-authored-by: Bohan Hou <[email protected]>
    Co-authored-by: Siyuan Feng <[email protected]>
    Co-authored-by: Hongyi Jin <[email protected]>
    Co-authored-by: Wuwei Lin <[email protected]>

commit 3181f273f00c47fd07e576cff96e8df6d94d6f79
Author: Andrew Reusch <[email protected]>
Date:   Tue Aug 17 22:46:45 2021 -0700

    skip aot checks when USE_MICRO=OFF (#8772)

commit 78dbfcf7992f68748c61925c08a2b94e2e7dd463
Author: anwang2009 <[email protected]>
Date:   Tue Aug 17 22:46:17 2021 -0700

    Add support for QLinearMul ONNX op (#8773)

    * add qlinearmatmul

    * noop

    * mul not matmul

    * refactor some common qlinear op test code

commit c8a6089073ba997375b318c80674cc906df19e53
Author: Mark Shields <[email protected]>
Date:   Tue Aug 17 16:41:42 2021 -0700

    [Relay] Refactor Interpreter to treat lowering as IRModule->IRModule rewrite. (#8597)

    * This continues the work outlined in the RFC
      https://discuss.tvm.apache.org/t/rfc-relay-tecompiler-rewrite-existing-compile-engine-to-match-updated-compiler-flow/9233
    This gets about halfway there for the Interpreter:

    * Remove direct access to TECompiler from interpreter, and instead call
      tec::LowerTEExpr when 'preparing' a module and expression for evaluation.
    * Make clear there's no phase distinction between create_interpreter and
      evaluate on the Python side -- both must be prepared together as a single IRModule.
    * But in return make sure the result of evaluate on the Python side is a packed func
      ready to directly apply 'simple' arguments to an already interpreted closure.
    * The interpreter builds and caches primitive TIR functions (and their corresponding
      dynamic shape functions) as packed funcs as they are encountered.
    * Cleanup uses of interpreter for constant folding on the C++ side.

    Future work:
    * Fold LoweredModule into IRModule so tec::LowerTEExpr is just another pass.
    * Get rid of the implicit caching of lowered functions in TECompiler.
    * Make calling convention from Relay to TIR explicit, and remove all the function
      attribute hackery currently needed so the interpreter can correctly invoke lowered
      functions as it encounters them.
    * Make TECompiler private. Though we could do this now, it would make migrating the VM and
      AOT uses of CompilerEngine harder.

    Force a gc between sphinx-gallery items to reclaim GPU memory. (#8722)

    GPU memory is only released once the PackedFunc for evaling the model is gced
    by Python. In CI we're noticing intermittent 'CUDA: Out of memory' failures
    while processing the tutorials, and tracing showed there was no gc happening
    between items. Not confident this will solve the problem but worth a try.
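
    A minimal sketch of the idea, assuming the collection is triggered
    from a sphinx-gallery reset_modules callable that takes
    (gallery_conf, fname). The name force_gc, the config shown, and the
    Holder class are illustrative and may differ from the actual change.

```python
import gc
import weakref


def force_gc(gallery_conf, fname):
    """Illustrative hook: run a full collection after each gallery item."""
    gc.collect()


# Assumed wiring in docs/conf.py; other config keys are omitted.
sphinx_gallery_conf = {"reset_modules": (force_gc,)}


class Holder:
    """Stand-in for an object that keeps a resource alive until collected."""


def make_cycle():
    # A reference cycle is only reclaimed by the cyclic garbage collector,
    # not by reference counting alone.
    holder = Holder()
    holder.self_ref = holder
    return weakref.ref(holder)


ref = make_cycle()
force_gc(sphinx_gallery_conf, "tutorial.py")
```

    After the hook runs, the weak reference is dead, which is the point
    at which a resource held by such an object could be released.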

    * Get rid of logs spam.

commit 4216cd7f9d06a7feb712e78892061721410cc224
Author: Leandro Nunes <[email protected]>
Date:   Tue Aug 17 23:48:26 2021 +0100

    Add params.* to Jenkins file parameters (#8771)

    * Prefix all parameters with params.* so that it checks
       whether parameters exist before using them

     * This is a follow-up fix on #8721 so that existing PRs work
       without being re-triggered manually twice

commit 4cee61a563980239fb4ef7a8548e09424b0cd3e1
Author: Jason <[email protected]>
Date:   Wed Aug 18 05:25:39 2021 +0800

    Add PaddlePaddle dependency in docker file (#8742)

commit cddb0f6d566e738a6b45be662a498fd8ac69dab7
Author: Andrew Reusch <[email protected]>
Date:   Tue Aug 17 13:36:58 2021 -0700

    Update QemuTransport#write() to match new write API contract. (#8761)

    * suspect this should fix #8278

commit e7534209429be47f18502e514cc7ccfdf20fd437
Author: Gavin Uberti <[email protected]>
Date:   Tue Aug 17 19:22:57 2021 +0100

    Fix ci-qemu Arduino install dir (#8766)

commit 26f7c0d7c1959bc1fe37915abe26db5c080dbb57
Author: Jaehun Ryu <[email protected]>
Date:   Wed Aug 18 02:14:05 2021 +0900

    [Relay testing] densenet implementation fix (#8704)

    * Fixed testing densenet bug

    * Fixed code format using black

commit 2793113880bf5b1f38dc6ff527c3c2f24c78c519
Author: Qiang Zhang <[email protected]>
Date:   Wed Aug 18 01:13:50 2021 +0800

    [TIR] Change Integer Implicit Conversion Rule to C Standard Way (#8733)

commit dbf9ce52d4e4fbcd0c1d44f6c8d05c7774dffe63
Author: Lunderberg <[email protected]>
Date:   Tue Aug 17 10:30:27 2021 -0500

    [UnitTests] Require cached fixtures to be copy-able, with opt-in. (#8451)

    * [UnitTests] Require cached fixtures to be copy-able, with opt-in.

    Previously, any class that doesn't raise a TypeError in copy.deepcopy
    could be used as a return value in a @tvm.testing.fixture.  This has
    the possibility of incorrectly copying classes that inherit the default
    object.__reduce__ implementation.  Therefore, only classes that
    explicitly implement copy functionality (e.g. __deepcopy__ or
    __getstate__/__setstate__), or that are explicitly listed in
    tvm.testing._fixture_cache are allowed to be cached.
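
    The rule can be sketched as follows. This is not the tvm.testing
    implementation: ALLOWED_TYPES, has_explicit_copy and
    copy_cached_value are hypothetical names, and the allow-list here is
    a placeholder for the one in tvm.testing._fixture_cache.

```python
import copy

COPY_HOOKS = ("__deepcopy__", "__getstate__", "__setstate__")
# Placeholder allow-list; the real list lives in tvm.testing.
ALLOWED_TYPES = (int, float, str, list, dict, tuple)


def has_explicit_copy(cls):
    """True if cls or a base other than object defines a copy hook itself."""
    return any(
        hook in vars(klass)
        for klass in cls.__mro__
        if klass is not object
        for hook in COPY_HOOKS
    )


def copy_cached_value(value):
    """Return a deep copy, refusing types that only inherit object's defaults."""
    cls = type(value)
    if cls in ALLOWED_TYPES or has_explicit_copy(cls):
        return copy.deepcopy(value)
    raise TypeError(f"{cls.__name__} does not explicitly support copying")


class WithDeepcopy:
    def __init__(self, data):
        self.data = data

    def __deepcopy__(self, memo):
        return WithDeepcopy(copy.deepcopy(self.data, memo))


class Plain:
    """Relies on the default object.__reduce__, so it is rejected."""
```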

    * [UnitTests] Added TestCachedFixtureIsCopy

    Verifies that tvm.testing.fixture caching returns copy of object, not
    the original object.

    * [UnitTests] Correct parametrization of cudnn target.

    Previous checks for enabled runtimes were based only on the target
    kind.  CuDNN is the same target kind as "cuda", and therefore needs
    special handling.

    * Change test on uncacheable to check for explicit TypeError

commit f2c5272bdfb8d09c6dadde475d2a5420b28500d6
Author: Leandro Nunes <[email protected]>
Date:   Tue Aug 17 13:42:18 2021 +0100

    Enable custom images to be set in TVM Jenkinsfile (#8721)

    * This work is needed to enable automatic testing of our
       newly built Docker images as part of CI

     * The default value is set by variables in the same
       Jenkinsfile and are used when no custom values are
       provided

commit c28c86ae6b187c1afd12c86be3b65913345a49f5
Author: Thierry Moreau <[email protected]>
Date:   Tue Aug 17 00:24:30 2021 -0700

    [Community] @Mousius -> Reviewer (#8764)

    * adding Mousius to reviewers,  name update for Siva Reddy

    * making Siva's name consistent

commit f4ba8fc6276f984b311e20f8eb39888f424b373a
Author: Thierry Moreau <[email protected]>
Date:   Tue Aug 17 00:19:47 2021 -0700

    adding gromero as a reviewer (#8765)

commit e02ea7430589fa345ab4472f02511ae8d6c08dea
Author: Lily Orth-Smith <[email protected]>
Date:   Mon Aug 16 22:44:59 2021 -0700

    Add DictAttrs to IRModule and refactor DictAttrs utility functions (#8750)

    * Add DictAttrs to IRModuleNode

    Move GetAttrs to be a member of DictAttrs

    Generalize WithAttrs to work with IRModule and move to attrs.h

    Change func->GetAttr to func->attrs.GetAttr

    * lint

    * Fix documentation

    * fix typo

    * Another typo!

    * Revert GetAttrs to ->attrs.GetAttrs change

    * Didn't mean to revert these

    * Revert a few more things

    * Add GetAttrs to IRModuleNode

commit cfa498c0376622afe4e0f7344f0104dc97d7e876
Author: Mark Shields <[email protected]>
Date:   Mon Aug 16 22:44:30 2021 -0700

    Make from_tensorflow.py more GPU memory friendly. (#8763)

    * Make from_tensorflow.py more GPU memory friendly.

    Sphinx-gallery runs everything in a single process. There
    doesn't appear to be any easy way to force Tensorflow to
    return memory other than terminating the process. This at
    least gives us a little more wiggle room.

    * Also deploy_sparse.py. Should probably also be done to tensorflow.rst.

commit 2008d6279756a1dd74cf41eec090e98636215e17
Author: Euntaik <[email protected]>
Date:   Tue Aug 17 10:52:59 2021 +0900

    add support for half_pixel_centers in resize (#8689)

commit d02e50cbaf05de6955ff7bd7471dadc2a997fe15
Author: Yuanjing Shi <[email protected]>
Date:   Mon Aug 16 17:57:00 2021 -0700

    [AutoScheduler][FIX] Fix exception handling in measure.py (#8754)

    * fix exception handling

    * fix linting

    * stringify the exception from MapResult

    * use repr instead of str

commit e334942db002019979438971440d33ece16585a3
Author: Natan Kaminsky <[email protected]>
Date:   Mon Aug 16 15:35:03 2021 +0300

    Fix builtin_fp16.h path according to: https://discuss.tvm.apache.org/… (#8705)

commit 3e0c461f26b5174759dbb932986006f73a94a816
Author: Christopher Sidebottom <[email protected]>
Date:   Mon Aug 16 09:15:26 2021 +0100

    Allow Linker script files to be committed (#8745)

    This is a source file type needed for https://github.com/apache/tvm/pull/8744

    Co-authored-by: Grant Watson <[email protected]>

commit 1d087920a0f1a9ea065c2c7992a15235170f5c75
Author: Gustavo Romero <[email protected]>
Date:   Mon Aug 16 05:09:01 2021 -0300

    [microTVM] Fix warnings on Zephyr tests (#8740)

    Fix the following warning message on Zephyr tests:

    DeprecationWarning: Please use input parameter mod (tvm.IRModule)
    instead of deprecated parameter mod (tvm.relay.function.Function)

    Signed-off-by: Gustavo Romero <[email protected]>

commit cddd3485dd58c1de98c1279a465897a36782befe
Author: Tianqi Zhang (张天启) <[email protected]>
Date:   Mon Aug 16 14:04:08 2021 +0800

    [Fix][TOPI] remove wrong fix in x86's dense_nopack operator (#8687)

commit 2e247825be4119882e6c5c691ccba69a2ad33836
Author: CircleSpin <[email protected]>
Date:   Mon Aug 16 02:02:01 2021 -0400

    [Onnx Operators] Celu (#8741)

    * complete celu op

    * forgot to add test

    * change order in convert_map, remove comment, delete import hiccup

    Co-authored-by: CircleSpin <[email protected]>

commit c4c31de90850d67d3cf1f16b28e92ce431799c8b
Author: Andrew Reusch <[email protected]>
Date:   Sun Aug 15 21:48:59 2021 -0700

    Install rust in ci-lint so cargo fmt can move to lint stage. (#8727)

commit 1a95f9bd0e84c4f3ebcbd668f26631fd74e8f28f
Author: xiaolong18 <[email protected]>
Date:   Mon Aug 16 04:43:52 2021 +0800

    [TF] Support TensorFlow < 1.13 for test_sparse_add (#8647)

commit 49224cb8b81b7e0b857935191019981b22787be3
Author: Andrey Malyshev <[email protected]>
Date:   Sun Aug 15 23:33:04 2021 +0300

    Fix use of fallback AutoTVM knobs in default scheduling (#8707)

    * Fix use of fallback AutoTVM knobs

    Previously, knob values depended on the order of explicit cfg updates and
    cfg.define_split calls in fallback mode.

    * Add test for define_split with fallback defined values

commit 994a15164cdac97ca610a3b6af4c157e0e465569
Author: AndrewZhaoLuo <[email protected]>
Date:   Sun Aug 15 09:40:04 2021 -0700

    update docs (#8736)

    Co-authored-by: Andrew Zhao Luo <[email protected]>

commit e12ddcafd74cc10cef343fc39a0c6a892a431650
Author: Alperen Bag <[email protected]>
Date:   Sun Aug 15 07:01:08 2021 +0300

    [FRONTEND][PYTORCH] Support for nn.SiLU added (#8753)

commit 3ebd353a7f526cdf21293055a00eeeabe6efae1f
Author: Hua Jiang <[email protected]>
Date:   Sat Aug 14 17:48:50 2021 -0700

    [VTA] Make vta graph_pack compatible with latest TVM, and bring back object detection tutorials. (#8731)

    * [VTA] Make vta graph_pack compatible with latest TVM, and bring back
    object detection tutorials.

    * remove deploy_detection.py.

    * move out deploy_detection.py from legacy folder.

    * fix build error.

commit 901dee54e42d6393cde6eefcc25964db8e24e41d
Author: Lunderberg <[email protected]>
Date:   Sat Aug 14 19:47:08 2021 -0500

    [Vulkan] Check at codegen if the shader is within shared memory limits. (#8746)

    Previously, shaders that do not respect device limits for shared
    memory could result in segfaults that occur during the call to
    `vkCreateComputePipelines`.

commit 170add2f2fbc3507d1cbfc77ff95312dfe3a1ca9
Author: Robert Kimball <[email protected]>
Date:   Sat Aug 14 12:00:46 2021 -0700

    Add parameter to allow caller to supply a Runner (#8747)

    * Add parameter to allow caller to supply a Runner

    * Add unit test for passing in runner to graph tuner

commit f5661f4e78cae5f62bc172559040576cd2fd20f2
Author: Lunderberg <[email protected]>
Date:   Fri Aug 13 13:27:14 2021 -0500

    [Docs] Moved the generated tutorials folders into a _staging folder. (#8735)

    * [Docs] Moved the generated tutorials folders into a _staging folder.

    Previously, reorganization or renaming of tutorials could cause
    documentation tests to fail in CI.  The CI checks out the version to
    be tested, which may still have generated documents in
    `docs/tutorials` and `docs/vta/tutorials`.  If a PR moves these to
    different folders, then they show up as duplicate `*.rst` files,
    resulting in sphinx warnings.

    This commit makes a `docs/_staging` folder in which sphinx is run.
    All tutorials are generated within this folder, and the entire folder
    can be deleted with `make clean`.  As a result, it is safe to
    reorganize the tutorial without impacting CI.

    * Updates based on reviews.

    * Changed graph_runtime references in deploy_classification.py to graph_executor

    * Removed unnecessary graph_runtime import from tune_alu_vta.py

commit a06863ac9406c027b29f346fb6177268f612912d
Author: Wuwei Lin <[email protected]>
Date:   Fri Aug 13 02:54:34 2021 -0400

    [TensorIR][M2a] Storage Align (#8693)

    This PR is part of the TensorIR upstreaming effort (#7527). It adds one
    schedule primitive, storage_align.

    Co-authored-by: Siyuan Feng <[email protected]>
    Co-authored-by: Bohan Hou <[email protected]>
    Co-authored-by: Ruihang Lai <[email protected]>
    Co-authored-by: Hongyi Jin <[email protected]>
    Co-authored-by: Junru Shao <[email protected]>

commit ccc09fa7cda5a3975f064d00725aaa619d0c118c
Author: Tristan Konolige <[email protected]>
Date:   Thu Aug 12 21:36:16 2021 -0700

    [TVMC] Switch profile flag to use new profiler (#8710)

commit 395b308acf39bba20d4f5aceedaa9d31406feb2c
Author: Jiawei Liu <[email protected]>
Date:   Fri Aug 13 12:36:04 2021 +0800

    enhance tir signed-unsigned cast (#8706)

commit 7cf7adff444b88a9a661219b7cfc6e9fb61dd98f
Author: Valery Chernov <[email protected]>
Date:   Fri Aug 13 04:54:43 2021 +0300

    [Torch] chunk and unsafe chunk (#8718)

    * An alternative chunk op was implemented in the PyTorch frontend, and aten::unsafe_chunk was added to the frontend's op map.

    * The previous chunk op was replaced by the new one in the PyTorch frontend; the new one is 2.5 times faster.

    Co-authored-by: Valery Chernov <[email protected]>

commit 8843153bf32129fe92acdb509c8d76ac0eef8e60
Author: Lunderberg <[email protected]>
Date:   Thu Aug 12 17:40:07 2021 -0500

    [UnitTest] Updated tolerances to avoid flaky unit test. (#8723)

    * [UnitTest] Updated tolerances to avoid flaky unit test.

    The result was correct, but the atol was just small enough to trigger
    a CI error for a value that was close to zero in an unrelated PR at
    #8670.

    https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-8670/16/pipeline/#step-236-log-1703

    * Also updated 32-bit version of test_conv2d_nchw
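
To illustrate why an expected value close to zero makes a small tolerance fragile, here is a short sketch using Python's standard `math.isclose`. The numbers are made up for illustration and are not taken from the failing test, and the test itself may use a different comparison helper.

```python
import math

expected = 0.0  # reference value that happens to be (close to) zero
actual = 3e-6   # a "correct" result carrying a small floating-point error

# A relative tolerance scales with the magnitude of the values, so near zero
# it allows essentially no error at all.
relative_only = math.isclose(actual, expected, rel_tol=1e-5, abs_tol=0.0)

# An absolute tolerance that is slightly too small still rejects the result...
tight_atol = math.isclose(actual, expected, rel_tol=1e-5, abs_tol=1e-6)

# ...while a somewhat larger absolute tolerance accepts it.
loose_atol = math.isclose(actual, expected, rel_tol=1e-5, abs_tol=1e-5)

print(relative_only, tight_atol, loose_atol)
```

Near zero only the absolute tolerance matters, which is consistent with the commit's choice to adjust `atol` rather than treat the result as wrong.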

commit 3e37bb5e677fbf6a20a8aaaea4792aa642a1cead
Author: lhutton1 <[email protected]>
Date:   Thu Aug 12 21:59:40 2021 +0100

    [CI] Add Arm Compute Library to Arm CI unit test pipeline (#8734)

commit 4dd7f6806f05bbc9c33d68d68493113534a34b12
Author: Yuanjing Shi <[email protected]>
Date:   Thu Aug 12 13:35:04 2021 -0700

    [TIR] Use PopenPool instead of multiprocessing.pool (#8492)

    Co-authored-by: Wuwei Lin <[email protected]>

commit 66ac4705aae9bec92047920c8a9273693cd48c44
Author: masahi <[email protected]>
Date:   Fri Aug 13 02:15:13 2021 +0900

    [Relay] Dense alter layout fixed for packed input (#8669)

    * clean up typerel

    * add layout transform when input is 3D

    * add test

    * update doc to clarify that only 2D input data is supported

    * add weight_layout attribute in dense

    * remove explicit layout transform from dense_alter_op.py

    * Add DensePackInferCorrectLayout to insert layout transform

    * relax type rel

    * revert type rel relax and add check on dim

    * introduce DensePackAttrs to avoid breaking dense op

    * try fixing arm compute lib test

    * Update tests/python/contrib/test_arm_compute_lib/test_dense.py

    Co-authored-by: lhutton1 <[email protected]>

    * formatting

    Co-authored-by: lhutton1 <[email protected]>

commit 5e20ef968c99d8294e5d185051e7421879f36abc
Author: Mehrdad Hessar <[email protected]>
Date:   Thu Aug 12 09:58:35 2021 -0700

    Remove qemu installation from Zephyr RVM (#8701)

commit 76a7fa9d92e99b544a6e34731cdc686a22891380
Author: Christopher Sidebottom <[email protected]>
Date:   Thu Aug 12 13:35:43 2021 +0100

    Convert AOT to TECompiler (#8697)

    * Convert AOT to TECompiler

    This removes the dependency on "compile_engine.h" from aot_executor_codegen.cc. This required a few changes to how AOT was operating:
    * AOT run_model is now based on the post lowering main_module
    * AOTOnDemandAllocator is run twice to ensure SIDs are updated post-lowering
    * Moved to using tec::UpdateFunctionMetadata

    Tests are passing, but would appreciate other validation :smile_cat:

    * Clarify reasoning behind replanning memory later

    * Use main_func_info rather than bespoke logic in AOT

    This moves from using the bespoke AOT UpdateMainWorkspaceSize to the
    LoweredModule main_func_info property to unify with Graph executor
    codegen.

commit e9380e47f0b97c0b98b97f082b075eaa1308038b
Author: Christopher Sidebottom <[email protected]>
Date:   Thu Aug 12 09:51:24 2021 +0100

    Refactor AOT Test Utils parameters into object (#8650)

    * Refactor AOT Test Utils parameters into object

    `compile_and_run` was getting quite complicated to understand and was mostly duplicated by `compile_and_run_multiple_models`.

    This patch pulls out some common parameters into a data class `AOTTestNetwork` which makes it clearer what each parameter is doing and provides documentation.

    * Rename Network -> Model and sizebytes -> size_bytes
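
The refactoring pattern described in this commit can be sketched roughly as follows. All field names and the body of `compile_and_run` are hypothetical placeholders, not the actual TVM test utilities; the class is called `AOTTestModel` only to follow the rename mentioned in the commit.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class AOTTestModel:
    """Groups per-model parameters that would otherwise be separate arguments.

    Hypothetical fields, for illustration only.
    """

    module: Any                              # the model to compile
    inputs: Dict[str, Any]                   # input name -> input data
    outputs: List[Any]                       # expected outputs
    name: str = "default"                    # tells several models apart
    params: Optional[Dict[str, Any]] = None  # optional model parameters
    size_bytes: int = 0                      # hypothetical memory requirement


def compile_and_run(models: List[AOTTestModel]) -> List[str]:
    """Placeholder runner: one code path serves one model or several."""
    return [
        f"{m.name}: {len(m.inputs)} input(s), {len(m.outputs)} output(s)"
        for m in models
    ]


summary = compile_and_run(
    [
        AOTTestModel(module=object(), inputs={"x": [1, 2]}, outputs=[[3]]),
        AOTTestModel(module=object(), inputs={"a": 1, "b": 2}, outputs=[3], name="second"),
    ]
)
print(summary)
```

Because each model carries its own parameters, a single function taking a list can cover both the single-model and the multi-model case, which is the duplication the commit message points at.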

commit 9586ee2c2e7d902ca366eb18c85cd0d6515426fa
Author: Mehrdad Hessar <[email protected]>
Date:   Wed Aug 11 19:43:59 2021 -0700

    increase atol for float32 (#8712)

commit 2e6356854763e27ce03a4fbc79b6235ae7397317
Author: Lunderberg <[email protected]>
Date:   Wed Aug 11 21:43:40 2021 -0500

    [Docs][UnitTest] Updated target parametrization documentation (#8724)

    * [Docs][UnitTest] Updated target parametrization documentation

    The intended audience are developers writing unit tests, or debugging
    unit tests that have failed.  Therefore, moving the recommended style
    to the top of the section, and the implementation details to the
    bottom.

    * Documentation updates as recommended by tkonolige

commit 722efc5dad83e6f1312f372e20a65254a64c6d5b
Author: Lunderberg <[email protected]>
Date:   Wed Aug 11 18:31:21 2021 -0500

    [Docker] Refactor/clean-up of docker/bash.sh (#8670)

    * [Docker] Refactor/clean-up of docker/bash.sh

    - Added detailed help message, displayed using `-h` or `--help`.

    - Optional flags handled using `getopt`, can now occur in any order.

    - `--mount` flag may occur more than once.

    - Switched from short arguments to docker-run to long arguments
      (e.g. `--volume` instead of `-v`).  Short arguments are good
      shortcuts for interactive work, but can be more difficult to read in
      longer scripts.

    - Mount the `.tvm_test_data` folder, to avoid re-downloading test data
      already available in the host environment.

    * [Docker] docker/bash.sh CI fix

    Dash-prefixed arguments as part of the command now require prefixing with
    -- to separate them from arguments intended for docker/bash.sh

    * [Docker] docker/bash.sh, consistent quoting

    * [Docker] Added --repo-mount-point for docker/bash.sh

    * [Docker] Updated command-line parsing of docker/bash.sh

    - Maintained previous behavior, any unrecognized flags after the
      docker/bash.sh are part of the command, no -- is
      needed. (e.g. docker/bash.sh ci_gpu make -j2)

    - Reverted changes to Jenkinsfile to add a --, no longer needed.

    * [Docker] Fixed multi-argument commands

    * [Docker] docker/bash.sh check permissions before mounting ~/.tvm_test_data

    * [Docker] Consistent workplace directory in docker/bash.sh for Jenkins

    Some locations in the CI perform build commands outside of the build
    steps (e.g. tests/scripts/task_ci_setup.sh#L38), and cmake doesn't
    like it if the build directory changes.  These should probably be
    moved into the build steps of the CI, and be packed in tvm_multilib in
    the Jenkinsfile, but for the meantime maintaining a consistent
    /workspace directory on all CI nodes allows cmake to run.

    * [Docker] Updated bash.sh for MacOS compatibility

    MacOS has an older version of bash that handles arrays slightly
    differently.  All instances of array expansion `"${ARRAY[@]}"` should
    instead be written as `${ARRAY[@]+"${ARRAY[@]}"}`.  Otherwise, `set -u`
    will erroneously complain about an undefined variable. See
    https://stackoverflow.com/a/61551944 for details.

    Even though this is an older version of bash (observed in version
    3.2.57), this is the last major version available under GPLv2 and is
    therefore the default version on MacOSX.  At some point, the
    `docker/bash.sh` could be migrated to python for ease of
    maintenance/testing.

commit e88fe7726f58c415ac825b3a39ec0e034d3fa009
Author: Mehrdad Hessar <[email protected]>
Date:   Wed Aug 11 16:24:54 2021 -0700

    [microTVM] Zephyr Test Refactor (#8713)

    * refactor host to qemu

    * remove unused variables

    * remove skip-build arg

    * fix microtvm test script

commit 2bc0ecef8dd0f822329dc18773bd83d828206042
Author: Mark Shields <[email protected]>
Date:   Wed Aug 11 14:07:24 2021 -0700

    Force a gc between sphinx-gallery items to reclaim GPU memory. (#8722)

    GPU memory is only released once the PackedFunc for evaling the model is gced
    by Python. In CI we're noticing intermittent 'CUDA: Out of memory' failures
    while processing the tutorials, and tracing showed there was no gc happening
    between items. Not confident this will solve the problem but worth a try.
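
A minimal sketch of the idea, assuming sphinx-gallery's `reset_modules` configuration entry, which accepts callables taking `(gallery_conf, fname)`. The function name `force_gc`, the printed message, and the `conf.py` fragment are illustrative assumptions; the actual change in this PR may differ.

```python
import gc


def force_gc(gallery_conf, fname):
    """Run a full garbage collection so unreachable objects are finalized.

    Assumption: sphinx-gallery calls this with (gallery_conf, fname) when it
    is listed under "reset_modules" in sphinx_gallery_conf.
    """
    unreachable = gc.collect()
    print("Forced gc around {!r}: {} unreachable objects found".format(fname, unreachable))
    return unreachable


# Fragment of a hypothetical docs/conf.py; all other keys are omitted.
sphinx_gallery_conf = {
    "reset_modules": (force_gc,),
}
```

An explicit `gc.collect()` matters mainly for objects caught in reference cycles, which CPython does not free by reference counting alone. As the description says, this only helps if the object holding the GPU memory is actually unreachable by the time the hook runs.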

commit 09b989deb77cfb40f468c2566d1f40227af44bf7
Author: Jared Roesch <[email protected]>
Date:   Wed Aug 11 05:57:52 2021 -0700

    [Rust][Fix] Memory leak (#8714)

    * Fix obvious memory leak in function.rs

    * Update object point…
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
…e#8722)

GPU memory is only released once the PackedFunc for evaling the model is gced
by Python. In CI we're noticing intermittent 'CUDA: Out of memory' failures
while processing the tutorials, and tracing showed there was no gc happening
between items. Not confident this will solve the problem but worth a try.
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
…rewrite. (apache#8597)

* This continues the work outlined in the RFC
  https://discuss.tvm.apache.org/t/rfc-relay-tecompiler-rewrite-existing-compile-engine-to-match-updated-compiler-flow/9233
This gets about halfway there for the Interpreter:

* Remove direct access to TECompiler from interpreter, and instead call
  tec::LowerTEExpr when 'preparing' a module and expression for evaluation.
* Make clear there's no phase distinction between create_interpreter and
  evaluate on the Python side -- both must be prepared together as a single IRModule.
* But in return make sure the result of evaluate on the Python side is a packed func
  ready to directly apply 'simple' arguments to an already interpreted closure.
* The interpreter builds and caches primitive TIR functions (and their corresponding
  dynamic shape functions) as packed funcs as they are encountered.
* Cleanup uses of interpreter for constant folding on the C++ side.

Future work:
* Fold LoweredModule into IRModule so tec::LowerTEExpr is just another pass.
* Get rid of the implicit caching of lowered functions in TECompiler.
* Make calling convention from Relay to TIR explicit, and remove all the function
  attribute hackery currently needed so the interpreter can correctly invoke lowered
  functions as it encounters them.
* Make TECompiler private. This could be done now, but it would make migrating the VM and
  AOT uses of CompilerEngine harder.

Force a gc between sphinx-gallery items to reclaim GPU memory. (apache#8722)

GPU memory is only released once the PackedFunc for evaling the model is gced
by Python. In CI we're noticing intermittent 'CUDA: Out of memory' failures
while processing the tutorials, and tracing showed there was no gc happening
between items. Not confident this will solve the problem but worth a try.

* Get rid of logs spam.
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
…e#8722)

GPU memory is only released once the PackedFunc for evaling the model is gced
by Python. In CI we're noticing intermittent 'CUDA: Out of memory' failures
while processing the tutorials, and tracing showed there was no gc happening
between items. Not confident this will solve the problem but worth a try.
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
…rewrite. (apache#8597)

* This continues the work outlined in the RFC
  https://discuss.tvm.apache.org/t/rfc-relay-tecompiler-rewrite-existing-compile-engine-to-match-updated-compiler-flow/9233
This gets about halfway there for the Interpreter:

* Remove direct access to TECompiler from interpreter, and instead call
  tec::LowerTEExpr when 'preparing' a module and expression for evaluation.
* Make clear there's no phase distinction between create_interpreter and
  evaluate on the Python side -- both must be prepared together as a single IRModule.
* But in return make sure the result of evaluate on the Python side is a packed func
  ready to directly apply 'simple' arguments to an already interpreted closure.
* The interpreter builds and caches primitive TIR functions (and their corresponding
  dynamic shape functions) as packed funcs as they are encountered.
* Cleanup uses of interpreter for constant folding on the C++ side.

Future work:
* Fold LoweredModule into IRModule so tec::LowerTEExpr is just another pass.
* Get rid of the implicit caching of lowered functions in TECompiler.
* Make calling convention from Relay to TIR explicit, and remove all the function
  attribute hackery currently needed so the interpreter can correctly invoke lowered
  functions as it encounters them.
* Make TECompiler private. This could be done now, but it would make migrating the VM and
  AOT uses of CompilerEngine harder.

Force a gc between sphinx-gallery items to reclaim GPU memory. (apache#8722)

GPU memory is only released once the PackedFunc for evaling the model is gced
by Python. In CI we're noticing intermittent 'CUDA: Out of memory' failures
while processing the tutorials, and tracing showed there was no gc happening
between items. Not confident this will solve the problem but worth a try.

* Get rid of logs spam.
4 participants