Force a gc between sphinx-gallery items to reclaim GPU memory. #8722
GPU memory is only released once the PackedFunc for evaluating the model is garbage-collected by Python. In CI we're seeing intermittent 'CUDA: Out of memory' failures while processing the tutorials, and tracing showed that no gc was happening between items. Not confident this will solve the problem, but it's worth a try.
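For readers unfamiliar with the mechanism, here is a minimal sketch of what forcing a gc between gallery items could look like. It assumes sphinx-gallery's `reset_modules` option, which accepts callables invoked as `(gallery_conf, fname)` around each example; the hook name `force_gc` and the message text are illustrative and not necessarily what this PR commits.

```python
import gc


def force_gc(gallery_conf, fname):
    """Reset hook (illustrative name): run a full collection after an example.

    sphinx-gallery passes its configuration dict and the example's file name.
    Collecting here gives Python a chance to finalize objects, such as the
    PackedFunc mentioned above, that still pin GPU memory.
    """
    print("(Forcing Python gc after '{}')".format(fname))
    gc.collect()


# Fragment of a docs/conf.py. Assumption: listing only the custom hook;
# sphinx-gallery's default resetters could be listed alongside it.
sphinx_gallery_conf = {
    "reset_modules": (force_gc,),
}
```

Whether this is enough depends on what is holding the references: `gc.collect()` only helps if the objects are actually unreachable by the time the hook runs.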
thanks @mbs-octoml ! the other idea i had was explicitly deleting the module from sys.modules, if it's true that sphinx-gallery is importing it. seems like it does something very weird, though, and it's likely the top-level ref is removed: https://github.com/sphinx-gallery/sphinx-gallery/blob/master/sphinx_gallery/gen_rst.py#L501
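As a rough illustration of the `sys.modules` idea, the sketch below registers a stand-in module, removes it, and checks that an object it owned is reclaimed. Everything here is hypothetical scaffolding (`Resource`, `unload_module`, `fake_tutorial`); it does not reproduce how sphinx-gallery actually executes examples, and it only shows the effect in CPython when nothing else references the module.

```python
import gc
import sys
import types
import weakref


class Resource:
    """Stand-in for an object that pins GPU memory (hypothetical)."""


def unload_module(name):
    """Drop `name` from sys.modules, then collect.

    Returns True if the module had been registered.
    """
    found = sys.modules.pop(name, None) is not None
    gc.collect()
    return found


# Simulate a tutorial that was imported as a module and owns a resource.
mod = types.ModuleType("fake_tutorial")
mod.resource = Resource()
sys.modules["fake_tutorial"] = mod

probe = weakref.ref(mod.resource)  # lets us observe when it is freed
del mod                            # drop our own top-level reference

removed = unload_module("fake_tutorial")
```

After `unload_module` returns, `probe()` yields `None` once the resource has been freed; if some other object still referenced the module, the resource would survive.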
Yuk. At least on my runs it's not outright holding onto everything and gc indeed does do the right thing. So I thought trying the obvious dumb thing first would be worthwhile before we get deeper into the labyrinth.
https://discuss.tvm.apache.org/t/rfc-relay-tecompiler-rewrite-existing-compile-engine-to-match-updated-compiler-flow/9233

This gets about halfway there for the Interpreter:
* Remove direct access to TECompiler from interpreter, and instead call tec::LowerTEExpr when 'preparing' a module and expression for evaluation.
* Make clear there's no phase distinction between create_interpreter and evaluate on the Python side -- both must be prepared together as a single IRModule.
* But in return make sure the result of evaluate on the Python side is a packed func ready to directly apply 'simple' arguments to an already interpreted closure.
* The interpreter builds and caches primitive TIR functions (and their corresponding dynamic shape functions) as packed funcs as they are encountered.
* Cleanup uses of interpreter for constant folding on the C++ side.

Future work:
* Fold LoweredModule into IRModule so tec::LowerTEExpr is just another pass.
* Get rid of the implicit caching of lowered functions in TECompiler.
* Make calling convention from Relay to TIR explicit, and remove all the function attribute hackery currently needed so the interpreter can correctly invoke lowered functions as it encounters them.
* Make TECompiler private. Though could do this now it will make migrating the VM and AOT uses of CompilerEngine harder.

Force a gc between sphinx-gallery items to reclaim GPU memory. (apache#8722)

GPU memory is only released once the PackedFunc for evaling the model is gced by Python. In CI we're noticing intermittent 'CUDA: Out of memory' failures while processing the tutorials, and tracing showed there was no gc happening between items. Not confident this will solve the problem but worth a try.
Some follow-up context: it appears that sphinx-gallery does recycle global variables; I also did a quick experiment to confirm that. So the memory contention might be due to global retention by the packages that we use (likely TensorFlow or PyTorch, which might retain their internal memory pools). Adding "tensorflow", "torch" to reset_modules might be the right solution.
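One way this suggestion might be realized is a custom reset hook, sketched below. The assumption is that plain strings in `reset_modules` only select sphinx-gallery's built-in resetters, so framework cleanup would need a callable. The hook name `release_framework_memory` is hypothetical; `torch.cuda.empty_cache()` and `tf.keras.backend.clear_session()` are existing framework calls, but TensorFlow generally keeps its GPU pool for the life of the process, so the second call mostly drops Keras graph state rather than returning device memory.

```python
import gc
import sys


def release_framework_memory(gallery_conf, fname):
    """Reset hook (hypothetical name): ask loaded frameworks to free caches.

    Only touches frameworks that an earlier example already imported, so the
    docs build does not pay an import cost it would not otherwise have.
    Returns the list of frameworks that were asked to release memory.
    """
    released = []

    # Collect first so dead tensors are freed before the caches are emptied.
    gc.collect()

    torch = sys.modules.get("torch")
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()  # hand cached, unused blocks back to the driver
        released.append("torch")

    tf = sys.modules.get("tensorflow")
    if tf is not None:
        tf.keras.backend.clear_session()  # drops Keras global state only
        released.append("tensorflow")

    return released
```

Such a hook would be listed in `reset_modules` next to any other resetters. sphinx-gallery is not expected to use the return value; it is there only to make the hook easy to inspect.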
…rewrite. (#8597)

* This continues the work outlined in the RFC https://discuss.tvm.apache.org/t/rfc-relay-tecompiler-rewrite-existing-compile-engine-to-match-updated-compiler-flow/9233

This gets about halfway there for the Interpreter:
* Remove direct access to TECompiler from interpreter, and instead call tec::LowerTEExpr when 'preparing' a module and expression for evaluation.
* Make clear there's no phase distinction between create_interpreter and evaluate on the Python side -- both must be prepared together as a single IRModule.
* But in return make sure the result of evaluate on the Python side is a packed func ready to directly apply 'simple' arguments to an already interpreted closure.
* The interpreter builds and caches primitive TIR functions (and their corresponding dynamic shape functions) as packed funcs as they are encountered.
* Cleanup uses of interpreter for constant folding on the C++ side.

Future work:
* Fold LoweredModule into IRModule so tec::LowerTEExpr is just another pass.
* Get rid of the implicit caching of lowered functions in TECompiler.
* Make calling convention from Relay to TIR explicit, and remove all the function attribute hackery currently needed so the interpreter can correctly invoke lowered functions as it encounters them.
* Make TECompiler private. Though could do this now it will make migrating the VM and AOT uses of CompilerEngine harder.

Force a gc between sphinx-gallery items to reclaim GPU memory. (#8722)

GPU memory is only released once the PackedFunc for evaling the model is gced by Python. In CI we're noticing intermittent 'CUDA: Out of memory' failures while processing the tutorials, and tracing showed there was no gc happening between items. Not confident this will solve the problem but worth a try.

* Get rid of logs spam.
commit 2545e9caecadd66c72fbb6734c30d100e823b0fb Author: Josh Fromm <[email protected]> Date: Sat Aug 28 12:59:20 2021 -0700 [Frontend][Onnx] Simplify onnx input since name accesses are not reliable. (#8867) * Simplify onnx input since name accesses are no longer supported. * move Celu importer. commit 0961b65cbf0d6e1c5f51e0e88dd17886d6111522 Author: Jiawei Liu <[email protected]> Date: Sat Aug 28 04:28:07 2021 -0500 [Tutorial][Executor] Fix the usage of executors in tutorials (#8586) * fix: executor usage for keras tutorial * fix: executor usage for onnx tutorial * [Tutorial][Executor] Fix executors in tutorials commit 5ab527a71f7eb1d352db1408b225c79a21945c94 Author: AndrewZhaoLuo <[email protected]> Date: Sat Aug 28 02:24:16 2021 -0700 [Autoscheduler] Configurable workload keys (#8862) * change workload keys * remove binary string comparison * append the tuple not every integer * clean up * lint * dump workload keys to dags * fix things * change some strings * misc fixes, add tests * jostle ci commit 7214f5239dbb8da4585d4d10fbc8c65c8f155b12 Author: Siyuan Feng <[email protected]> Date: Sat Aug 28 17:23:43 2021 +0800 [TIR] Fix opaque access in buffer locator pass and match_buffer in region detector (#8855) * init * fix * Update src/tir/transforms/plan_update_buffer_allocation_location.cc Co-authored-by: Ruihang Lai <[email protected]> * Update src/tir/transforms/plan_update_buffer_allocation_location.cc Co-authored-by: Ruihang Lai <[email protected]> * address Co-authored-by: Junru Shao <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> commit 1df6c273f0fb1242d0b399614616635cef38bc15 Author: Yizhi Liu <[email protected]> Date: Fri Aug 27 19:33:58 2021 -0700 [Community] @mdw-octoml -> Reviewer (#8868) commit f188a4fb11971c9bfce9b059fd2b9dacdbe1a0d1 Author: Christopher Sidebottom <[email protected]> Date: Fri Aug 27 23:29:42 2021 +0100 Remove AOT Executor header from Arduino project (#8857) commit 3c86eec10ff8ced914db2af5873dfa91b76e5523 Author: 
Lunderberg <[email protected]> Date: Fri Aug 27 16:43:49 2021 -0500 [UnitTests] Expose TVM pytest helpers as plugin (#8532) * [UnitTests] Expose TVM pytest helpers as plugin Previously, pytest helper utilities such as automatic parametrization of `target`/`dev`, or `tvm.testing.parameter` were only available for tests within the `${TVM_HOME}/tests` directory. This PR extracts the helper utilities into an importable plugin, which can be used in external tests (e.g. one-off debugging). * [UnitTests] Refactor the plugin-specific logic out into plugin.py. * [UnitTests] Moved marker definition out to global variable. commit 55bb8b60b707d5fc25c3828adf6086aa01bcc039 Author: Tristan Konolige <[email protected]> Date: Fri Aug 27 14:39:03 2021 -0700 [Graph Executor, VM] Add end to end benchmarking of models (#8858) Add benchmarking that includes ovearhead of transfering inputs and outputs to and from the device. This should give an accurate measurement of the runtime a user would see when using the model. This is accomplished by adding functions that run from inputs to return values into the graph executor and the VM. commit cf19c889214ca9a1b8c420baff35aa10986b3d9c Author: Krzysztof Parzyszek <[email protected]> Date: Fri Aug 27 05:22:47 2021 -0500 [Hexagon] Change declaration order of unique_ptr objects to fix crash (#8859) A crash occurs when automatically deleting an instance of CodeGenHexagon because the LLVMContext object has already been freed. Objects of both types are created using unique_ptr, but the object managed by the LLVMContext unique_ptr is passed to CodeGenHexagon object (not as a unique_ptr). This crash is fixed by moving the declaration of the LLVMContext object before the CodeGenHexagon object. I'm not sure if this is the best way to fix this, but it does fix the crash. Also, in other files, the LLVMContext object is always created first. 
Co-authored-by: Cahoon, Brendon <[email protected]> commit 3306857d80bfc76cdd10d7a40028f52b7ca696aa Author: Swift.Sun <[email protected]> Date: Fri Aug 27 17:28:50 2021 +0800 [Runtime] add set_output_zero_copy (#8497) * Update graph_executor.h * Update graph_executor.cc * modify zero copy UT add set input zero copy * modify C style * add runtime test * realy build generatr the json Co-authored-by: hwstaff <[email protected]> commit e774fed67c2d12e6cfc29a013f029d4b55c28e2a Author: Jason <[email protected]> Date: Fri Aug 27 15:34:05 2021 +0800 Add a PaddlePaddle Frontend (#8645) * fix some problems for matmul * fix some problems for matmul * add alpha parameter for matmul * remove unnecessary condition * add TranslatedLayer which support model loaded by jit.load * add mul operator support * Add padding mode support for conv/pool2d * support 4 two-tuples * add paddle test case * add paddle conv2d case * update test_forward.py * fix paddle convert_matmul * add paddle multiply and matmul op test case * add test case and fix bug * delete import pandas * add paddlepaddle tests * modify the variable name of convert_reshape * formatting * formatting * use black to format python code * pylint check * Remove fluid api * black format Co-authored-by: root <[email protected]> Co-authored-by: wjj19950828 <[email protected]> Co-authored-by: heliqi <[email protected]> Co-authored-by: Junru Shao <[email protected]> commit 9d168822f2950083a59be243cb35ad51888dbc5d Author: Christopher Sidebottom <[email protected]> Date: Fri Aug 27 06:04:09 2021 +0100 Change AOT from ExprVisitor to MixedModeVisitor (#8856) This should allow better scale-ability for AOT when targeting larger networks. 
commit b4b194dbb0db1f152740bbb84cab96721482e2cf Author: Valery Chernov <[email protected]> Date: Fri Aug 27 05:48:25 2021 +0300 extend repeat_interleave op for relay.Expr (#8839) Co-authored-by: Valery Chernov <[email protected]> commit 227bf7ffafec7a2cff543a8a22f3741f45970b8d Author: Tantalus13A98B5F <[email protected]> Date: Thu Aug 26 21:18:34 2021 -0400 [TOPI] [Relay] Sparse Conv2d Implementation for 3x3 kernels (#8605) * [topi] add spconv2d_3x3 nhwc * [relay] sparse_conv2d: add kernel_size attr * [relay] add strategy for spconv2d_3x3 nhwc * [relay] pass to convert spconv2d with const args * [relay] convert sparse conv2d pass fixes * use array for sparse conv2d attr * fixup 1x1 tests; new 3x3 tests commit f4f525dab86af653636bce95ce3609288fbaa587 Author: masahi <[email protected]> Date: Fri Aug 27 07:16:54 2021 +0900 [AMP] Disallow fp16 conversion for summation-like ops (#8810) * [AMP] Disallow fp16 conversion for summation-like ops * test only structural equality commit 3d81489a2656214e93c6ea983e82c55b310cd28b Author: Mehrdad Hessar <[email protected]> Date: Thu Aug 26 23:24:51 2021 +0200 move rust lint script (#8726) commit 423958fd8fdf1a2bd8d45d604135054953c5c73b Author: Chenfan <[email protected]> Date: Fri Aug 27 03:33:37 2021 +0800 [FIX] Bug fix for a floormod rewrite simplify rule (#8852) * Update rewrite_simplify.cc * Update test_arith_rewrite_simplify.py * Update test_arith_rewrite_simplify.py * Update test_arith_rewrite_simplify.py commit 04bdd32281c4ae50d086e4469fd6a9ee6f0c93b6 Author: Anton Sorokin <[email protected]> Date: Thu Aug 26 10:21:18 2021 -0700 VTA cmake change to include Verilator header for building tsim library (#8797) * VTA cmake file require Verilator include for tsim target. VTA module.cc uses svOpenArrayHandle to send wide data through DPI * Refactor Verialtor check conditions * Build TSIM only for CPU target. CPU target don't use -Werror to compile with Verilator. Jenkinsfile to have tvm_multilib_tsim defined for CPU build target. 
* remove build/libvta_tsim.so from non tsim targeting builds * Revert to enable TSIM build i386. Revert to -Werror in CPU config. Remove verilator CPP objects from cmake config for tsim and put them as include into vta module.cc to avoid Verilator compilation warnings commit 4fd1bf4e512aafc0bea0b809789cd27f8dd944d4 Author: Mehrdad Hessar <[email protected]> Date: Thu Aug 26 19:08:15 2021 +0200 update gpu and cpu (#8853) commit d263c6d4300170cc6cf7f58b923edcb23b5a7791 Author: Elen Kalda <[email protected]> Date: Thu Aug 26 18:06:23 2021 +0100 [Pattern matching] Add an option to rewrite the graph only once (#8843) * [Pattern matching] Add an option to rewrite the graph only once If the graph returned from the callback consists of the original pattern, the rewriter will run in the loop, which is not always desired. So this patch proposes an option to run the rewriter only once. Change-Id: I85cf0a055b8961d52394f21c1e4d7aad0a7e1d06 * Make rewrite_once default to false Change-Id: Idf6f01f254c403158883681e75c2a5978efbd2d0 commit 3f777d555f1b1a125b0f7f83291d1d8693ffa6be Author: Krzysztof Parzyszek <[email protected]> Date: Thu Aug 26 09:08:17 2021 -0500 [Hexagon] Rework tvm.target.hexagon() interface (#8823) * [Hexagon] Rework tvm.target.hexagon() interface Make the tvm.target.hexagon() function take most options as keyword parameters. This will allow adding additional parameters without changing the interface. No changes are required to existing code, except for changing positional parameters following the CPU version to keyword parameters, and updating the names of the keyword parameters: sim_args -> sim_options, llvm_args -> llvm_options, although the old names will be accepted for the time being. 
* formatting * change ' to " * Rename 'args' to 'config' for clarity * Use 'strip' instad of 'replace' * Restart build commit bca57cb1e74fe946c2db3d24fe5042b74da9fea7 Author: AndrewZhaoLuo <[email protected]> Date: Thu Aug 26 04:23:28 2021 -0700 [AMP] Bump up tolerance on flaky test (#8850) * bumpy up tol * bumped tolerance up even more * jostle ci commit 98a3476bfc7428f592ad0fd6b8c863b5fd5ec1f9 Author: Mehrdad Hessar <[email protected]> Date: Thu Aug 26 13:22:43 2021 +0200 [microtvm][Zephyr] Increase timeout to fix flaky tests (#8846) * increase timeout * trigger commit 349157641b17882fcf944409fba79c7300978a77 Author: Ashutosh Parkhi <[email protected]> Date: Thu Aug 26 11:16:45 2021 +0100 Support for CMSIS-NN in Corstone300 Makefile (#8831) Change-Id: Ifc2305db4e11d1d15d45407287f8f0bea469100a commit 46f8b61bd3f9f3351104a0bb9934afe3bafa2c28 Author: Anastasia Stulova <[email protected]> Date: Thu Aug 26 10:03:06 2021 +0100 [Relay][TOPI] Support of depthwise conv2d NHWC for Mali/Bifrost. (#8584) * [Relay][TOPI] Support of depthwise conv2d NHWC for Mali/Bifrost. Added initial tunable autotvm templates for depthwise conv2d with NHWC layout for Mali and Bifrost. * [Relay][TOPI] Misc fixes for depthwise conv2d Mali/Bifrost. - Fix assert for Bifrost. - Set reasonable default axis splits to avoid using tophub for NHWC. - Fixed typo: arm cpu -> Mali. * [Relay][TOPI] Fixed formatting in depthwise conv2d Mali/Bifrost. commit d80528db0becfc471acd1e7cda122f8283117627 Author: Christopher Sidebottom <[email protected]> Date: Thu Aug 26 09:54:01 2021 +0100 Apply CPPLint to CRT Tests (#8844) This one was a bit trickier as there was more usage of dynamic arrays and less safe casts. I've tried to minimise the changes to just those required to passing linting. 
commit f1ca91d4e401096d04e962c982d62b1f2669c9f5 Author: Tristan Konolige <[email protected]> Date: Wed Aug 25 18:25:29 2021 -0700 [GRAPH EXECUTOR,VM] Add benchmarking function to graph executor and vm (#8807) * [GRAPH EXECUTOR,VM] Add benchmarking function to graph executor and vm This new benchmarking function is just a convenience function for calling time_evaluator on the underlying module. Hopefully this should make it easier for users to get good benchmarks of their code. * formatting * import order * more test, more comments, more precision * fix tests * add seconds descriptions to doc commit 0648fffc9b6fddd27dc04a91ebac9cccd780b3b3 Author: Jiawei Liu <[email protected]> Date: Wed Aug 25 17:34:10 2021 -0500 [BUG] ToBasicBlockNormalForm immutability (#8778) * ToBasicBlockNormalForm immutability * better comment on ToBasicBlock * refine comment of ToBasicBlockForm commit 4a9b5b5cf597418c8bdbf2e0fcb8ac8cf24f0d07 Author: Mehrdad Hessar <[email protected]> Date: Wed Aug 25 22:58:40 2021 +0200 Update CI Lint Image Version (#8841) * Update CI Lint Image Version * trigger commit 5a6b75dcac8c2be4e3e9751844794bd74762eb51 Author: Michalis Papadimitriou <[email protected]> Date: Wed Aug 25 21:41:10 2021 +0300 [Pre-commit] Add pre-commit configuration to perform minimal checks locally (#8382) * [Pre-commit] Add pre-commit hook configuration file * [Pre-commit] Add header to configuratin file * [Pre-commit] Add basic configuration instructions * [Pre-commit] Extend pre-commit pipelines with C++ linting * [pre-commit] Add example usage comment for pre-commit hooks * [CI] Add in docker linting script mypy step * [CI] Use lint docker image for pre-commit checks * [CI][pre-commit] Minor cleanups on docker runners of pre-commit lints commit a31ebf75350946f09b2965576ba276e526f8d4ac Author: Krzysztof Parzyszek <[email protected]> Date: Wed Aug 25 11:46:19 2021 -0500 [Hexagon] Reuse Hexagon SDK analysis across cmake files (#8822) * [Hexagon] Reuse Hexagon SDK analysis across cmake 
files Different versions of the Hexagon SDK may have different directory structures. Extract the directory identification code into a separate cmake module. Use that module in Hexagon.cmake and in the cmake file for the FastRPC libraries. * Don't modify CMAKE_SHARED_LINKER_FLAGS, instead set target properties * Add quotes around ${...} * Add USE_HEXAGON_ARCH variable to cmake configuration * Restart build commit 977bdbdf772f6149554d43ae9073ce58c8e36a38 Author: Christopher Sidebottom <[email protected]> Date: Wed Aug 25 16:54:40 2021 +0100 Force CMake targets in top-level Makefile to run (#8840) This is a bug I introduced in https://github.com/apache/tvm/pull/8809, because the built binary is now named `build/cpptest` when `make` checks that artifact it finds it exists already and skips running `make -C build cpptest`. This ensures all nested `make` calls are forced to run from the top-level `Makefile`. commit b8193646fa9f97fc3476b5275d8ce8b0270408a3 Author: Valery Chernov <[email protected]> Date: Wed Aug 25 10:48:14 2021 +0300 [Frontend] [Torch] [ONNX] GRU layer (#8781) * GRU cell was implemented in common.py. GRU was supported on pytorch frontend side * update GRU in common.py and onnx frontend * fix issue related to GRU accuracy in pytorch and ONNX frontend * small fixes and remove excess * common GRU was additionaly updated. tuned pytorch GRU was strongly accelerated * GRU cell in ONNX frontend was used from common.py. previous implementation was removed * small fixes in comments * fixes after review. GRU test was implemented for pytorch frontend * tests for RNN layers was unified for pytorch frontend Co-authored-by: Valery Chernov <[email protected]> commit 02b57a6be3044b25ed997774760bcec649768696 Author: Matt Welsh (OctoML) <[email protected]> Date: Tue Aug 24 20:02:45 2021 -0700 Update CONTRIBUTORS.md (#8837) TVM is no longer in the Apache Incubator; moving mentors to the end of the doc. 
commit 7ae8f898a768956d6c27ac9054ab52e8ad9ead73 Author: Cody Yu <[email protected]> Date: Tue Aug 24 16:24:20 2021 -0700 [Community] @Lunderberg -> Reviewer (#8834) commit 2859c20391a2eb377f9ae28d4980193b5d1685a1 Author: Xiyou Zhou <[email protected]> Date: Tue Aug 24 15:52:03 2021 -0700 [M3a][Meta Schedule] Add Sampling Primitive SampleCategorical. (#8817) Co-authored-by: Junru Shao <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Bohan Hou <[email protected]> commit 44a1d1f81d1d651e914750c58299079f205cbd5c Author: Christopher Sidebottom <[email protected]> Date: Tue Aug 24 23:20:03 2021 +0100 Better reflect allocator names in CRT tests (#8828) When the AOT executor was introduced, the Stack Allocator was associated with it by test name whereas the Page Allocator was left as just memory_test.cc. This cleans that up a bit to clarify which tests which allocator. commit 5ada91bbf42f7a7351198994e654c6423d8eddb6 Author: Matt Welsh (OctoML) <[email protected]> Date: Tue Aug 24 14:59:02 2021 -0700 Add link to docs and tutorials in the README. (#8832) Most project pages on GitHub have a README.md file with a clear link to installation or tutorial material for new users. While there is a link to Documentation, it's not that obvious, and adding a more explicit "getting started" link may be helpful for new TVM users trying to navigate the project. commit fe0bd12f776842b157d6ba994ae85ff3d15ee884 Author: Krzysztof Parzyszek <[email protected]> Date: Tue Aug 24 16:30:16 2021 -0500 [Hexagon] Remove uses of LLVM from simulator runtime (#8821) * [Hexagon] Remove uses of LLVM from simulator runtime The TVM runtime is not linked with LLVM libraries, so using LLVM in it carries a risk of referencing undefined symbols. 
This may work for objects defined in header files, but it then relies on LLVM keeping them there. Replace uses of LLVM utilities in the Hexagon simulator runtime, with simple alternatives. * clang-format * Use dmlc::optional instead of implementing one from scratch Make detail::Optional be derived from dmlc::optional, and add some bits to make it behave more like the C++17's std::optional. The goal is to replace detail::Optional with std::optional, once the project switches to C++17. commit b77a7d4fc672a4af9858e17c92c6c731d8657249 Author: Christopher Sidebottom <[email protected]> Date: Tue Aug 24 21:45:52 2021 +0100 Apply CPPLint to C++ Unit Tests (#8827) This change enables `cpplint` for the tests in `tests/cpp` and corrects any current linting errors. I had to use `NOLINT` in some of the PackedFunc tests due to a bug (see: https://github.com/cpplint/cpplint/issues/131) in CPPLint where `int(int)` is picked up as a cast rather than a nameless argument. commit 2c6dccb6694ae8d1bd8d8ed752cf1bc8f1fccf1f Author: Christopher Sidebottom <[email protected]> Date: Tue Aug 24 18:20:02 2021 +0100 Correct function signatures for CreateXPass functions in docs (#8829) commit 596ff5cff576d8ded1251bb936eb7198551cc481 Author: Yuanjing Shi <[email protected]> Date: Tue Aug 24 09:40:30 2021 -0700 [AutoTVM] Use PopenPool in XGBoostCostModel (#8820) * replacd multiprocessing.Pool with PopenPoolExecutor * add initializer func * static init func * address comments * linting * fix tests * address comments commit 64a7eb674623c658342ec95283323a0d5ae4ca73 Author: Chenfan <[email protected]> Date: Tue Aug 24 21:33:36 2021 +0800 [FLAKY] A small bug fix on the CmakeLists (#8826) commit eabae30059cf914bbe7fa2ad46d381ff662267b9 Author: Jared Roesch <[email protected]> Date: Mon Aug 23 21:11:24 2021 -0700 [Rust] Fix memory leak #2 (#8725) * Add C++ API for computing type key from type index * Try and isolate leak * Rewrite the bindings to fix the ArgValue lifetime issue There are still quite a 
few issues left to resolve in this patch, but I believe the runtime changes stablize memory consumption as long as the parameters are only set once. ByteArray also has some totally broken unsafe code which I am unsure of how it was introduced. * Finish handling tvm-rt issues due to ArgValue lifetime This patch further refactors the bindings to better handle the lifetime issues introduced by detecting the argument memory leak. * WIP memory leak * There is issue using TVMCb function which is breaking refcount * Fix fallout from the lifetime refactor * Another tweak * Follow up work from the memory leak, attempt to clean up ByteArray * Add some todos for future work * Fix doc string * Clean up the changes * Format commit e883dcba2e2529d4dcf23169a7c72494b0b5b60b Author: Christopher Sidebottom <[email protected]> Date: Tue Aug 24 04:59:45 2021 +0100 Run AOT tests against reference system (#8744) * Run AOT tests against reference system This introduces an alternative way of running AOT tests using the reference system added in https://github.com/apache/tvm/pull/8514. This gives us additional assurance that the AOT output runs successfully on embedded platforms in our core test suite. I've also changed calculate_workspace_sizes to debug_workspace_sizes and default to False in most cases as it only needs to be True for a few cases to check theoutput with the debug flag - this was discovered trying to allocate 16MB in an embedded test :scream_cat: Co-authored-by: Grant Watson <[email protected]> * Skip AOT reference system tests in i386 container * Add comment clarifying the reference system runner Co-authored-by: Grant Watson <[email protected]> commit 4524567775a1e963e656bc740d36166460d45b66 Author: Gustavo Romero <[email protected]> Date: Tue Aug 24 00:56:50 2021 -0300 Remove unnecessary memset in TVMMutableFuncRegistry initialization (#8818) Remove unnecessary memset() call in TVMMutableFuncRegistry_Create() when initializing a TVMMutableFuncRegistry struct. 
All struct members (registry.names, registry.funcs, and max_functions) are already initialized properly before returning, hence some CPU cycles might be saved (usually 12 bytes in a 32-bit platform and 24 bytes in a 64-bit platform must be written with 0 by memset()). Signed-off-by: Gustavo Romero <[email protected]> commit dfe21c5f208496560a405123ef27ca7e747bb7c4 Author: Gustavo Romero <[email protected]> Date: Tue Aug 24 00:56:35 2021 -0300 Remove unused allocated memory in crt initialization (#8819) Currently TVMInitializeRuntime() allocates 250 bytes dynamically to back buffer 'func_registry_memory' which is never used. That is not much in general but besides being twice the current necessary amount for the runtime (allocated to back 'registry_backing_memory' buffer) that amount can be important to be saved on memory-constrained devices (microTVM). This commit removes the 'func_registry_memory' buffer which is allocated dynamically in TVMInitializeRuntime() since it occupies 250 bytes and is never used. Signed-off-by: Gustavo Romero <[email protected]> commit b1396a0f159298a286091b018090369cd8dcff53 Author: Tristan Konolige <[email protected]> Date: Mon Aug 23 20:56:12 2021 -0700 [FIX] Remove leftover instances of USE_GRAPH_EXECUTOR_DEBUG (#8796) * [FIX] Remove leftover instances of USE_GRAPH_EXECUTOR_DEBUG single flag, USE_PROFILER. This PR cleans up the last few remaining uses of USE_GRAPH_EXECUTOR_DEBUG. 
* formatting * Update CMakeLists.txt Co-authored-by: Cody Yu <[email protected]> Co-authored-by: Cody Yu <[email protected]> commit d1f19c470c16a1ca87c67fd93f30dd59e16bbec1 Author: Lily Orth-Smith <[email protected]> Date: Mon Aug 23 20:12:07 2021 -0700 Add LowerTEPass, and convert calls to LowerTE to application of LowerTEPass (#8802) * Initial commit Initial stab at IRModule -> LoweredModule conversion func, notes Add external_mods and main_func_info to conversion funcs MTest lowered module to ir module fix problem with conversion funcs + print stmts Add LowerTE pass Add pLowerTEPass AAdd LowerTEPass to graph_executor_codegen.cc Use LowerTEPass instead of LowerTe in graph_executor_codegen.cc Code cleanup Add docs, more cleanup Formatting * Fix bad rebase * Address 1st round of comments * Use tir kTarget instead of relay one * Change target string to Target obj * removing target string causing issues * Fix typos * Revert target str -> target obj changes * Don't use Update : IRModule because it is broken * Fix check * flaky test? * lint commit 356879d4c35eba5d561665003e38a553cd410ff3 Author: Christopher Sidebottom <[email protected]> Date: Tue Aug 24 01:08:18 2021 +0100 Use CTest for C++ tests (#8809) By using the `gtest_discover_tests` CMake macro the CPP and CRT tests can be configured to build binaries with a single test runner each. Once CTest has information about tests it can be used in IDE extensions such as [CMake Test Explorer](https://marketplace.visualstudio.com/items?itemName=fredericbonnet.cmake-test-adapter). `ctest` can also run tests in parallel using the `-j` flag, which could be interesting in future. commit 607bb9492a3fc05451ba58d0bc538f0c3ef7d7e1 Author: Lunderberg <[email protected]> Date: Mon Aug 23 17:11:30 2021 -0500 [Vulkan] Remote target.h #include (#8813) Was added in #8127, should have been removed in #8171 along with the rest of the references outside of libtvm_runtime.so. 
This didn't impact the Vulkan+g++ builds, because no symbols were accessed outside of the runtime library. However, it broke the Vulkan+Windows builds, which expected symbols due to the `__declspec(dllexport)` definition of `TVM_DLL` on MSVC (see #8805). This wasn't caught by the CI build on Windows, because it doesn't perform the Vulkan build. commit aafc2d5efd2ab2ea914da6c291e82a18f5ad6e14 Author: Christopher Sidebottom <[email protected]> Date: Mon Aug 23 20:53:42 2021 +0100 Remove duplicated PackedFunc C++ test (#8812) I came across this file whilst looking at the C++ tests and realised it's a duplicate of the PackedFunc tests which doesn't get invoked. ``` $ diff -u tests/cpp/contrib/bnns.cc tests/cpp/packed_func_test.cc --- tests/cpp/contrib/bnns.cc 2021-07-30 12:59:33.830443830 +0000 +++ tests/cpp/packed_func_test.cc 2021-08-23 12:47:43.193708421 +0000 @@ -17,6 +17,13 @@ * under the License. */ +#include <dmlc/logging.h> +#include <gtest/gtest.h> +#include <tvm/runtime/packed_func.h> +#include <tvm/runtime/registry.h> +#include <tvm/tir/expr.h> +#include <tvm/tir/transform.h> + TEST(PackedFunc, Basic) { using namespace tvm; using namespace tvm::tir; ``` commit 10fca9c620f93d912a2c1ac28c1859896c20d436 Author: Gustavo Romero <[email protected]> Date: Mon Aug 23 06:42:52 2021 -0300 [microTVM] Fix platform name for qemu_x86 in Zephyr AOT tests (#8762) Currently two Zephyr AOT tests (test_tflite and test_qemu_make_fail) are not running when qemu_x86 platform is selected because the platform name is wrongly listed as 'host' in the match list for not skipping these tests. This commit fixes it. Signed-off-by: Gustavo Romero <[email protected]> commit 0a1eae87fd551a5fb67c739eabb45226fa870af1 Author: Hongyi Jin <[email protected]> Date: Mon Aug 23 13:30:08 2021 +0800 [TensorIR][M2a] Reorder (#8767) This PR is part of the TensorIR upstreaming effort (#7527), which adds a schedule primitive: reorder. 
Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> Co-authored-by: Junru Shao <[email protected]> commit d6d6367b0105bc5f155a963ea11e59c7913a2cf0 Author: Gustavo Romero <[email protected]> Date: Sun Aug 22 02:13:08 2021 -0300 Fix typos (#8787) Fix a couple of typos in comments about the IR/AST node reflection code and a typo in a comment about the main member of the TVMModule struct. Signed-off-by: Gustavo Romero <[email protected]> commit fc9f5823e826a32f4ce5f620c39a2d9b9ed01d80 Author: wrongtest <[email protected]> Date: Sun Aug 22 06:10:18 2021 +0800 [TIR] Support fold constants in specialize process (#8803) * support fold constants in specialize * replace Substitute() with VisitExpr() in specializer. commit 4b9881ec50008bc14fc1ae7805413544cf962011 Author: Yuan-Chuan-YUE <[email protected]> Date: Sun Aug 22 05:42:21 2021 +0800 [CODEGEN][OpenCL]: fix tir.erf codegen to opencl directly (#8756) * register tir.erf to lower opencl directly * add opencl codegen unit test * change erf opencl codegen unit test for checking there is erf in the source not erff commit c6f62aafc91e2600ed7772597fd4238c924c2a1b Author: Chris Sullivan <[email protected]> Date: Fri Aug 20 16:08:07 2021 -0700 [Texture support][Part 1] TIR lowering and OpenCL support (#7686) * Add support for kTexture storage rank. * Add scaffolding for texture_flatten pass. * Add scaffolding for texture allocation. * Implement 2d texture flattening to builtin tir.text2d_alloca. * Lower BufferStore/Load to builtin texture store/load. * Add vectorizable attribute to texture load and store. * Support auto-vectorization on the innermost (RGBA) axis. * Add read/write_imagef opencl codegen for builtin texture load/store. * Add TextureType support. * Add InferTextureAccess pass to deduce __read_only and __write_only access qualifiers for texture vars. 
Also refactor use of restrict keyword to be var dependent. * Implement texture allocation as external function in TIR lowering. * Remove commented lines. * Add nd->2d texture flattening. * Bug fixes in opencl codegen (row<>col, access quals.) * Improve texture codegen by explicitly allocating local vector for the texture load. Also support indexing individual elements of the RGBA vector. * Remove automatic vectorization code as it is no longer needed. * Improve SSA local use when storing texture read to scalar buffer. * Define texture flattening convention such that the outer Nd-1 axes are stored as rows, and the last axis is stored as columns. * Add tir lowering and opencl codegen support for float16 textures. * Disable SSA when texture load is immediately casted. * Allow RGBA extent to be of length 1. * Add pass to forward externally allocated textures in place of textures realized from cache_read. Fix to better follow indexing spec. * Add buffer_common.h to house buffer offset simplification routines. * More refactor and clean up in texture lowering. * Add IsTextureType to tir and allow buffer var type annotation to be TextureType in addition to PointerType. * Bug fix in texture access qualifier inference pass * Step toward handling external texture buffer forwarding when external buffer is not stored directly to cache_read realized buffer. For example when it is conditionally stored via an IfThenElse node when padding is used. * [Part 2/3] Support texture:weight lowering convention for externally provided texture buffers. Need to propagate this to allocated textures when cache_read(texture) is used for weights. * Bug fix in texture access qualifier inference pass * Tighten constraint on external buffer forwarding -- cache_read(texture) cancellation -- to avoid incorrect programs. Currently only forward through if_then_else node and direct external loads. 
For if_then_else, still need proper analysis of structural equality between buffers and access patterns to determine if an external buffer can replace the texture buffer realized via cache_read. * Use texture lowering convention from texture runtime util. * Use updated texture lowering utilities * Use inherited visitor overloads in texture flattener. * Add check in codegen for float/half until read/write_image codegen supports other types. * Rename tir texture builtins * Remove codegen and tir runtime dependence on TVMBackendAlloc/FreeTexture. * Dispatch texture allocas via target specialized tir.tvm_call_packed * Remove kTexture scope and use kGlobal with texture tag. * Remove TextureType. * Remove TextureType from OpenCL codegen. * Remove TextureType from TIR lowering. * Remove dependency on MergeMulMod. * Revert "Add buffer_common.h to house buffer offset simplification routines." This reverts commit 027628259229aaee051dbf1dfbed4e63ef820544. * Prune include list * Add more documentation to texture flattening. * Add TextureFlatten transform to refactored tvm lower API. * Apply clang formatting. * Blacken python APIs. * Apply cpplint changes. * Attempt to extract storage scope from pointer scope. * Remove ExternalBufferForwarding (cache_read cancellation) for now. * Apply MyPy. * Clang format * Only visit RealizeBuffer body for texture storage. * Fix bad merge. * Utilize OpenCL preprocessor to switch between sampler-less and codegen provided sampler for texture reads depending on whether the opencl runtime is 2.0 compliant. * Add texture codegen test example. * Refactor tests to use pytest parameterization. Blacken tests. * Respond to CRs. 
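The flattening convention stated in the texture commit above (outer N-1 axes stored as rows, last axis stored as columns) can be illustrated with a small index-mapping sketch. This is not TVM's implementation: the function names `flatten_to_2d` and `texture_extent` are hypothetical, and the row-major ordering of the outer axes is an assumption made for illustration.

```python
from math import prod
from typing import Sequence, Tuple


def flatten_to_2d(shape: Sequence[int], index: Sequence[int]) -> Tuple[int, int]:
    """Map an N-d index to a (row, col) pair.

    Hypothetical helper: the outer N-1 axes are linearized (row-major,
    an assumption) into the row, and the last axis becomes the column.
    """
    if len(shape) != len(index) or len(shape) < 2:
        raise ValueError("shape and index must have the same rank (>= 2)")
    for extent, i in zip(shape, index):
        if not 0 <= i < extent:
            raise IndexError("index out of bounds for shape")

    row = 0
    for extent, i in zip(shape[:-1], index[:-1]):
        row = row * extent + i  # row-major linearization of the outer axes
    return row, index[-1]


def texture_extent(shape: Sequence[int]) -> Tuple[int, int]:
    """Return the (rows, cols) extent of the flattened 2-d layout."""
    return prod(shape[:-1]), shape[-1]


if __name__ == "__main__":
    print(texture_extent((2, 3, 4)))
    print(flatten_to_2d((2, 3, 4), (1, 2, 3)))
```

Under these assumptions a buffer of shape `(2, 3, 4)` occupies 6 rows of 4 columns, and each element keeps its last-axis index as its column.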
commit 18a2ee16036a4d70d7cfc572fe9807e6e2a70eda Author: Euntaik <[email protected]> Date: Sat Aug 21 07:29:43 2021 +0900 [Frontend][TFLite] Implement fake quant (#8780) * [Frontend][TFLite] Implement fake quant * remove unused variable * fix linting errors * add more tests * use pytest parametrize instead of a separate function commit d722c109ee8ede4ef88f53a414385f72756928be Author: Yuanjing Shi <[email protected]> Date: Fri Aug 20 07:39:10 2021 -0700 [CONTRIB] Allow customized initializer in PopenPool (#8789) commit e691c7f83892d7242d0992c78cec1e2f8953a9e3 Author: Siyuan Feng <[email protected]> Date: Fri Aug 20 15:49:31 2021 +0800 [TIR] Fix buffer scope in structural equal (#8768) * fix buffer scope in structural equal * make global equal to empty commit 7f237dd0de9880881a1a6d696e093224821a116c Author: Andrey Malyshev <[email protected]> Date: Fri Aug 20 08:59:08 2021 +0300 Extend tune_relay_x86 tutorial to measure default and kernel level tune (#8794) commit 36ea17a0d15ce44fdd5d758de2717d75eac4a97e Author: Lunderberg <[email protected]> Date: Thu Aug 19 20:33:43 2021 -0500 [Docker][Vulkan] Allow Vulkan GPU access in docker container. (#8784) - The environment variable NVIDIA_DRIVER_CAPABILITIES must include "graphics" in order to expose Vulkan drivers to the container. This is added both to Dockerfile.ci_gpu for future image builds, and to docker/bash.sh for compatibility with current images. - The configuration files needed by the vulkan launcher and glvnd must be exposed to the container. These are only included in `docker/bash.sh`, as they may vary by host and so cannot be baked into the image. commit 0d3e2335654fcdee731ceac57852d00482472903 Author: Lunderberg <[email protected]> Date: Thu Aug 19 20:32:38 2021 -0500 [UnitTest][Flaky] Increased tolerance on onnx test_forward::test_aten (#8798) Default tolerance had about 2-3% failure rate (8/300 iterations), and caused failures on unrelated PRs (e.g. 
https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-8784/1/pipeline#step-485-log-1156). New limit of `atol=5e-7` chosen to be above the maximum delta of 3.5e-7 observed in 300 iterations. commit 9697bfd6847a78145ffe66640a96bc157ef365b5 Author: Leandro Nunes <[email protected]> Date: Fri Aug 20 02:32:25 2021 +0100 Add synr==0.3.0 dependency for Docker images and Python dependency. (#8801) - PR #8776 removed `synr` as a dependency to be installed in the Docker images, so the images needed manual intervention before we could run tests. - This PR reverts synr (with current constraint as observed in tests/scripts/task_ci_setup.sh) to be part of the Docker image. commit 1936609200913985f32eae42feb271b65132bd42 Author: Anastasia Stulova <[email protected]> Date: Thu Aug 19 17:42:09 2021 +0100 [Android][RPC] Fix Vulkan runtime support. (#8791) Update Android RPC app to reflect the new Vulkan source code tree structure. commit e1a0ea1eb5fe733f73aa098cb67a881ed1a6786d Author: Gustavo Romero <[email protected]> Date: Thu Aug 19 10:09:01 2021 -0300 [microTVM][RVM] Fix base-box-tool command in README.md (#8613) This commit fixes the platform argument order for base-box-tool.py 'test' command in the documentation about the RVM. Currently the example in documentation places <platform> before option [--test-device-serial=<serial>], whilst the correct order is after all the options, so trying to use the 'test' command arguments in the order as suggested by the documentation will not work. This commit also fixes a typo (inovke -> invoke). Finally it tweaks the text format a bit: lines with maximum 80 columns, a better diagram format for the dir structure, and a better format for the bash commands. A link is added too for easy access to the "microTVM Reference VM tutorial" found in tutorials/micro directory. A couple of command examples were also added to the documentation. 
Signed-off-by: Gustavo Romero <[email protected]> commit 6b7597b56374f2f5e7fd45afdf60b43353b06c03 Author: Leandro Nunes <[email protected]> Date: Thu Aug 19 07:48:11 2021 +0100 [CI] Rev ci-cpu to v0.76 (#8786) - This includes changes up to commit 1a95f9bd0 commit 41879b2552364f094492470a77a3ec0866b30eae Author: Chenfan <[email protected]> Date: Thu Aug 19 11:15:34 2021 +0800 [FIX] Bug fix for batch_matmul parameters mismatch (#8785) commit 87674f9ef76dbc89bac0af52a4e33155f5d4a8f8 Author: Gavin Uberti <[email protected]> Date: Thu Aug 19 00:03:26 2021 +0100 [microTVM] Project API Arduino support (#8708) * ProjectAPI Arduino support * Compile and run integration tests * Add support for other Arduino boards * Unit tests for project generation * AOT support * Arduino RPC server * Incorporate ProjectAPI changes Add Arduino tests to CI * Copyright notices * Fix Andrew's PR comments * Additional PR comments PR comments and Python 3.6 support Linting fix Re-add test onnx file Test Arduino cli bug workaround Support new hardware targets Temporary fix for tests Formatting issue Spelling fix Add test case for exact FQBN matching * Add unit tests from apps directory to task_python_microtvm.sh commit 70727435a0827cd6cefe0b52f4407e08b799e39a Author: Christopher Sidebottom <[email protected]> Date: Wed Aug 18 22:57:37 2021 +0100 Remove old AOT Executor code (#8758) * Remove old AOT Executor code This removes the old AOT execution functions that relied on the model descriptor which was removed in https://github.com/apache/tvm/pull/8280. 
* Remove rogue tvm_model_t from demo app * Remove aot_executor from demo CRT libs commit 3f881ab15fd207490af3f04deb0813429e31e1f4 Author: kueitang <[email protected]> Date: Thu Aug 19 04:49:16 2021 +0800 Expose FTVMInferCorrectLayout Python interface (#8755) Co-authored-by: kueitang <[email protected]> commit 5b2e5044dc239f7e2ff0481efeb6a8441fd20c52 Author: Jaehun Ryu <[email protected]> Date: Thu Aug 19 00:03:51 2021 +0900 Restore License (#8779) commit ab019676a5964fcfaa065877c4c98719ecaabb8a Author: Andrew Reusch <[email protected]> Date: Wed Aug 18 07:38:14 2021 -0700 Rev ci-qemu to v0.08 (#8776) * Remove synr from pip-installed package list * synr is installed by task_ci_setup * rev ci-qemu to 0.08 commit e7748aac40bd4c263882323393ea8896837614a9 Author: Haichen Shen <[email protected]> Date: Tue Aug 17 23:16:54 2021 -0700 [Relay] Extract dataflow matcher data structure into header (#8774) * extract dataflow matcher data structure into a header file * lint * lint commit 843d246a8ad8dbf778b39a716c053de4768fa62b Author: anwang2009 <[email protected]> Date: Tue Aug 17 23:14:13 2021 -0700 Add onnx opset v13 support for softmax, logsoftmax (#8625) * add more support for softmax ops * noop * noop commit da4869e331b6d9a94a3aef103a14b53a80704968 Author: Ruihang Lai <[email protected]> Date: Wed Aug 18 13:48:35 2021 +0800 [TensorIR][M2a] Parallel, Vectorize, Bind & Unroll (#8716) Co-authored-by: Junru Shao <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> commit 3181f273f00c47fd07e576cff96e8df6d94d6f79 Author: Andrew Reusch <[email protected]> Date: Tue Aug 17 22:46:45 2021 -0700 skip aot checks when USE_MICRO=OFF (#8772) commit 78dbfcf7992f68748c61925c08a2b94e2e7dd463 Author: anwang2009 <[email protected]> Date: Tue Aug 17 22:46:17 2021 -0700 Add support for QLinearMul ONNX op (#8773) * add qlinearmatmul * noop * 
mul not matmul * refactor some common qlinear op test code commit c8a6089073ba997375b318c80674cc906df19e53 Author: Mark Shields <[email protected]> Date: Tue Aug 17 16:41:42 2021 -0700 [Relay] Refactor Interpreter to treat lowering as IRModule->IRModule rewrite. (#8597) * This continues the work outlined in the RFC https://discuss.tvm.apache.org/t/rfc-relay-tecompiler-rewrite-existing-compile-engine-to-match-updated-compiler-flow/9233 This gets about halfway there for the Interpreter: * Remove direct access to TECompiler from interpreter, and instead call tec::LowerTEExpr when 'preparing' a module and expression for evaluation. * Make clear there's no phase distinction between create_interpreter and evaluate on the Python side -- both must be prepared together as a single IRModule. * But in return make sure the result of evaluate on the Python side is a packed func ready to directly apply 'simple' arguments to an already interpreted closure. * The interpreter builds and caches primitive TIR functions (and their corresponding dynamic shape functions) as packed funcs as they are encountered. * Cleanup uses of interpreter for constant folding on the C++ side. Future work: * Fold LoweredModule into IRModule so tec::LowerTEExpr is just another pass. * Get rid of the implicit caching of lowered functions in TECompiler. * Make calling convention from Relay to TIR explicit, and remove all the function attribute hackery currently needed so the interpreter can correctly invoke lowered functions as it encounters them. * Make TECompiler private. Though could do this now it will make migrating the VM and AOT uses of CompilerEngine harder. Force a gc between sphinx-gallery items to reclaim GPU memory. (#8722) GPU memory is only released once the PackedFunc for evaling the model is gced by Python. In CI we're noticing intermittent 'CUDA: Out of memory' failures while processing the tutorials, and tracing showed there was no gc happening between items. 
Not confident this will solve the problem but worth a try. * Get rid of log spam. commit 4216cd7f9d06a7feb712e78892061721410cc224 Author: Leandro Nunes <[email protected]> Date: Tue Aug 17 23:48:26 2021 +0100 Add params.* to Jenkins file parameters (#8771) * Prefix all parameters with params.* so that it checks whether parameters exist before using them * This is a follow-up fix on #8721 so that existing PRs work without being re-triggered manually twice commit 4cee61a563980239fb4ef7a8548e09424b0cd3e1 Author: Jason <[email protected]> Date: Wed Aug 18 05:25:39 2021 +0800 Add PaddlePaddle dependency in docker file (#8742) commit cddb0f6d566e738a6b45be662a498fd8ac69dab7 Author: Andrew Reusch <[email protected]> Date: Tue Aug 17 13:36:58 2021 -0700 Update QemuTransport#write() to match new write API contract. (#8761) * suspect this should fix #8278 commit e7534209429be47f18502e514cc7ccfdf20fd437 Author: Gavin Uberti <[email protected]> Date: Tue Aug 17 19:22:57 2021 +0100 Fix ci-qemu Arduino install dir (#8766) commit 26f7c0d7c1959bc1fe37915abe26db5c080dbb57 Author: Jaehun Ryu <[email protected]> Date: Wed Aug 18 02:14:05 2021 +0900 [Relay testing] densenet implementation fix (#8704) * Fixed testing densenet bug * Fixed code format using black commit 2793113880bf5b1f38dc6ff527c3c2f24c78c519 Author: Qiang Zhang <[email protected]> Date: Wed Aug 18 01:13:50 2021 +0800 [TIR] Change Integer Implicit Conversion Rule to C Standard Way (#8733) commit dbf9ce52d4e4fbcd0c1d44f6c8d05c7774dffe63 Author: Lunderberg <[email protected]> Date: Tue Aug 17 10:30:27 2021 -0500 [UnitTests] Require cached fixtures to be copy-able, with opt-in. (#8451) * [UnitTests] Require cached fixtures to be copy-able, with opt-in. Previously, any class that doesn't raise a TypeError in copy.deepcopy could be used as a return value in a @tvm.testing.fixture. This has the possibility of incorrectly copying classes that inherit the default object.__reduce__ implementation. 
Therefore, only classes that explicitly implement copy functionality (e.g. __deepcopy__ or __getstate__/__setstate__), or that are explicitly listed in tvm.testing._fixture_cache are allowed to be cached. * [UnitTests] Added TestCachedFixtureIsCopy Verifies that tvm.testing.fixture caching returns copy of object, not the original object. * [UnitTests] Correct parametrization of cudnn target. Previous checks for enabled runtimes were based only on the target kind. CuDNN is the same target kind as "cuda", and therefore needs special handling. * Change test on uncacheable to check for explicit TypeError commit f2c5272bdfb8d09c6dadde475d2a5420b28500d6 Author: Leandro Nunes <[email protected]> Date: Tue Aug 17 13:42:18 2021 +0100 Enable custom images to be set in TVM Jenkinsfile (#8721) * This work is needed to enable automatic testing of our newly built Docker images as part of CI * The default value is set by variables in the same Jenkinsfile and are used when no custom values are provided commit c28c86ae6b187c1afd12c86be3b65913345a49f5 Author: Thierry Moreau <[email protected]> Date: Tue Aug 17 00:24:30 2021 -0700 [Community] @Mousius -> Reviewer (#8764) * adding Mousius to reviewers, name update for Siva Reddy * making Siva's name consistent commit f4ba8fc6276f984b311e20f8eb39888f424b373a Author: Thierry Moreau <[email protected]> Date: Tue Aug 17 00:19:47 2021 -0700 adding gromero as a reviewer (#8765) commit e02ea7430589fa345ab4472f02511ae8d6c08dea Author: Lily Orth-Smith <[email protected]> Date: Mon Aug 16 22:44:59 2021 -0700 Add DictAttrs to IRModule and refactor DictAttrs utility functions (#8750) * Add DictAttrs to IRModuleNode Move GetAttrs to be a member of DictAttrs Generalize WithAttrs to work with IRModule and move to attrs.h Change func->GetAttr to func->attrs.GetAttr * lint * Fix documentation * fix typo * Another typo! 
* Revert GetAttrs to ->attrs.GetAttrs change * Didn't mean to revert these * Revert a few more things * Add GetAttrs to IRModuleNode commit cfa498c0376622afe4e0f7344f0104dc97d7e876 Author: Mark Shields <[email protected]> Date: Mon Aug 16 22:44:30 2021 -0700 Make from_tensorflow.py more GPU memory friendly. (#8763) * Make from_tensorflow.py more GPU memory friendly. Sphinx-gallery runs everything in a single process. There doesn't appear to be any easy way to force Tensorflow to return memory other than terminating the process. This at least gives us a little more wiggle room. * Also deploy_sparse.py. Should probably also be done to tensorflow.rst. commit 2008d6279756a1dd74cf41eec090e98636215e17 Author: Euntaik <[email protected]> Date: Tue Aug 17 10:52:59 2021 +0900 add support for half_pixel_centers in resize (#8689) commit d02e50cbaf05de6955ff7bd7471dadc2a997fe15 Author: Yuanjing Shi <[email protected]> Date: Mon Aug 16 17:57:00 2021 -0700 [AutoScheduler][FIX] Fix exception handling in measure.py (#8754) * fix exception handling * fix linting * stringify the exception from MapResult * use repr instead of str commit e334942db002019979438971440d33ece16585a3 Author: Natan Kaminsky <[email protected]> Date: Mon Aug 16 15:35:03 2021 +0300 Fix builtin_fp16.h path according to: https://discuss.tvm.apache.org/… (#8705) commit 3e0c461f26b5174759dbb932986006f73a94a816 Author: Christopher Sidebottom <[email protected]> Date: Mon Aug 16 09:15:26 2021 +0100 Allow Linker script files to be committed (#8745) This is a source file type needed for https://github.com/apache/tvm/pull/8744 Co-authored-by: Grant Watson <[email protected]> Co-authored-by: Grant Watson <[email protected]> commit 1d087920a0f1a9ea065c2c7992a15235170f5c75 Author: Gustavo Romero <[email protected]> Date: Mon Aug 16 05:09:01 2021 -0300 [microTVM] Fix warnings on Zephyr tests (#8740) Fix the following warning message on Zephyr tests: DeprecationWarning: Please use input parameter mod (tvm.IRModule) instead 
of deprecated parameter mod (tvm.relay.function.Function) Signed-off-by: Gustavo Romero <[email protected]> commit cddd3485dd58c1de98c1279a465897a36782befe Author: Tianqi Zhang (张天启) <[email protected]> Date: Mon Aug 16 14:04:08 2021 +0800 [Fix][TOPI] remove wrong fix in x86's dense_nopack operator (#8687) commit 2e247825be4119882e6c5c691ccba69a2ad33836 Author: CircleSpin <[email protected]> Date: Mon Aug 16 02:02:01 2021 -0400 [Onnx Operators] Celu (#8741) * complete celu op * forgot to add test * change order in convert_map, remove comment, delete import hiccup Co-authored-by: CircleSpin <[email protected]> commit c4c31de90850d67d3cf1f16b28e92ce431799c8b Author: Andrew Reusch <[email protected]> Date: Sun Aug 15 21:48:59 2021 -0700 Install rust in ci-lint so cargo fmt can move to lint stage. (#8727) commit 1a95f9bd0e84c4f3ebcbd668f26631fd74e8f28f Author: xiaolong18 <[email protected]> Date: Mon Aug 16 04:43:52 2021 +0800 [TF] Support TensorFlow < 1.13 for test_sparse_add (#8647) commit 49224cb8b81b7e0b857935191019981b22787be3 Author: Andrey Malyshev <[email protected]> Date: Sun Aug 15 23:33:04 2021 +0300 Fix use of fallback AutoTVM knobs in default scheduling (#8707) * Fix use of fallback AutoTVM knobs Previously knob values depended on order of explicit cfg update and cfg.define_split calls in fallback mode * Add test for define_split with fallback defined values commit 994a15164cdac97ca610a3b6af4c157e0e465569 Author: AndrewZhaoLuo <[email protected]> Date: Sun Aug 15 09:40:04 2021 -0700 update docs (#8736) Co-authored-by: Andrew Zhao Luo <[email protected]> commit e12ddcafd74cc10cef343fc39a0c6a892a431650 Author: Alperen Bag <[email protected]> Date: Sun Aug 15 07:01:08 2021 +0300 [FRONTEND][PYTORCH] Support for nn.SiLU added (#8753) commit 3ebd353a7f526cdf21293055a00eeeabe6efae1f Author: Hua Jiang <[email protected]> Date: Sat Aug 14 17:48:50 2021 -0700 [VTA] Make vta graph_pack compatible with latest TVM, and bring back object detection tutorials. 
(#8731) * [VTA] Make vta graph_pack compatible with latest TVM, and bring back object detection tutorials. * remove deploy_detection.py. * move out deploy_detection.py from legacy folder. * fix build error. commit 901dee54e42d6393cde6eefcc25964db8e24e41d Author: Lunderberg <[email protected]> Date: Sat Aug 14 19:47:08 2021 -0500 [Vulkan] Check at codegen if the shader is within shared memory limits. (#8746) Previously, shaders that do not respect device limits for shared memory could result in segfaults that occur during the call to `vkCreateComputePipelines`. commit 170add2f2fbc3507d1cbfc77ff95312dfe3a1ca9 Author: Robert Kimball <[email protected]> Date: Sat Aug 14 12:00:46 2021 -0700 Add parameter to allow caller to supply a Runner (#8747) * Add parameter to allow caller to supply a Runner * Add unit test for passing in runner to graph tuner commit f5661f4e78cae5f62bc172559040576cd2fd20f2 Author: Lunderberg <[email protected]> Date: Fri Aug 13 13:27:14 2021 -0500 [Docs] Moved the generated tutorials folders into a _staging folder. (#8735) * [Docs] Moved the generated tutorials folders into a _staging folder. Previously, reorganization or renaming of tutorials could cause documentation tests to fail in CI. The CI checks out the version to be tested, which may still have generated documents in `docs/tutorials` and `docs/vta/tutorials`. If a PR moves these to different folders, then they show up as duplicate `*.rst` files, resulting in sphinx warnings. This commit makes a `docs/_staging` folder in which sphinx is run. All tutorials are generated within this folder, and the entire folder can be deleted with `make clean`. As a result, it is safe to reorganize the tutorial without impacting CI. * Updates based on reviews. 
* Changed graph_runtime references in deploy_classification.py to graph_executor * Removed unnecessary graph_runtime import from tune_alu_vta.py commit a06863ac9406c027b29f346fb6177268f612912d Author: Wuwei Lin <[email protected]> Date: Fri Aug 13 02:54:34 2021 -0400 [TensorIR][M2a] Storage Align (#8693) This PR is part of the TensorIR upstreaming effort (#7527), which adds one schedule primitive: storage_align. Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Junru Shao <[email protected]> commit ccc09fa7cda5a3975f064d00725aaa619d0c118c Author: Tristan Konolige <[email protected]> Date: Thu Aug 12 21:36:16 2021 -0700 [TVMC] Switch profile flag to use new profiler (#8710) commit 395b308acf39bba20d4f5aceedaa9d31406feb2c Author: Jiawei Liu <[email protected]> Date: Fri Aug 13 12:36:04 2021 +0800 enhance tir signed-unsigned cast (#8706) commit 7cf7adff444b88a9a661219b7cfc6e9fb61dd98f Author: Valery Chernov <[email protected]> Date: Fri Aug 13 04:54:43 2021 +0300 [Torch] chunk and unsafe chunk (#8718) * alternative chunk op was implemented in pytorch frontend. aten::unsafe_chunk was added to op map in pytorch frontend * chunk was replaced by new one in pytorch frontend. it is 2.5 times faster Co-authored-by: Valery Chernov <[email protected]> commit 8843153bf32129fe92acdb509c8d76ac0eef8e60 Author: Lunderberg <[email protected]> Date: Thu Aug 12 17:40:07 2021 -0500 [UnitTest] Updated tolerances to avoid flaky unit test. (#8723) * [UnitTest] Updated tolerances to avoid flaky unit test. The result was correct, but the atol was just small enough to trigger a CI error for a value that was close to zero in an unrelated PR at #8670. 
https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-8670/16/pipeline/#step-236-log-1703 * Also updated 32-bit version of test_conv2d_nchw commit 3e37bb5e677fbf6a20a8aaaea4792aa642a1cead Author: lhutton1 <[email protected]> Date: Thu Aug 12 21:59:40 2021 +0100 [CI] Add Arm Compute Library to Arm CI unit test pipeline (#8734) commit 4dd7f6806f05bbc9c33d68d68493113534a34b12 Author: Yuanjing Shi <[email protected]> Date: Thu Aug 12 13:35:04 2021 -0700 [TIR] Use PopenPool instead of multiprocessing.pool (#8492) Co-authored-by: Wuwei Lin <[email protected]> commit 66ac4705aae9bec92047920c8a9273693cd48c44 Author: masahi <[email protected]> Date: Fri Aug 13 02:15:13 2021 +0900 [Relay] Dense alter layout fixed for packed input (#8669) * clean up typerel * add layout transform when input is 3D * add test * update doc to clarify that only 2D input data is supported * add weight_layout attribute in dense * remove explicit layout transform from dense_alter_op.py * Add DensePackInferCorrectLayout to insert layout transform * relax type rel * revert type rel relax and add check on dim * introduce DensePackAttrs to avoid breaking dense op * try fixing arm compute lib test * Update tests/python/contrib/test_arm_compute_lib/test_dense.py Co-authored-by: lhutton1 <[email protected]> * formatting Co-authored-by: lhutton1 <[email protected]> commit 5e20ef968c99d8294e5d185051e7421879f36abc Author: Mehrdad Hessar <[email protected]> Date: Thu Aug 12 09:58:35 2021 -0700 Remove qemu installation from Zephyr RVM (#8701) commit 76a7fa9d92e99b544a6e34731cdc686a22891380 Author: Christopher Sidebottom <[email protected]> Date: Thu Aug 12 13:35:43 2021 +0100 Convert AOT to TECompiler (#8697) * Convert AOT to TECompiler This removes the dependency on "compile_engine.h" from aot_executor_codegen.cc. 
This required a few changes to how AOT was operating: * AOT run_model is now based on the post lowering main_module * AOTOnDemandAllocator is run twice to ensure SIDs are updated post-lowering * Moved to using tec::UpdateFunctionMetadata Tests are passing, but would appreciate other validation :smile_cat: * Clarify reasoning behind replanning memory later * Use main_func_info rather than bespoke logic in AOT This moves from using the bespoke AOT UpdateMainWorkspaceSize to the LoweredModule main_func_info property to unify with Graph executor codegen. commit e9380e47f0b97c0b98b97f082b075eaa1308038b Author: Christopher Sidebottom <[email protected]> Date: Thu Aug 12 09:51:24 2021 +0100 Refactor AOT Test Utils parameters into object (#8650) * Refactor AOT Test Utils parameters into object `compile_and_run` was getting quite complicated to understand as well as being mostly duplicated by `compile_and_run_multiple_models`. This patch pulls out some common parameters into a data class `AOTTestNetwork` which makes it clearer what each parameter is doing and provides documentation. * Rename Network -> Model and sizebytes -> size_bytes commit 9586ee2c2e7d902ca366eb18c85cd0d6515426fa Author: Mehrdad Hessar <[email protected]> Date: Wed Aug 11 19:43:59 2021 -0700 increase atol for float32 (#8712) commit 2e6356854763e27ce03a4fbc79b6235ae7397317 Author: Lunderberg <[email protected]> Date: Wed Aug 11 21:43:40 2021 -0500 [Docs][UnitTest] Updated target parametrization documentation (#8724) * [Docs][UnitTest] Updated target parametrization documentation The intended audience are developers writing unit tests, or debugging unit tests that have failed. Therefore, moving the recommended style to the top of the section, and the implementation details to the bottom. 
* Documentation updates as recommended by tkonolige commit 722efc5dad83e6f1312f372e20a65254a64c6d5b Author: Lunderberg <[email protected]> Date: Wed Aug 11 18:31:21 2021 -0500 [Docker] Refactor/clean-up of docker/bash.sh (#8670) * [Docker] Refactor/clean-up of docker/bash.sh - Added detailed help message, displayed using `-h` or `--help`. - Optional flags handled using `getopt`, can now occur in any order. - `--mount` flag may occur more than once. - Switched from short arguments to docker-run to long arguments (e.g. `--volume` instead of `-v`). Short arguments are good shortcuts for interactive work, but can be more difficult to read in longer scripts. - Mount the `.tvm_test_data` folder, to avoid re-downloading test data already available in the host environment. * [Docker] docker/bash.sh CI fix Dash-prefixed arguments as part of the command now require prefixing with -- to separate them from arguments intended for docker/bash.sh * [Docker] docker/bash.sh, consistent quoting * [Docker] Added --repo-mount-point for docker/bash.sh * [Docker] Updated command-line parsing of docker/bash.sh - Maintained previous behavior, any unrecognized flags after the docker/bash.sh are part of the command, no -- is needed. (e.g. docker/bash.sh ci_gpu make -j2) - Reverted changes to Jenskinsfile to add a --, no longer needed. * [Docker] Fixed multi-argument commands * [Docker] docker/bash.sh check permissions before mounting ~/.tvm_test_data * [Docker] Consistent workplace directory in docker/bash.sh for Jenkins Some locations in the CI perform build commands outside of the build steps (e.g. tests/scripts/task_ci_setup.sh#L38), and cmake doesn't like it if the build directory changes. These should probably be moved into the build steps of the CI, and be packed in tvm_multilib in the Jenkinsfile, but for the meantime maintaining a consistent /workspace directory on all CI nodes allows cmake to run. 
* [Docker] Updated bash.sh for MacOS compatibility MacOS has an older version of bash that handles arrays slightly differently. All instances of array expansion `"${ARRAY[@]}"` should instead be written as `${ARRAY[@]+"${ARRAY[@]}"}`. Otherwise, `set -u` will erroneously complain about an undefined variable. See https://stackoverflow.com/a/61551944 for details. Even though this is an older version of bash (observed in version 3.2.57), this is the last major version available under GPLv2 and is therefore the default version on MacOSX. At some point, the `docker/bash.sh` could be migrated to python for ease of maintenance/testing. commit e88fe7726f58c415ac825b3a39ec0e034d3fa009 Author: Mehrdad Hessar <[email protected]> Date: Wed Aug 11 16:24:54 2021 -0700 [microTVM] Zephyr Test Refactor (#8713) * refactor host to qemu * remove unused variables * remove skip-build arg * fix microtvm test script commit 2bc0ecef8dd0f822329dc18773bd83d828206042 Author: Mark Shields <[email protected]> Date: Wed Aug 11 14:07:24 2021 -0700 Force a gc between sphinx-gallery items to reclaim GPU memory. (#8722) GPU memory is only released once the PackedFunc for evaling the model is gced by Python. In CI we're noticing intermittent 'CUDA: Out of memory' failures while processing the tutorials, and tracing showed there was no gc happening between items. Not confident this will solve the problem but worth a try. commit 09b989deb77cfb40f468c2566d1f40227af44bf7 Author: Jared Roesch <[email protected]> Date: Wed Aug 11 05:57:52 2021 -0700 [Rust][Fix] Memory leak (#8714) * Fix obvious memory leak in function.rs * Update object point…
…e#8722) GPU memory is only released once the PackedFunc for evaling the model is gced by Python. In CI we're noticing intermittent 'CUDA: Out of memory' failures while processing the tutorials, and tracing showed there was no gc happening between items. Not confident this will solve the problem but worth a try.
…rewrite. (apache#8597)

* This continues the work outlined in the RFC https://discuss.tvm.apache.org/t/rfc-relay-tecompiler-rewrite-existing-compile-engine-to-match-updated-compiler-flow/9233

  This gets about halfway there for the Interpreter:
  * Remove direct access to TECompiler from interpreter, and instead call tec::LowerTEExpr when 'preparing' a module and expression for evaluation.
  * Make clear there's no phase distinction between create_interpreter and evaluate on the Python side -- both must be prepared together as a single IRModule.
  * But in return make sure the result of evaluate on the Python side is a packed func ready to directly apply 'simple' arguments to an already interpreted closure.
  * The interpreter builds and caches primitive TIR functions (and their corresponding dynamic shape functions) as packed funcs as they are encountered.
  * Cleanup uses of interpreter for constant folding on the C++ side.

  Future work:
  * Fold LoweredModule into IRModule so tec::LowerTEExpr is just another pass.
  * Get rid of the implicit caching of lowered functions in TECompiler.
  * Make calling convention from Relay to TIR explicit, and remove all the function attribute hackery currently needed so the interpreter can correctly invoke lowered functions as it encounters them.
  * Make TECompiler private. Though could do this now it will make migrating the VM and AOT uses of CompilerEngine harder.

  Force a gc between sphinx-gallery items to reclaim GPU memory. (apache#8722)

  GPU memory is only released once the PackedFunc for evaling the model is gced by Python. In CI we're noticing intermittent 'CUDA: Out of memory' failures while processing the tutorials, and tracing showed there was no gc happening between items. Not confident this will solve the problem but worth a try.

* Get rid of logs spam.
GPU memory is only released once the PackedFunc for evaling the model is gced
by Python. In CI we're noticing intermittent 'CUDA: Out of memory' failures
while processing the tutorials, and tracing showed there was no gc happening
between items. Not confident this will solve the problem but worth a try.
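
For illustration, one way to force a collection between gallery items is sketched below. It assumes sphinx-gallery's `reset_modules` option, which takes callables invoked as `(gallery_conf, fname)` around each example; the hook name `force_gc` and its log message are made up for this sketch and are not confirmed by the text above as what the PR does.

```python
import gc


def force_gc(gallery_conf, fname):
    """Hypothetical reset hook: run a full Python gc after a gallery item.

    Collecting unreachable objects (for example a PackedFunc caught in a
    reference cycle) lets their finalizers run, which is when any GPU
    memory they hold can be released.
    """
    unreachable = gc.collect()
    print(f"(forced gc after {fname!r}: {unreachable} unreachable objects found)")
    return unreachable


# Assumed placement: the Sphinx conf.py of the documentation build.
sphinx_gallery_conf = {
    # Callables listed here are run by sphinx-gallery for each example.
    "reset_modules": (force_gc,),
}
```

This only helps if the objects holding GPU memory are actually unreachable by the time the hook runs; anything still referenced (for instance from a module kept alive elsewhere) survives the collection.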