Syncing with Apache/master #6

jinboci · 2020-07-31T09:28:15Z

Description

Syncing with Apache/master

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set and reference to the original paper if applicable
Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

Feature1, tests, (and when applicable, API doc)
Feature2, tests, (and when applicable, API doc)

Comments

If this change is a backward incompatible change, why must this change be made.
Interesting edge cases to note here

* fix batch norm when fix_gamma is True * support gradient accumulation for batch norm * mkldnn batchnorm support grad add * unittest for bn * fix bn arg * fix lint * fix mkldnn * fix mkldnn bn * fix grad when fixing gamma * fix naive gpu bn * fix lint * invoke mkldnn and cudnn batchnorm when axis != 1 * backport 18500 * change condition * fix * fix * add mkldnn_off for bn * remove mkldnn_off * recover save_000800.json * cast
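Gradient accumulation here refers to honoring grad_req='add', where an operator adds its gradient into the existing gradient buffer instead of overwriting it (grad_req='write'); grad_req='null' skips the gradient entirely. The plain-Python sketch below illustrates those three behaviors only. apply_grad is a hypothetical helper and lists stand in for NDArrays, so this is not the batch norm kernel code touched by these commits.

```python
# Hypothetical sketch of how a backward pass honors a gradient request.
# 'write' overwrites the gradient buffer, 'add' accumulates into it,
# and 'null' leaves it untouched. Plain lists stand in for NDArrays.
from typing import List


def apply_grad(buffer: List[float], new_grad: List[float], req: str) -> None:
    """Update `buffer` in place according to the gradient request `req`."""
    if req == "write":
        buffer[:] = new_grad
    elif req == "add":
        for i, g in enumerate(new_grad):
            buffer[i] += g
    elif req == "null":
        pass  # gradient not requested; buffer is left as-is
    else:
        raise ValueError(f"unknown grad_req: {req!r}")


if __name__ == "__main__":
    grad = [0.0, 0.0]
    # Two backward passes with grad_req='add' sum their contributions.
    for step_grad in ([1.0, 2.0], [0.5, 0.5]):
        apply_grad(grad, step_grad, "add")
    print(grad)  # [1.5, 2.5]
```

With 'write' instead, the buffer would hold only the last pass's gradient, which is why an operator that ignores 'add' silently breaks accumulation.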

* Fix scipy dependency in probability module * Fix copy-paste error * dtype='float32' for digamma and gammaln

* Add deleting of args aux aux to Partition API Signed-off-by: Serge Panev <[email protected]> * Delete args from Block.params Signed-off-by: Serge Panev <[email protected]> * Fix to use arg/auxdict when optimize_for is called in HybridBlock Signed-off-by: Serge Panev <[email protected]> * Address PR comments Signed-off-by: Serge Panev <[email protected]>

* update footer style * add compiled css of footer styles changes * add same style for footer2 * more fix to the toc

* Add missing args/aux support in optimize_for and deferred inference option Signed-off-by: Serge Panev <[email protected]> * Add input shape_dict, type_dict and stype_dict to optimize_for Signed-off-by: Serge Panev <[email protected]> * Remove warnings for Werror Signed-off-by: Serge Panev <[email protected]> * Address PR comments Signed-off-by: Serge Panev <[email protected]>

CMAKE_CUDA_HOST_COMPILER will be reset if CMAKE_CUDA_COMPILER is not set as of cmake 3.17.3 See https://gitlab.kitware.com/cmake/cmake/-/issues/20826

* Disable test coverage in MKL builds * Enable test parallelization * Set OMP_NUM_THREADS * Fix * Fix unpack_and_init

* Enable GPU Memory profiler tests Previously, the tests were not run because test_profiler.py was not taken into account on GPU CI runs, and some tests were marked to be skipped if run on a CPU-only machine. * Disable broken tests

* Refactor scope functionality in Python API - Remove deprecated metaclass functionality - Remove global state in naming - Switch from threading.local to asyncio compatible contextvars - Stop exposing UUIDs in parameter name * Fix dependencies * Fixes * Fixes * Fix * Fix after merge master
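The switch matters because threading.local state is shared by every asyncio task that runs on the same thread, whereas a contextvars.ContextVar is copied into each task. The sketch below assumes a simple string-prefix naming scheme; name_scope, make_name and _current_prefix are hypothetical names, not MXNet's actual API. It shows two interleaving tasks each seeing only their own scope.

```python
# Sketch: a naming-scope prefix kept in a ContextVar rather than threading.local.
# Each asyncio task runs in its own copy of the context, so a scope entered in
# one task is invisible to other tasks interleaving on the same thread.
# name_scope, make_name and _current_prefix are hypothetical names.
import asyncio
import contextvars
from contextlib import contextmanager

_current_prefix = contextvars.ContextVar("current_prefix", default="")


@contextmanager
def name_scope(prefix):
    """Append prefix to the current scope for the duration of the block."""
    token = _current_prefix.set(_current_prefix.get() + prefix)
    try:
        yield
    finally:
        _current_prefix.reset(token)


def make_name(base):
    return _current_prefix.get() + base


async def build(prefix, base):
    with name_scope(prefix):
        await asyncio.sleep(0)  # let the other task run while this scope is open
        return make_name(base)


async def main():
    # gather() wraps each coroutine in a Task with a copy of the current context.
    return await asyncio.gather(build("encoder_", "dense"), build("decoder_", "dense"))


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['encoder_dense', 'decoder_dense']
```

If the prefix lived in a threading.local instead, both tasks would share one slot, and the first task would resume to find the second task's prefix stacked on top of its own.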

* Add the newest mxnet discuss version. Add d2l.ai * delete [] and insert old version

* add ndarray and boolean indexing for numpy symbol * fix sanity and unit test * ensure consistency between the imperative and symbolic interface * Update python/mxnet/numpy/multiarray.py and add new test Co-authored-by: Leonard Lausen <[email protected]> * Don't rely on indexing_key_expand_implicit_axes for deciding if _npi.advanced_indexing_multiple is applicable * fix sanity Co-authored-by: Leonard Lausen <[email protected]>

… Transpose and Rollaxis (#18707) * support 6+ dims for transpose * test over * reorder code * fix transposeex

…8724)


* Refactoring of Pooled Storage Manager classes * Adding test for new functionality * Fixing compilation problems which appear for MXNET_USE_CUDA=0 * Fixing compilation problems for WINDOWS and ANDROID * Fixing compilation problems which appear for WINDOWS and __APPLE__ * Fixing lint problems * test_dataloader_context(): Bypassing custom_dev_id pinned mem test on system with GPUs < 2. * Fixing compilation for Android. Elimination of unused includes. * Fixing problems with CPUPinned Storage Manager which appear when MXNET_USE_CUDA = 0 * Removing test_bucketing.py * Improving CPU_Pinned Pooled Storage Manager case. * Fixing lint problem * The GPU profiling command calls moved into mutex area * Fixing lint problem * Improved reporting regarding the Storage Manager used. * Fixing lint problem * Trigger CI * Removing some comments, as suggested by @szha * Trigger CI * Trigger CI Co-authored-by: andreii <[email protected]>

Disabling this test for now to unblock other PRs, while I'm looking into it. #18740

This reverts commit 60d0672.


* Add sm arch 80 to Makefile * Add TF32 to cuBLAS GEMMs Signed-off-by: Serge Panev <[email protected]> * Add CUDA version guards Signed-off-by: Serge Panev <[email protected]> * Remove useless TF32 for double and old CUDA version Signed-off-by: Serge Panev <[email protected]> * Factorize VERSION_ADJUSTED_TF32_MATH Signed-off-by: Serge Panev <[email protected]> * Add TF32 considerations to test_util.py:check_consistency() * Bypass test_gluon_gpu.py:test_large_models if gmem >32GB * Default tols in assert_almost_equal() now a function of dtype and ctx * Expand types listed by default_tols() * Fix pylint * All with_seed() tests to waitall in teardown * Elevate MXNET_TEST_SEED logging to WARNING * Revert test_gluon_gpu.py:test_rnn_layer to default tols * Fix test_gluon_model_zoo_gpu.py::test_inference and test_operator_gpy.py::test_np_linalg_{solve,tensorinv} * test_numpy_interoperability.py to not fix seed for rest of CI * Further fix to test_np_linalg_tensorinv * Fix test_gluon_data.py:test_dataloader_context when run on 1-GPU system. * Fix test_operator_gpu.py::test_embedding_with_type * Fix test_operator_gpu.py::{test_*convolution_large_c,test_np_linalg_tensorsolve} * Remove unneeded print() from test_numpy_interoperability.py * Unify tol handling of check_consistency() and assert_almost_equal(). Test tweaks. * Add tol handling of assert_almost_equal() with number args * Add tol handling of bool comparisons * Fix test_numpy_op.py::test_np_random_rayleigh * Fix test_operator_gpu.py::test_batchnorm_with_type * Fix test_gluon.py::test_sync_batchnorm in cpu selftest * Improve unittest failure reporting * Add to robustness of test_operator_gpu.py::test_embedding_with_type * Check_consistency() to use equal backward gradients for increased test robustness * Fix test_operator_gpu.py::test_{fully_connected,gemm}. Add default_numeric_eps(). 
* test_utils.py fix for numeric gradient calc * Reinstate rtol=1e-2 for test_operator.py::test_order * Remove auto-cast of check_consistency() input data to least precise dtype (not needed) * Fix test_operator.py::test_{reciprocol,cbrt,rcbrt}_op * Expand default float64 numeric_eps for test_operator_gpu.py::test_sofmin * Fix segfault-on-error of @Retry decorator. Add test isolation. * assert_almost_equal() to handle a,b scalars * Fix test_operator_gpu.py::test_gluon_{mvn,mvn_v1} race * Fix test_operator_gpu.py::test_flatten_slice_after_conv via scale * Remove test_utils.py:almost_equal_ignore_nan() * Fix sample vs. pop variance issue with test_numpy_op.py::test_npx_batch_norm * Expose test_utils.py:effective_dtype() and use to fix test_operator_gpu.py::test_np_linalg_svd * Fix true_divide int_array / int_scalar -> float_array to honor np_default_dtype * Try test_elemwise_binary_ops serial to avoid pytest worker crash * Fix (log_)softmax backward on empty ndarray * Temporarily log all CI seeds to troubleshoot seed non-determinism * Revert "Temporarily log all CI seeds to troubleshoot seed non-determinism" This reverts commit f60eff2. * Temp log all CI seeds to troubleshoot unwanted seed determinism * Revert "Add sm arch 80 to Makefile" This reverts commit f9306ce. * Same fix of sample vs. pop variance issue, now with test_operator_gpu.py::test_batchnorm * Revert "Temp log all CI seeds to troubleshoot unwanted seed determinism" This reverts commit ff328ef. * Marking test_sparse_dot_grad with garbage_expected after teardown error * Fix flakiness of test_gluon_probability{_v1,_v2}.py::test_gluon_kl{_v1,} * Temp skip of test_aggregate_duplication on gpu * Add seeding to test_{numpy,}_contrib_gluon_data_vision.py. Make created files unique. 
* Add ndarray module isolation to help debug test_bbox_augmenters worker crash * Marking test_sparse_square_sum serial after pytest worker crash * Fix flakiness of test_gluon_probability{_v1,_v2}.py::test_half_cauchy{_v1,} Co-authored-by: Serge Panev <[email protected]> Co-authored-by: Bart Gawrych <[email protected]>
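Two of the fixes above address a "sample vs. pop variance issue" in the batch norm tests. The difference is only the divisor, as this standard-library sketch shows. That batch norm normalizes with the population (divide-by-N) variance in training mode reflects the usual formulation and is an assumption here, not something stated in the commit.

```python
# Sketch: population (divide by N) vs. sample (divide by N - 1) variance.
# A test reference that uses one while the operator uses the other disagrees
# by a factor of N / (N - 1), which is large for small batches.
import math
import statistics

batch = [1.0, 2.0, 3.0, 4.0]

pop_var = statistics.pvariance(batch)    # 5 / 4 = 1.25
sample_var = statistics.variance(batch)  # 5 / 3, about 1.667

# Batch-norm-style normalization, assuming the population variance is used.
eps = 1e-5
mean = statistics.mean(batch)
normalized = [(x - mean) / math.sqrt(pop_var + eps) for x in batch]
```

For this batch of four the two variances differ by a factor of 4/3, far outside any reasonable test tolerance.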

* enable default large tensor in np * revert cmake change * move test_np_large_array.py to nightly

Replaced by cmake buildsystem as per #16167

New PRs started showing the codecov/project badge again, due to an apparent change in how codecov's backend resolves the duplicate options specified in .codecov.yml

* Fix mx.symbol.numpy._Symbol.__deepcopy__ logic error Performed shallow copy instead of deep copy * Test * Fix test

…used code (#18771) * Migrate remaining Dockerfiles to docker-compose.yml - Delete unused Dockerfiles - Delete unused install/*.sh scripts - Consolidate ubuntu_gpu_tensorrt and ubuntu_gpu - Remove deprecated logic in ci/build.py (no longer needed with docker-compose) - Remove ci/docker_cache.py (no longer needed with docker-compose) * Fix * Fix * Fix ubuntu_cpu_jekyll


This PR makes it easy to create unittests that require specific settings of environment variables, while avoiding the pitfalls (discussed in the Further Comments section). This PR can be considered a recasting and expansion of the great vision of @larroy in creating the EnvManager class in #13140. In its base form, the facility is a drop-in replacement for EnvManager, and is called 'environment':

    with environment('MXNET_MY_NEW_FEATURE', '1'):
        <test with feature enabled>
    with environment('MXNET_MY_NEW_FEATURE', '0'):
        <test with feature disabled>

Like EnvManager, this facility saves and restores the previous environment variable state, including when exceptions are raised. In addition, this PR introduces the following features:

* A similarly-named unittest decorator: @with_environment(key, value)
* The ability to pass in multiple env vars as a dict (as is needed for some tests) in both forms, for example:

      with environment({'MXNET_FEATURE_A': '1', 'MXNET_FEATURE_B': '1'}):
          <test with both features enabled>

* Works on Windows! This PR includes a wrapping of the backend's setenv() and getenv() functions, and uses this direct access to the backend environment to keep it in sync with the Python environment. This works around the problem that the C runtime on Windows gets a snapshot of the Python environment at startup that is immutable from Python.
* with environment() has a simple implementation using the @contextmanager decorator.
* Tests are included that validate the facility works with all combinations of before_val/set_val, namely unset/unset, unset/set, set/unset, set/set.

There were 5 unittests previously using EnvManager, and this PR shifts those uses to with environment():, while converting over 20 other ad-hoc uses of os.environ[] within the unittests. This PR also enables the unittests that were bypassed on Windows (due to the inability to set environment variables) to run on all platforms.
Further Comments

Environment variables are a two-edged sword: they enable useful operating modes for testing, debugging or niche applications, but like all features they must be tested. The correct approach for testing with a particular env var setting is:

    def set_env_var(key, value):
        if value is None:
            os.environ.pop(key, None)
        else:
            os.environ[key] = value

    old_env_var_value = os.environ.get(env_var_name)
    try:
        set_env_var(env_var_name, test_env_var_value)
        <perform test>
    finally:
        set_env_var(env_var_name, old_env_var_value)

The above code makes no assumption about whether the before-test and within-test state of the env var is set or unset, and restores the prior environment even if the test raises an exception. This is a lot of boilerplate code that could easily be mishandled. The with environment() context makes it simple to handle all of this properly. If an entire unittest needs a forced env var setting, the @with_environment() decorator avoids the extra indentation that with environment() would otherwise require inside the test.
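For illustration, here is a minimal, Python-only sketch of how that boilerplate collapses into a context manager and a decorator. It reuses the names environment and with_environment from this PR but is not the PR's code: in particular it assumes only os.environ needs updating and does not keep the backend C runtime in sync through the wrapped setenv()/getenv() described above.

```python
# Minimal, Python-only sketch of an environment() context manager and a
# with_environment() decorator. Assumption: unlike the facility described in
# this PR, it does NOT sync the backend C runtime through wrapped
# setenv()/getenv(); it only modifies os.environ.
import functools
import os
from contextlib import contextmanager


def _set_env_var(key, value):
    """Set key to value, or unset key when value is None."""
    if value is None:
        os.environ.pop(key, None)
    else:
        os.environ[key] = value


@contextmanager
def environment(*args):
    """Accepts environment(key, value) or environment({key: value, ...})."""
    if len(args) == 1 and isinstance(args[0], dict):
        new_values = dict(args[0])
    elif len(args) == 2:
        new_values = {args[0]: args[1]}
    else:
        raise TypeError("expected (key, value) or a single dict")
    # Record the prior state of every key; None means "was unset".
    old_values = {key: os.environ.get(key) for key in new_values}
    try:
        for key, value in new_values.items():
            _set_env_var(key, value)
        yield
    finally:
        # Restore the prior state even if the body raised an exception.
        for key, value in old_values.items():
            _set_env_var(key, value)


def with_environment(*args):
    """Decorator form: runs the whole test function under environment(*args)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*f_args, **f_kwargs):
            with environment(*args):
                return func(*f_args, **f_kwargs)
        return wrapper
    return decorator
```

Decorating a test with @with_environment('MXNET_MY_NEW_FEATURE', '1') then runs the whole test with that variable set and restores its prior set or unset state afterwards; passing None as a value tests the "unset" case.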

* set website default version - test redirect * enable first time redirect on all master website pages * update test code * remove unnecessary test code * fix typo * delete test code

Signed-off-by: Serge Panev <[email protected]>

Developers can now trigger fine-grained checks:

    python ci/build.py -R --platform ubuntu_cpu /work/runtime_functions.sh sanity_python
    python ci/build.py -R --platform ubuntu_cpu /work/runtime_functions.sh sanity_license

etc.

* Remove caffe plugin * Fix * Remove CXX14 feature flag * Update test

* loss for np/nd array * fix flaky

* temp * change test * fix bad func call * test * rectify * doc * change test

Co-authored-by: Lin <[email protected]>

Co-authored-by: Joe Evans <[email protected]>

* move np tutorials to top level * replace deepnumpy reference to np * add info in card * remove useless entry * replace NDArray API card with np.ndarray * python site refactor * remove duplicated drawer and refactor layout * extend document width to 100% for xl devices

* remove other language bindings section from api page * remove language binding docs redirect * add call for contribution banner * modify call for contribution wording Co-authored-by: Aaron Markham <[email protected]> * more wording modification Co-authored-by: Aaron Markham <[email protected]> * add hyperlink to 1.x version in banner * add reference to the C api deprecation github issue Co-authored-by: Aaron Markham <[email protected]>

Run clang-tidy via cmake only on the code managed by mxnet (and not 3rdparty dependencies), update to clang-tidy-10 and run clang-tidy-10 -fix to fix all the warnings that are enforced on CI. Developers can run clang-tidy by specifying the -DCMAKE_CXX_CLANG_TIDY="clang-tidy-10" to cmake, or using the python ci/build.py -R --platform ubuntu_cpu /work/runtime_functions.sh build_ubuntu_cpu_clang_tidy script.

* make parameter smoother * minor changes

* refactor dlpack and add from_numpy to npx * remove reference of DeepNumPy * map platform-dependent types to fixed-size types * update DMLC_LOG_FATAL_THROW * fix flaky * fix flaky * test no error

* Enable DIST_KVSTORE by default in staticbuild set(USE_DIST_KVSTORE ON CACHE BOOL "Build with DIST_KVSTORE support") * Ensure static linkage of dependencies * Fix for OS X * Fix shell syntax * Alternate approach to force static linkage of libprotobuf

* Fix metric API page * Update index.rst

wkcn and others added 30 commits July 8, 2020 17:01

change bn test (#18688)

a9b16f7

Fix scipy dependency in probability module (#18689)

19e373d

* Fix scipy dependency in probability module * Fix copy-paste error * dtype='float32' for digamma and gammaln

add 'needs triage' label to new bug reports (#18696)

8ebb537

Fix python micro-site table of content bugs (#18664)

9d62392

* update footer style * add compiled css of footer styles changes * add same style for footer2 * more fix to the toc

Merge content from numpy.mxnet.io into mxnet official website (#18691)

7c9c4fc

Fix all anchor shifts on website (#18674)

f125f5f

Set CMAKE_CUDA_COMPILER in aarch64-linux-gnu-toolchain.cmake (#18713)

d8430b6

CMAKE_CUDA_HOST_COMPILER will be reset if CMAKE_CUDA_COMPILER is not set as of cmake 3.17.3 See https://gitlab.kitware.com/cmake/cmake/-/issues/20826

Disable test coverage in MKL builds (#18443)

d512814

* Disable test coverage in MKL builds * Enable test parallelization * Set OMP_NUM_THREADS * Fix * Fix unpack_and_init

Enable GPU Memory profiler tests (#18701)

0dc30a2

* Enable GPU Memory profiler tests Previously, the tests were not run because test_profiler.py was not taken into account on GPU CI runs, and some tests were marked to be skipped if run on a CPU-only machine. * Disable broken tests

Migrate from private to public jetson toolchain files (#18677)

12ec046

Add the newest mxnet discuss version. Add d2l.ai (#18663)

6901325

* Add the newest mxnet discuss version. Add d2l.ai * delete [] and insert old version

[MXNET-1453] Support the input whose dimension is greater than 6 for…

37bdf0b

… Transpose and Rollaxis (#18707) * support 6+ dims for transpose * test over * reorder code * fix transposeex
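One way to lift a fixed dimension limit is to keep the index arithmetic rank-generic. As a hedged illustration (not MXNet's C++ kernel; transpose_flat and row_major_strides are hypothetical helpers), the transpose of a row-major array of any rank can be computed by mapping each output coordinate back to an input offset through the axis permutation:

```python
# Sketch: transpose of an N-d array stored as a flat row-major list, for any
# number of dimensions. Each output coordinate is mapped back to an input
# offset through the axis permutation; no dimension-count limit is built in.
from itertools import product
from typing import List, Sequence, Tuple


def row_major_strides(shape: Sequence[int]) -> List[int]:
    """Element strides for a contiguous row-major (C-order) layout."""
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return strides


def transpose_flat(data: Sequence[float], shape: Sequence[int],
                   axes: Sequence[int]) -> Tuple[List[float], List[int]]:
    """Return (flat_output, out_shape) with out_shape[k] == shape[axes[k]]."""
    if sorted(axes) != list(range(len(shape))):
        raise ValueError("axes must be a permutation of range(ndim)")
    in_strides = row_major_strides(shape)
    out_shape = [shape[a] for a in axes]
    out = []
    # Iterate output coordinates in row-major order.
    for out_idx in product(*(range(n) for n in out_shape)):
        # Output axis k corresponds to input axis axes[k].
        offset = sum(out_idx[k] * in_strides[axes[k]] for k in range(len(axes)))
        out.append(data[offset])
    return out, out_shape
```

For a 2x3 matrix holding 0..5, transpose_flat(list(range(6)), [2, 3], [1, 0]) gives shape [3, 2] and data [0, 3, 1, 4, 2, 5]; the same call works unchanged for a 7-d shape.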

Initialize docker cache in build.py for docker-compose containers (#1…

2abf0b8

…8724)

Remove NNPACK integration (#18722)

a77f774

Add qr backward for wide matrices with m < n (#18197)

60d0672

Disable sparse op test (#18741)

3e0df1b

Disabling this test for now to unblock other PRs, while I'm looking into it. #18740

Move gluon.metric api docs (#18733)

cec86ad

Revert "Add qr backward for wide matrices with m < n (#18197)" (#18750)

444a7ee

This reverts commit 60d0672.

[NumPy] enable large tensor in np (#18368)

bf26bcc

* enable default large tensor in np * revert cmake change * move test_np_large_array.py to nightly

Add qr backward for wide inputs ncols>nrows (#18757)

1aec483

Remove Makefile build support (#18721)

a7c6606

Replaced by cmake buildsystem as per #16167

Improve test seeding in test_numpy_interoperablity.py (#18762)

6bb3d72

Remove duplicate settings in .codecov.yml (#18763)

9548b0c

New PRs started showing the codecov/project badge again, due to an apparent change in how codecov's backend resolves the duplicate options specified in .codecov.yml

leezu and others added 25 commits July 21, 2020 23:31

Fix mx.symbol.numpy._Symbol.__deepcopy__ logic error (#18686)

a330a02

* Fix mx.symbol.numpy._Symbol.__deepcopy__ logic error Performed shallow copy instead of deep copy * Test * Fix test
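The bug class fixed here is easy to reproduce in plain Python: a __deepcopy__ that performs a shallow copy leaves the "copy" sharing mutable state with the original. The sketch below uses a hypothetical Node class (not MXNet's _Symbol) to show the expected deep-copy behavior, including passing memo through so cycles resolve correctly.

```python
# Sketch: why __deepcopy__ must not fall back to a shallow copy.
# Node is a hypothetical stand-in with mutable state; it is not
# mx.symbol.numpy._Symbol.
import copy


class Node:
    def __init__(self, name, attrs):
        self.name = name
        self.attrs = attrs  # mutable state that copies must not share

    def __deepcopy__(self, memo):
        result = Node.__new__(Node)
        # Register the copy first so cyclic references resolve to it.
        memo[id(self)] = result
        result.name = copy.deepcopy(self.name, memo)
        result.attrs = copy.deepcopy(self.attrs, memo)
        return result


if __name__ == "__main__":
    original = Node("x", {"lr_mult": [1.0]})
    shallow = copy.copy(original)
    deep = copy.deepcopy(original)
    original.attrs["lr_mult"].append(2.0)
    print(shallow.attrs["lr_mult"])  # [1.0, 2.0]  (shared with original)
    print(deep.attrs["lr_mult"])     # [1.0]       (independent)
```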

Fix crash when accessing already destructed static variables (#18768)

1928117

set website default version to current stable (1.6) version (#18738)

e31ad77

* set website default version - test redirect * enable first time redirect on all master website pages * update test code * remove unnecessary test code * fix typo * delete test code

ONNX import: use Conv pad attribute for symmetrical padding (#18675)

06b5d22

Signed-off-by: Serge Panev <[email protected]>

Remove caffe plugin (#18787)

c1db2d5

* Remove caffe plugin * Fix * Remove CXX14 feature flag * Update test

add support for np.ndarray in autograd.function (#18790)

98b3f73

Update CUB and include it only for CUDA < 11 (#18799)

9e77e81

remove NLL in metric (#18794)

74430a9

[NumPy] loss for np array (#17196)

a807f6d

* loss for np/nd array * fix flaky

[numpy] fix flaky mixed precision binary error (#18660)

7908d7e

* temp * change test * fix bad func call * test * rectify * doc * change test

remove executor manager from API doc (#18802)

f83dbac

Co-authored-by: Lin <[email protected]>

Fix naming in runtime_functions.sh (#18795)

126636c

Cherry-pick large tensor support from #18752. (#18804)

e9829e7

Co-authored-by: Joe Evans <[email protected]>

use regex that is supported by all browsers (#18811)

b685fad

Fix dirichlet flaky tests (#18817)

608afef

* make parameter smoother * minor changes

[NumPy] DLPack refactor and npx.from_numpy (#18656)

045efb2

* refactor dlpack and add from_numpy to npx * remove reference of DeepNumPy * map platform-dependent types to fixed-size types * update DMLC_LOG_FATAL_THROW * fix flaky * fix flaky * test no error

add adaptive left margin for python site document body (#18828)

aa53291

Fixup move gluon.metric api docs (#18748)

ac36089

* Fix metric API page * Update index.rst

jinboci merged commit e0fccbb into jinboci:master Jul 31, 2020
