diff --git a/NEWS.md b/NEWS.md
index b49c119d87cb..0ba22152d4e0 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -18,87 +18,161 @@
MXNet Change Log
================
- [MXNet Change Log](#mxnet-change-log)
- * [1.5.1](#151)
- * [1.5.0](#150)
- + [New Features](#new-features)
- - [Automatic Mixed Precision(experimental)](#automatic-mixed-precision-experimental-)
+ - [1.7.0](#170)
+ - [New features](#new-features)
+ - [MXNet Extensions: custom operators, partitioning, and graph passes](#mxnet-extensions-custom-operators-partitioning-and-graph-passes)
+ - [OpPerf utility enabled in the binary distribution](#opperf-utility-enabled-in-the-binary-distribution)
+ - [MKL-DNN](#mkl-dnn)
+ - [MKL-DNN as the default CPU backend in binary distribution](#mkl-dnn-as-the-default-cpu-backend-in-binary-distribution)
+ - [Branding change to DNNL](#branding-change-to-dnnl)
+ - [Support bfloat16 datatype](#support-bfloat16-datatype)
+ - [New operators](#new-operators)
+ - [Feature improvements](#feature-improvements)
+ - [Numpy compatible interface(experimental)](#numpy-compatible-interfaceexperimental)
+ - [Large tensor support](#large-tensor-support)
+ - [MKL-DNN enhancement](#mkl-dnn-enhancement)
+ - [TensorRT integration](#tensorrt-integration)
+ - [Quantization](#quantization)
+ - [Profiler](#profiler)
+ - [ONNX](#onnx)
+ - [New models](#new-models)
+ - [Operator improvements](#operator-improvements)
+ - [Bug fixes](#bug-fixes)
+ - [Front end API](#front-end-api)
+ - [Gluon](#gluon)
+ - [Symbol](#symbol)
+ - [Language Bindings](#language-bindings)
+ - [Python](#python)
+ - [C/C++](#cc)
+ - [R](#r)
+ - [Clojure](#clojure)
+ - [Julia](#julia)
+ - [Perl](#perl)
+ - [Scala](#scala)
+ - [Performance improvements](#performance-improvements)
+ - [Example and tutorials](#example-and-tutorials)
+ - [Website and documentation](#website-and-documentation)
+ - [CI/CD](#cicd)
+ - [License](#license)
+ - [Miscellaneous changes](#miscellaneous-changes)
+ - [1.6.0](#160)
+ - [Deprecation of Python 2](#deprecation-of-python-2)
+ - [New features](#new-features-1)
+ - [NumPy compatible interface and using TVM to generate operators](#numpy-compatible-interface-and-using-tvm-to-generate-operators)
+ - [Graph optimizations](#graph-optimizations)
+ - [Pointwise fusion for GPU](#pointwise-fusion-for-gpu)
+ - [Eliminate common subexpressions](#eliminate-common-subexpressions)
+ - [Default MKLDNN Subgraph fusion](#default-mkldnn-subgraph-fusion)
+ - [New operators](#new-operators-1)
+ - [Feature improvements](#feature-improvements-1)
+ - [Automatic Mixed Precision](#automatic-mixed-precision)
+ - [Gluon Fit API](#gluon-fit-api)
+ - [MKLDNN](#mkldnn)
+ - [Large tensor support](#large-tensor-support-1)
+ - [TensorRT integration](#tensorrt-integration-1)
+ - [Higher order gradient support](#higher-order-gradient-support)
+ - [Operator improvements](#operator-improvements-1)
+ - [Profiler](#profiler-1)
+ - [ONNX import/export](#onnx-importexport)
+ - [Runtime discovery of features](#runtime-discovery-of-features)
+ - [Bug fixes](#bug-fixes-1)
+ - [Front end API](#front-end-api-1)
+ - [Gluon](#gluon-1)
+ - [Symbol](#symbol-1)
+ - [Language Bindings](#language-bindings-1)
+ - [Python](#python-1)
+ - [C/C++](#cc-1)
+ - [Clojure](#clojure-1)
+ - [Julia](#julia-1)
+ - [Perl](#perl-1)
+ - [Scala](#scala-1)
+ - [Performance improvements](#performance-improvements-1)
+ - [Examples and tutorials](#examples-and-tutorials)
+ - [Website and documentation](#website-and-documentation-1)
+ - [CI/CD](#cicd-1)
+ - [Misc](#misc)
+ - [1.5.1](#151)
+ - [Bug-fixes](#bug-fixes-2)
+ - [1.5.0](#150)
+ - [New Features](#new-features-2)
+ - [Automatic Mixed Precision(experimental)](#automatic-mixed-precisionexperimental)
- [MKL-DNN Reduced precision inference and RNN API support](#mkl-dnn-reduced-precision-inference-and-rnn-api-support)
- - [Dynamic Shape(experimental)](#dynamic-shape-experimental-)
- - [Large Tensor Support](#large-tensor-support)
+ - [Dynamic Shape(experimental)](#dynamic-shapeexperimental)
+ - [Large Tensor Support](#large-tensor-support-2)
- [Dependency Update](#dependency-update)
- - [Gluon Fit API(experimental)](#gluon-fit-api-experimental-)
- - [New Operators](#new-operators)
- + [Feature Improvements](#feature-improvements)
+ - [Gluon Fit API(experimental)](#gluon-fit-apiexperimental)
+ - [New Operators](#new-operators-2)
+ - [Feature Improvements](#feature-improvements-2)
- [Operators](#operators)
- - [MKLDNN](#mkldnn)
- - [ONNX](#onnx)
+ - [MKLDNN](#mkldnn-1)
+ - [ONNX](#onnx-1)
- [TensorRT](#tensorrt)
- [FP16 Support](#fp16-support)
- - [Deep Graph Library(DGL) support](#deep-graph-library-dgl--support)
+ - [Deep Graph Library(DGL) support](#deep-graph-librarydgl-support)
- [Horovod Integration](#horovod-integration)
- [Dynamic Shape](#dynamic-shape)
- [Backend Engine](#backend-engine)
- - [Large Tensor Support](#large-tensor-support-1)
- - [Quantization](#quantization)
- - [Profiler](#profiler)
+ - [Large Tensor Support](#large-tensor-support-3)
+ - [Quantization](#quantization-1)
+ - [Profiler](#profiler-2)
- [CoreML](#coreml)
- + [Front End API](#front-end-api)
- - [Gluon](#gluon)
- - [Python](#python)
- + [Language Bindings](#language-bindings)
- - [Scala](#scala)
+ - [Front End API](#front-end-api-2)
+ - [Gluon](#gluon-2)
+ - [Python](#python-2)
+ - [Language Bindings](#language-bindings-2)
+ - [Scala](#scala-2)
- [Java](#java)
- - [C++](#c--)
- - [Clojure](#clojure)
- - [Julia](#julia)
- - [Perl:](#perl-)
- - [R](#r)
- + [Performance Improvements](#performance-improvements)
- + [Example and Tutorials](#example-and-tutorials)
- + [Website](#website)
- + [Documentation](#documentation)
- + [Build and Test](#build-and-test)
- + [Bug-fixes](#bug-fixes)
- + [License](#license)
- + [Depreciations](#depreciations)
- + [Known Issues](#known-issues)
- * [1.4.1](#141)
- + [Bug-fixes](#bug-fixes-1)
- * [1.4.0](#140)
- + [New Features](#new-features-1)
+ - [C++](#c)
+ - [Clojure](#clojure-2)
+ - [Julia](#julia-2)
+ - [Perl:](#perl-2)
+ - [R](#r-1)
+ - [Performance Improvements](#performance-improvements-2)
+ - [Example and Tutorials](#example-and-tutorials-1)
+ - [Website](#website)
+ - [Documentation](#documentation)
+ - [Build and Test](#build-and-test)
+ - [Bug-fixes](#bug-fixes-3)
+ - [License](#license-1)
+ - [Depreciations](#depreciations)
+ - [Known Issues](#known-issues)
+ - [1.4.1](#141)
+ - [Bug-fixes](#bug-fixes-4)
+ - [1.4.0](#140)
+ - [New Features](#new-features-3)
- [Java Inference API](#java-inference-api)
- [Julia API](#julia-api)
- - [Control Flow Operators (experimental)](#control-flow-operators--experimental-)
+ - [Control Flow Operators (experimental)](#control-flow-operators-experimental)
- [SVRG Optimization](#svrg-optimization)
- - [Subgraph API (experimental)](#subgraph-api--experimental-)
+ - [Subgraph API (experimental)](#subgraph-api-experimental)
- [JVM Memory Management](#jvm-memory-management)
- - [Topology-aware AllReduce (experimental)](#topology-aware-allreduce--experimental-)
- - [MKLDNN backend: Graph optimization and Quantization (experimental)](#mkldnn-backend--graph-optimization-and-quantization--experimental-)
- * [Graph Optimization](#graph-optimization)
- * [Quantization](#quantization-1)
- + [New Operators](#new-operators-1)
- + [Feature improvements](#feature-improvements)
+ - [Topology-aware AllReduce (experimental)](#topology-aware-allreduce-experimental)
+ - [MKLDNN backend: Graph optimization and Quantization (experimental)](#mkldnn-backend-graph-optimization-and-quantization-experimental)
+ - [Graph Optimization](#graph-optimization)
+ - [Quantization](#quantization-2)
+ - [New Operators](#new-operators-3)
+ - [Feature improvements](#feature-improvements-3)
- [Operator](#operator)
- [Optimizer](#optimizer)
- [Sparse](#sparse)
- - [ONNX](#onnx-1)
- - [MKLDNN](#mkldnn-1)
+ - [ONNX](#onnx-2)
+ - [MKLDNN](#mkldnn-2)
- [Inference](#inference)
- [Other](#other)
- + [Frontend API updates](#frontend-api-updates)
- - [Gluon](#gluon-1)
- - [Symbol](#symbol)
- + [Language API updates](#language-api-updates)
+ - [Frontend API updates](#frontend-api-updates)
+ - [Gluon](#gluon-3)
+ - [Symbol](#symbol-2)
+ - [Language API updates](#language-api-updates)
- [Java](#java-1)
- - [R](#r-1)
- - [Scala](#scala-1)
- - [Clojure](#clojure-1)
- - [Perl](#perl)
- - [Julia](#julia-1)
- + [Performance benchmarks and improvements](#performance-benchmarks-and-improvements)
- + [Bug fixes](#bug-fixes)
- + [Licensing updates](#licensing-updates)
- + [Improvements](#improvements)
+ - [R](#r-2)
+ - [Scala](#scala-3)
+ - [Clojure](#clojure-3)
+ - [Perl](#perl-3)
+ - [Julia](#julia-3)
+ - [Performance benchmarks and improvements](#performance-benchmarks-and-improvements)
+ - [Bug fixes](#bug-fixes-5)
+ - [Licensing updates](#licensing-updates)
+ - [Improvements](#improvements)
- [Tutorial](#tutorial)
- [Example](#example)
- [Documentation](#documentation-1)
@@ -107,98 +181,1220 @@ MXNet Change Log
- [Installation](#installation)
- [Build and CI](#build-and-ci)
- [3rd party](#3rd-party)
- * [TVM:](#tvm-)
- * [CUDNN:](#cudnn-)
- * [Horovod:](#horovod-)
- + [Deprications](#deprications)
- + [Other](#other-1)
- + [How to build MXNet](#how-to-build-mxnet)
- + [List of submodules used by Apache MXNet (Incubating) and when they were updated last](#list-of-submodules-used-by-apache-mxnet--incubating--and-when-they-were-updated-last)
- * [1.3.1](#131)
- + [Bug fixes](#bug-fixes-1)
- + [Documentation fixes](#documentation-fixes)
- + [Other Improvements](#other-improvements)
- + [Submodule updates](#submodule-updates)
- + [Known issues](#known-issues)
- * [1.3.0](#130)
- + [New Features - Gluon RNN layers are now HybridBlocks](#new-features---gluon-rnn-layers-are-now-hybridblocks)
- + [MKL-DNN improvements](#mkl-dnn-improvements)
- + [New Features - Gluon Model Zoo Pre-trained Models](#new-features---gluon-model-zoo-pre-trained-models)
- + [New Features - Clojure package (experimental)](#new-features---clojure-package--experimental-)
- + [New Features - Synchronized Cross-GPU Batch Norm (experimental)](#new-features---synchronized-cross-gpu-batch-norm--experimental-)
- + [New Features - Sparse Tensor Support for Gluon (experimental)](#new-features---sparse-tensor-support-for-gluon--experimental-)
- + [New Features - Control flow operators (experimental)](#new-features---control-flow-operators--experimental-)
- + [New Features - Scala API Improvements (experimental)](#new-features---scala-api-improvements--experimental-)
- + [New Features - Rounding GPU Memory Pool for dynamic networks with variable-length inputs and outputs (experimental)](#new-features---rounding-gpu-memory-pool-for-dynamic-networks-with-variable-length-inputs-and-outputs--experimental-)
- + [New Features - Topology-aware AllReduce (experimental)](#new-features---topology-aware-allreduce--experimental-)
- + [New Features - Export MXNet models to ONNX format (experimental)](#new-features---export-mxnet-models-to-onnx-format--experimental-)
- + [New Features - TensorRT Runtime Integration (experimental)](#new-features---tensorrt-runtime-integration--experimental-)
- + [New Examples - Scala](#new-examples---scala)
- + [Maintenance - Flaky Tests improvement effort](#maintenance---flaky-tests-improvement-effort)
- + [Maintenance - MXNet Model Backwards Compatibility Checker](#maintenance---mxnet-model-backwards-compatibility-checker)
- + [Maintenance - Integrated testing for "the Straight Dope"](#maintenance---integrated-testing-for--the-straight-dope-)
- + [Bug-fixes](#bug-fixes-2)
- + [Performance Improvements](#performance-improvements-1)
- + [API Changes](#api-changes)
- + [Other features](#other-features)
- + [Usability Improvements](#usability-improvements)
- * [1.2.0](#120)
- + [New Features - Added Scala Inference APIs](#new-features---added-scala-inference-apis)
- + [New Features - Added a Module to Import ONNX models into MXNet](#new-features---added-a-module-to-import-onnx-models-into-mxnet)
- + [New Features - Added Support for Model Quantization with Calibration](#new-features---added-support-for-model-quantization-with-calibration)
- + [New Features - MKL-DNN Integration](#new-features---mkl-dnn-integration)
- + [New Features - Added Exception Handling Support for Operators](#new-features---added-exception-handling-support-for-operators)
- + [New Features - Enhanced FP16 support](#new-features---enhanced-fp16-support)
- + [New Features - Added Profiling Enhancements](#new-features---added-profiling-enhancements)
- + [Breaking Changes](#breaking-changes)
- + [Bug Fixes](#bug-fixes)
- + [Performance Improvements](#performance-improvements-2)
- + [API Changes](#api-changes-1)
- + [Sparse Support](#sparse-support)
- + [Deprecations](#deprecations)
- + [Other Features](#other-features)
- + [Usability Improvements](#usability-improvements-1)
- + [Known Issues](#known-issues-1)
- * [1.1.0](#110)
- + [Usability Improvements](#usability-improvements-2)
- + [Bug-fixes](#bug-fixes-3)
- + [New Features](#new-features-2)
- + [API Changes](#api-changes-2)
- + [Deprecations](#deprecations-1)
- + [Performance Improvements](#performance-improvements-3)
- + [Known Issues](#known-issues-2)
- * [1.0.0](#100)
- + [Performance](#performance)
- + [New Features - Gradient Compression [Experimental]](#new-features---gradient-compression--experimental-)
- + [New Features - Support of NVIDIA Collective Communication Library (NCCL) [Experimental]](#new-features---support-of-nvidia-collective-communication-library--nccl---experimental-)
- + [New Features - Advanced Indexing [General Availability]](#new-features---advanced-indexing--general-availability-)
- + [New Features - Gluon [General Availability]](#new-features---gluon--general-availability-)
- + [New Features - ARM / Raspberry Pi support [Experimental]](#new-features---arm---raspberry-pi-support--experimental-)
- + [New Features - NVIDIA Jetson support [Experimental]](#new-features---nvidia-jetson-support--experimental-)
- + [New Features - Sparse Tensor Support [General Availability]](#new-features---sparse-tensor-support--general-availability-)
- + [Bug-fixes](#bug-fixes-4)
- + [Doc Updates](#doc-updates)
- * [0.12.1](#0121)
- + [Bug-fixes](#bug-fixes-5)
- * [0.12.0](#0120)
- + [Performance](#performance-1)
- + [New Features - Gluon](#new-features---gluon)
- + [New Features - Autograd](#new-features---autograd)
- + [New Features - Sparse Tensor Support](#new-features---sparse-tensor-support)
- + [Other New Features](#other-new-features)
- + [API Changes](#api-changes-3)
- + [Bug-fixes](#bug-fixes-6)
- * [0.11.0](#0110)
- + [Major Features](#major-features)
- + [API Changes](#api-changes-4)
- + [Performance Improvements](#performance-improvements-4)
- + [Bugfixes](#bugfixes)
- + [Refactors](#refactors)
- * [0.10.0](#0100)
- * [0.9.3](#093)
- * [v0.8](#v08)
- * [v0.7](#v07)
- * [v0.5 (initial release)](#v05--initial-release-)
+ - [TVM:](#tvm)
+ - [CUDNN:](#cudnn)
+ - [Horovod:](#horovod)
+ - [Deprications](#deprications)
+ - [Other](#other-1)
+ - [How to build MXNet](#how-to-build-mxnet)
+ - [List of submodules used by Apache MXNet (Incubating) and when they were updated last](#list-of-submodules-used-by-apache-mxnet-incubating-and-when-they-were-updated-last)
+ - [1.3.1](#131)
+ - [Bug fixes](#bug-fixes-6)
+ - [Documentation fixes](#documentation-fixes)
+ - [Other Improvements](#other-improvements)
+ - [Submodule updates](#submodule-updates)
+ - [Known issues](#known-issues-1)
+ - [1.3.0](#130)
+ - [New Features - Gluon RNN layers are now HybridBlocks](#new-features---gluon-rnn-layers-are-now-hybridblocks)
+ - [MKL-DNN improvements](#mkl-dnn-improvements)
+ - [New Features - Gluon Model Zoo Pre-trained Models](#new-features---gluon-model-zoo-pre-trained-models)
+ - [New Features - Clojure package (experimental)](#new-features---clojure-package-experimental)
+ - [New Features - Synchronized Cross-GPU Batch Norm (experimental)](#new-features---synchronized-cross-gpu-batch-norm-experimental)
+ - [New Features - Sparse Tensor Support for Gluon (experimental)](#new-features---sparse-tensor-support-for-gluon-experimental)
+ - [New Features - Control flow operators (experimental)](#new-features---control-flow-operators-experimental)
+ - [New Features - Scala API Improvements (experimental)](#new-features---scala-api-improvements-experimental)
+ - [New Features - Rounding GPU Memory Pool for dynamic networks with variable-length inputs and outputs (experimental)](#new-features---rounding-gpu-memory-pool-for-dynamic-networks-with-variable-length-inputs-and-outputs-experimental)
+ - [New Features - Topology-aware AllReduce (experimental)](#new-features---topology-aware-allreduce-experimental)
+ - [New Features - Export MXNet models to ONNX format (experimental)](#new-features---export-mxnet-models-to-onnx-format-experimental)
+ - [New Features - TensorRT Runtime Integration (experimental)](#new-features---tensorrt-runtime-integration-experimental)
+ - [New Examples - Scala](#new-examples---scala)
+ - [Maintenance - Flaky Tests improvement effort](#maintenance---flaky-tests-improvement-effort)
+ - [Maintenance - MXNet Model Backwards Compatibility Checker](#maintenance---mxnet-model-backwards-compatibility-checker)
+ - [Maintenance - Integrated testing for "the Straight Dope"](#maintenance---integrated-testing-for-%22the-straight-dope%22)
+ - [Bug-fixes](#bug-fixes-7)
+ - [Performance Improvements](#performance-improvements-3)
+ - [API Changes](#api-changes)
+ - [Other features](#other-features)
+ - [Usability Improvements](#usability-improvements)
+ - [1.2.0](#120)
+ - [New Features - Added Scala Inference APIs](#new-features---added-scala-inference-apis)
+ - [New Features - Added a Module to Import ONNX models into MXNet](#new-features---added-a-module-to-import-onnx-models-into-mxnet)
+ - [New Features - Added Support for Model Quantization with Calibration](#new-features---added-support-for-model-quantization-with-calibration)
+ - [New Features - MKL-DNN Integration](#new-features---mkl-dnn-integration)
+ - [New Features - Added Exception Handling Support for Operators](#new-features---added-exception-handling-support-for-operators)
+ - [New Features - Enhanced FP16 support](#new-features---enhanced-fp16-support)
+ - [New Features - Added Profiling Enhancements](#new-features---added-profiling-enhancements)
+ - [Breaking Changes](#breaking-changes)
+ - [Bug Fixes](#bug-fixes-8)
+ - [Performance Improvements](#performance-improvements-4)
+ - [API Changes](#api-changes-1)
+ - [Sparse Support](#sparse-support)
+ - [Deprecations](#deprecations)
+ - [Other Features](#other-features-1)
+ - [Usability Improvements](#usability-improvements-1)
+ - [Known Issues](#known-issues-2)
+ - [1.1.0](#110)
+ - [Usability Improvements](#usability-improvements-2)
+ - [Bug-fixes](#bug-fixes-9)
+ - [New Features](#new-features-4)
+ - [API Changes](#api-changes-2)
+ - [Deprecations](#deprecations-1)
+ - [Performance Improvements](#performance-improvements-5)
+ - [Known Issues](#known-issues-3)
+ - [1.0.0](#100)
+ - [Performance](#performance)
+ - [New Features - Gradient Compression [Experimental]](#new-features---gradient-compression-experimental)
+ - [New Features - Support of NVIDIA Collective Communication Library (NCCL) [Experimental]](#new-features---support-of-nvidia-collective-communication-library-nccl-experimental)
+ - [New Features - Advanced Indexing [General Availability]](#new-features---advanced-indexing-general-availability)
+ - [New Features - Gluon [General Availability]](#new-features---gluon-general-availability)
+ - [New Features - ARM / Raspberry Pi support [Experimental]](#new-features---arm--raspberry-pi-support-experimental)
+ - [New Features - NVIDIA Jetson support [Experimental]](#new-features---nvidia-jetson-support-experimental)
+ - [New Features - Sparse Tensor Support [General Availability]](#new-features---sparse-tensor-support-general-availability)
+ - [Bug-fixes](#bug-fixes-10)
+ - [Doc Updates](#doc-updates)
+ - [0.12.1](#0121)
+ - [Bug-fixes](#bug-fixes-11)
+ - [0.12.0](#0120)
+ - [Performance](#performance-1)
+ - [New Features - Gluon](#new-features---gluon)
+ - [New Features - Autograd](#new-features---autograd)
+ - [New Features - Sparse Tensor Support](#new-features---sparse-tensor-support)
+ - [Other New Features](#other-new-features)
+ - [API Changes](#api-changes-3)
+ - [Bug-fixes](#bug-fixes-12)
+ - [0.11.0](#0110)
+ - [Major Features](#major-features)
+ - [API Changes](#api-changes-4)
+ - [Performance Improvements](#performance-improvements-6)
+ - [Bugfixes](#bugfixes)
+ - [Refactors](#refactors)
+ - [0.10.0](#0100)
+ - [0.9.3](#093)
+ - [v0.8](#v08)
+ - [v0.7](#v07)
+ - [v0.5 (initial release)](#v05-initial-release)
+
+## 1.7.0
+
+### New features
+#### MXNet Extensions: custom operators, partitioning, and graph passes
+
+Adds support for extending MXNet with custom operators, partitioning strategies, and graph passes. These extensions are implemented in a library that is compiled separately from the MXNet codebase and loaded dynamically at runtime into any prebuilt installation of MXNet.
+
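+The extension mechanism is exercised from Python by loading the compiled library into a running MXNet process; below is a minimal sketch of that flow. The library and operator names (`libcustomop_lib.so`, `my_gemm`) are illustrative assumptions, not artifacts shipped with this release.
+
+```python
+import mxnet as mx
+
+# Load a separately compiled extension library into this MXNet process
+# (path and library name are hypothetical).
+mx.library.load('libcustomop_lib.so')
+
+# Operators registered by the library become available under mx.nd,
+# here under an assumed name.
+out = mx.nd.my_gemm(mx.nd.ones((2, 3)), mx.nd.ones((3, 2)))
+print(out)
+```
+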
+ - fix for number of inputs/outputs for backward custom ops (#17069)
+ - Enhancements for custom subgraph op (#17194)
+ - Disable flaky test_custom_op_fork (#17481)
+ - fix custom op makefile (#17516)
+ - Update CustomOp doc with changes for GPU support (#17486)
+ - [WIP] MXNet Extensions enhancements (#17885) (#18128)
+ - Dynamic subgraph property (#17034)
+ - Dynamic subgraph property doc (#17585)
+ - [1.7] Backport MXNet Extension PRs (#17623, #17569, #17762) #18063 (#18069)
+
+#### OpPerf utility enabled in the binary distribution
+ - [OpPerf] Add Neural network loss ops (#17482)
+ - [OpPerf] Fixes the issue when you pass NDArray to run_perf_test (#17508)
+ - [OpPerf] Fix markdown for native profile and add profile param in function desc (#17494)
+ - [OpPerf] Add Indexing ops (#16253)
+ - [OpPerf] Implement remaining random sampling ops (#17502)
+ - [OpPerf] Implement remaining GEMM ops (#17501)
+ - [OpPerf] Implement all linalg ops (#17528)
+ - [OpPerf] Fixed native output ordering, added warmup & runs command line args (#17571)
+ - [OpPerf] Add norm, cast ops, remaining optimizer ops (#17542)
+ - [OpPerf] Fixed Python profiler bug (#17642)
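+
+As a rough illustration, the opperf utilities can be driven from Python as sketched below. This assumes the `benchmark.opperf` package is importable as laid out in the MXNet source tree; the exact import path in the binary distribution may differ.
+
+```python
+import mxnet as mx
+from mxnet import nd
+# Assumed import path, mirroring the opperf layout in the source tree.
+from benchmark.opperf.utils.benchmark_utils import run_performance_test
+
+# Time forward and backward passes of a single operator on CPU
+# for the given input shapes.
+results = run_performance_test(nd.add, run_backward=True, dtype='float32',
+                               ctx=mx.cpu(),
+                               inputs=[{"lhs": (1024, 1024), "rhs": (1024, 1024)}],
+                               warmup=10, runs=25)
+print(results)
+```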
+
+#### MKL-DNN
+##### MKL-DNN as the default CPU backend in binary distribution
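+
+Whether a given MXNet binary was built with MKL-DNN can be verified through the runtime feature-detection API; a small sketch:
+
+```python
+import mxnet as mx
+from mxnet.runtime import Features
+
+# Query the features compiled into this MXNet binary.
+features = Features()
+print(features.is_enabled('MKLDNN'))  # True when MKL-DNN is the CPU backend
+```
+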
+##### Branding change to DNNL
+ - Upgrade MKL-DNN dependency to v1.1 (#16823)
+
+##### Support bfloat16 datatype
+ - Add bfloat16 floating-point format support based on AMP (#17265)
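+
+Since bfloat16 support builds on AMP, converting an existing float32 symbolic model is the intended entry point. A minimal sketch, assuming `amp.convert_model` accepts `target_dtype="bfloat16"` as added by the change above; `sym`, `arg_params` and `aux_params` stand for a previously loaded float32 model:
+
+```python
+from mxnet.contrib import amp
+
+# Convert a float32 symbolic model to bfloat16 where supported (assumed usage).
+bf16_sym, bf16_args, bf16_aux = amp.convert_model(sym, arg_params, aux_params,
+                                                  target_dtype="bfloat16")
+```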
+
+#### New operators
+ - [New Op] Add deformable conv v2 (#16341)
+ - Add MXNet Ops for fast multihead attention (#16408)
+ - Support boolean elemwise/broadcast binary add, multiply and true_divide (#16728)
+ - add gammaln, erf, erfinv (#16811)
+ - add aligned roi introduced in Detectron2 (#16619)
+ - Implement atleast_1d/2d/3d (#17099)
+ - Interleaved MHA for CPU path (#17138)
+ - Lamb optimizer update (#16715)
+ - Quantized Embedding (#16691)
+ - Add gelu fuse ops (#18082) (#18092)
+
+### Feature improvements
+#### Numpy compatible interface(experimental)
+ - [NumPy] NumPy support for linalg.inv (#16730)
+ - add numpy op nan_to_num (#16717)
+ - [Numpy] Add sampling method for bernoulli (#16638)
+ - Fix numpy-compatible mean output type for integer inputs (#16792)
+ - [Numpy] Fix collect_params().zero_grad() in gluon numpy interface (#16716)
+ - [Numpy][Operator] 'where' Implementation in MXNet (#16829)
+ - [Numpy] Random.normal() with backward (#16330)
+ - Add OP diag [numpy] (#16786)
+ - Mixed precison binary op backward (use in) for numpy (#16791)
+ - add numpy op diagflat [numpy] (#16813)
+ - add op bitwise_or [numpy] (#16801)
+ - [Numpy] Implementation npx.{sample}_n (#16876)
+ - [Numpy] Add NumPy support for np.linalg.det and np.linalg.slogdet (#16800)
+ - Op Unravel_index PR [Numpy] (#16862)
+ - [Numpy] Fix imperative basic indexing in numpy (#16902)
+ - [Numpy] Basic indexing in symbolic interface of DeepNumpy (#16621)
+ - [Numpy] add op full_like, c++ impl, fix zeros_like, ones_like type inference (#16804)
+ - [Numpy] Implement numpy operator 'average' (#16720)
+ - [Bugfix] [Numpy] Add `kAddTo` and kNullOp to Transpose (#16979)
+ - set rtol = 1e-2 and atol = 1e-4 when dtype == np.float32 in test_numpy_op.py:test_np_linalg_solve (#17025)
+ - Op_Diagonal [Numpy] (#16989)
+ - numpy bincount (#16965)
+ - [numpy] add op bitwise_not (#16947)
+ - [Numpy ]Modify np.random.shuffle to enable inplace by default (#17133)
+ - [numpy] fix argsort typo (#17150)
+ - [numpy] add op round (#17175)
+ - [numpy]Add op delete (#17023)
+ - [numpy] add op flipud, fliplr (#17192)
+ - [CI] Re-enable testing with numpy 1.18 (#17200)
+ - [Numpy] Add broadcast_to scalar case (#17233)
+ - [Numpy] Random.gamma() implemented (#16152)
+ - [Numpy] add row_stack (=vstack) (#17171)
+ - [Numpy] Add infra for performing constraint check (#17272)
+ - porting numpy-compatible hstack to master and add dstack for interoperability (#17030)
+ - adding asnumpy() to output of gather(implicitly called) to fix gather test in large vector and tensor tests (#17290)
+ - [numpy] add op random.exponential (#17280)
+ - [NumPy] Add NumPy support for norm (#17014)
+ - [numpy]add op random.lognormal (#17415)
+ - Add numpy random weibull operator (#17505)
+ - [numpy] Add np.random.pareto and np.random.power (#17517)
+ - [Numpy] Add sort op (#17393)
+ - [numpy]implement exponential backward (#17401)
+ - [Numpy] Where operator scalar version (#17249)
+ - [numpy] add op matmul (#16990)
+ - [numpy]add op random.logistic, random.gumbel (#17302)
+ - [numpy][Do Not Review]add op insert (#16865)
+ - [numpy] add op random.rayleigh (#17541)
+ - [numpy] add fallback ops (#17609)
+ - [numpy] add op pad (#17328)
+ - [numpy] add op fabs, sometrue, round_ (#17619)
+ - Add arange_like to npx (#16883)
+ - try to move shape_array to npx (#16897)
+ - support np.argsort (#16949)
+ - np.broadcast_to extension (#17358)
+ - support bitwise_and (#16861)
+ - fix np.argmax/argmin output data type (#17476)
+ - add op random.beta (#17390)
+ - add op isnan isinf (#17535)
+ - array_split pr (#17032)
+ - Mixed data type binary ops (#16699)
+ - randn implemented (#17141)
+ - refactor and reduce float types for some functions, also add bitwise_xor (#16827)
+ - any/all (#17087)
+ - amax (#17176)
+ - fix format (#17100)
+ - add op empty_like, add nan_to_num to dispatch (#17169)
+ - handle array_like fill_value for np.full; add unit test coverage (#17245)
+ - add np.amin (#17538)
+ - add npx.gather_nd (#17477)
+ - add np.random.chisquare (#17524)
+ - add polyval (#17416)
+ - add isposinf isneginf isfinite (#17563)
+ - Support broadcast assign for `npi_boolean_mask_assign_tensor` (#17131)
+ - Implement Weibull backward (#17590)
+ - support np.dsplit, fix some error msgs and corner cases for hsplit and vsplit, add interoperability tests for h/v/dsplit (#17478)
+ - add np.product (#17489)
+ - Implement np.random.pareto backward (#17607)
+ - add np.ediff1d (#17624)
+ - more support for boolean indexing and assign (#18352)
+ - Fix einsum gradient (#18482)
+ - [v1.7.x] Backport PRs of numpy features (#18653)
+ - [v1.7.x] backport mixed type binary ops to v1.7.x (#18649)
+ - revise activations (#18700)
+
+#### Large tensor support
+ - [Large Tensor] Add support to Random Sample & Pdf ops (#17445)
+ - [Large Tensor] Add LT support for NN optimizers and 1 activation function (#17444)
+ - [Large Tensor] Fixed SoftmaxActivation op (#17634)
+ - [Large Tensor] Fixed col2im op (#17622)
+ - [Large Tensor] Fixed Spatial Transformer op (#17617)
+ - [Large Tensor] Fix ravel_multi_index op (#17644)
+ - Sparse int64 Large tensor support (#16898)
+ - Re-Enabling Large Tensor Nightly on GPU (#16164)
+ - enabling build stage gpu_int64 to enable large tensor nightly runs (#17546)
+ - [Large Tensor] Fixed Embedding op (#17599)
+
+#### MKL-DNN enhancement
+ - MKLDNN FC : Add error info when mkldnn fc bias dimension is wrong (#16692)
+ - [MKLDNN] support mkldnn gelu (#16710)
+ - [MKLDNN] Fix int8 convolution/fc bias overflow (#16734)
+ - [MKLDNN] use dim_t instead of int in slice/transpose operators (#16737)
+ - Mkldnn fullyConnect bwd bug fix (#16890)
+ - Revert Mkldnn fullyConnect bwd bug fix (#16890) (#16907)
+ - [MKLDNN] Use MKLDNNRun (#16772)
+ - [MKLDNN] mkldnn RNN operator enhancement (#17075)
+ - [MKLDNN] enable MaxPooling with full pooling convention (#16860)
+ - update mkldnn to v1.1.2 (#17165)
+ - improve mkldnn doc (#17198)
+ - [MKLDNN] Fix _copyto (#17173)
+ - [MKLDNN] Support channel wise quantization for FullyConnected (#17187)
+ - fixed seed for mkldnn test (#17386)
+ - add mkldnn softmax backward (#17170)
+ - cmake: copy dnnl headers to include/mkldnn (#17647)
+ - [mkldnn]Mkldnn bn opt backport from master to 1.7x (#18009)
+ - [v1.x] Update 3rdparty/mkldnn remote URL and pin to v1.3 (#17972) (#18033)
+ - [v1.x] backport #17900 [MKLDNN] support using any format in pooling backward (#18067)
+ - Static link MKL-DNN library (#16731)
+ - Add large tensor nightly tests for MKL-DNN operators (#16184)
+ - [MKL-DNN] Enable and Optimization for s8 eltwise_add (#16931)
+ - [MKL-DNN] Enhance Quantization Method (#17161)
+ - Static Build and CD for mxnet-cu102/mxnet-cu102mkl (#17074)
+ - MKL-DNN RNN backward path enhancement (#17183)
+ - cmake: check USE_OPENMP and pass proper MKL-DNN build flags (#17356)
+ - update mkl to 2020.0 (#17355)
+ - Enable MKL-DNN by default in pip packages (#16899)
+ - Enable MKL-DNN FullyConnected backward (#17318)
+ - Softmax primitive cache and in-place computation (#17152)
+ - boolean_mask_assign with start_axis (#16886)
+ - use identity_with_cast (#16913)
+ - change error tolerance for bf16 bn (#18110)
+ - [v1.x] Backport #17689 and #17884 to v1.x branch (#18064)
+ - refactor codes and add an option to skip/check weight's version to reduce overhead (#17707) (#18039)
+ - [v1.x] Backport #17702 and #17872 to v1.x branch (#18038)
+
+#### TensorRT integration
+ - Update TensorRT tutorial to build-from-source. (#14860)
+ - Minor fix, use RAII for TensorRT builder and network object (#17189)
+
+#### Quantization
+ - Add silent option to quantization script (#17094)
+
+#### Profiler
+ - Implemented final two binary ops, added default params for functionality (#17407)
+ - Implement remaining nn_activation ops in opperf (#17475)
+ - Implement all miscellaneous ops (#17511)
+ - Implement remaining nn_basic ops in opperf (#17456)
+
+#### ONNX
+ - Fix memory leak reported by ASAN in NNVM to ONNX conversion (#15516)
+ - ONNX export: Gather (#15995)
+ - ONNX export: Slice op - Handle None value for ends (#14942)
+
+#### New models
+ - [Model] Implement Neural Collaborative Filtering with MXNet (#16689)
+ - Further optimization for NCF model (#17148)
+ - HMM Model (#17120)
+
+#### Operator improvements
+ - Faster GPU NMS operator (#16542)
+ - [MXNET-1421] Added (CuDNN)BatchNorm operator to the list of mirrored operators (#16022)
+ - dynamic custom operator support (#15921)
+ - Multi Precision Lamb Update operator (#16885)
+ - Add im2col and col2im operator (#16502)
+ - Quantized Elemwise Mul Operator (#17147)
+ - Enhancements for MXTensor for custom operators (#17204)
+ - Enabling large tensor support for binary broadcast operators (#16755)
+ - Fix operators lying about their number of inputs (#17049)
+ - [WIP] Fallback mechanism for mx.np operators (#16923)
+ - Dynamic custom operator GPU support (#17270)
+ - Fix flaky - test_operator_gpu.test_np_insert (#17620)
+ - MXNet FFI for Operator Imperative Invocation (#17510)
+ - [MXNET-978] Higher Order Gradient Support `logp1`, `expm1`, `square`. (#15416)
+ - [MXNET-978] Higher Order Gradient Support `arcsin`, `arccos`. (#15515)
+ - [MXNET-978] Higher Order Gradient Support `rsqrt`, `rcbrt`. (#15476)
+ - gather_nd: check bound and wrap negative indices (#17208)
+ - Remove dilation restriction for conv3d (#17491)
+ - Fix storage type infer of softmax backward (#17576)
+ - Fix and optimize handling of vectorized memory accesses (#17767) (#18113)
+ - Cherry-pick of #17995 and #17937 to 1.x branch (#18041)
+ - No tensor cores for fp32 interleaved attention, remove div by 8 restriction (#17994) (#18085)
+ - GPU gemms true fp16 (#17466) (#18023)
+ - Add support for boolean inputs to FusedOp (#16796)
+
+#### Bug fixes
+ - [BUG FIX] Always preserve batch dimension in batches returned from dataloader (#16233)
+ - Fix SliceChannel Type inference (#16748)
+ - change _generate_op_module_signature get_module_file open with encoding=utf-8,it fix some encode error in Chinese windows system. (#16738)
+ - Fix rtrue_divide grad (#16769)
+ - fix inv test flakiness using random matrices generated by SVD (#16782)
+ - [MXNET-1426] Fix the wrong result of sum, mean, argmin, argmax when inputs contain inf or nan (#16234)
+ - Fix (#16781)
+ - fix expand_dims fall back when input's ndim is 0 (#16837)
+ - [fix] missing input log higher order. (#15331)
+ - Fix IndentationError in setup.py (#16857)
+ - Fix a few np issues (#16849)
+ - Fix InferAttr/InferShapeAttr not calling inference for all nodes in a graph (#16836)
+ - fix for enable model parallelism for non-fp32 data (#16683)
+ - Fix NDArrayIter iteration bug when last_batch_handle='pad' (#16166)
+ - Fix crashing on Windows in ObjectPool ~ctor (#16941)
+ - Fix NDArrayIter cant pad when size is large (#17001)
+ - fix axis=-1 bug (#17016)
+ - Fix CUDNN detection for CMake build (#17019)
+ - Fix omp assert issue (#17039)
+ - mshadow: fix vector access (#17021)
+ - [BUGFIX] Fix race condition in kvstore.pushpull (#17007)
+ - [BUGFIX] Fix trainer param order (#17068)
+ - [BugFix] fix filter channel calculation in ModulatedDeformableConvV2 (#17070)
+ - Fix reshape interoperability test (#17155)
+ - fix norm sparse fallback (#17149)
+ - fix py27 quantization (#17153)
+ - fix int8 add ut (#17166)
+ - Fix and clean up Ubuntu build from source instructions (#17229)
+ - fix lstm layer with projection save params (#17266)
+ - Fix rendering of ubuntu_setup.md codeblocks (#17294)
+ - Fix #17267, add expected and got datatype for concat error msgs (#17271)
+ - [BUGFIX] fix model zoo parallel download (#17372)
+ - fix use int8, uint8, int32, int64 (#17188)
+ - [Fix] Add ctx to the original ndarray and revise the usage of context to ctx (#16819)
+ - Fix ndarray indexing bug (#16895)
+ - fix requantize flaky test (#16709)
+ - Initial checkin (#16856)
+ - Fix flakey test_ndarray.py:test_reduce (#17312)
+ - fix flaky test: boolean index and fix bugs (#17222)
+ - Fix IOT Devices section of Get Started page (#17326)
+ - add logic for no batch size while getting data arrays from executors (#17772) (#18122)
+ - Fix reverse shape inference in LayerNorm (#17683)
+ - fix full and full_like when input is boolean (#17668)
+ - Fix MBCC inference (#17660)
+ - Additional fix for vector access. (#17230)
+ - Cherrypick Fix nightly large_vector test caused by incorrect with_seed path (#18178) (#18220)
+ - [1.7] Pass args fix3 (#18237)
+ - fixing batch_norm and layer_norm for large tensors (#17805) (#18261)
+ - [1.7.x] Backport of LSTM and GRU fix (#17898) and RNN op (#17632) (#18316)
+ - [v1.7.x] backport #18500 - [Bug Fixed] Fix batch norm when grad_req is `add` (#18517)
+ - Fix the monitor_callback invalid issue during calibration with variable input shapes (#18632) (#18703)
+
+### Front end API
+ - Fix the problem in printing feature in c++ API examples : feature_extract (#15686)
+ - updating MXNet version to 1.6.0 in base.h for C APIs (#16905)
+ - [API] unified API for custom kvstores (#17010)
+ - fix parameter names in the estimator api (#17051)
+ - adding docs for 64bit C APIs of large tensor (#17309)
+ - Add API docs to INT64 APIs (#16617)
+
+#### Gluon
+ - [Quantization] Enhance gluon quantization API (#16695)
+ - [Gluon] Improve estimator usability and fix logging logic (#16810)
+ - Fix test_gluon.py:test_sync_batchnorm when number of GPUS > 4 (#16834)
+ - [Gluon] Update contrib.Estimator LoggingHandler to support logging per batch interval (#16922)
+ - Include eval_net the validation model in the gluon estimator api (#16957)
+ - Fix Gluon Estimator nightly test (#17042)
+ - [MXNET-1431] Multiple channel support in Gluon PReLU (#16262)
+ - Fix gluon.Trainer regression if no kvstore is used with sparse gradients (#17199)
+ - refactor gluon.utils.split_data() following np.array_split() (#17123)
+ - Add RandomApply in gluon's transforms (#17242)
+ - Partitioning Gluon HybridBlocks (#15969)
+ - Random rotation (#16794)
+ - bump up atol for gradient check (#16843)
+ - Extend estimator.evaluate() to support event handlers (#16971)
+ - [MXNET-1438] Adding SDML loss function (#17298)
+
+#### Symbol
+ - Add unoptimized symbol to executor for sharing (#16798)
+ - Enforces NDArray type in get_symbol (#16871)
+ - Fix #17164 symbolblock with BatchNorm inside during cast to fp16 (#17212)
+ - autograd video and image link fixes and removing symbol tutorials (#17227)
+ - Fix CosineEmbeddingLoss in when symbol API is used (#17308)
+ - Fix Horovod build error due to missing exported symbols (#17348)
+ - Update symbol.py (#17408)
+ - update symbol to json (#16948)
+
+### Language Bindings
+#### Python
+ - Python 2 compatibility fix in base.py
+ - adding stacktrace in Jenkinsfile_utils.groovy to inspect Python2 failure cause in CI (#17065)
+ - Fix image display in python autograd tutorial (#17243)
+ - Fix Python 3 compatibility in example/speech_recognition (#17354)
+ - Stop testing Python 2 on CI (#15990)
+ - Docs: Python tutorials doc fixes (#17435)
+ - pin python dependencies (#17556)
+ - Python 2 cleanup (#17583)
+
+#### C/C++
+ - Simplify C++ flags (#17413)
+
+#### R
+ - fix R docs (#16733)
+ - [R package] Make R package compilation support opencv 4.0 (#16934)
+ - Support R-package with cmake build and fix installation instructions (#17228)
+ - Fix R-package/src/Makevars for OpenCV 4 (#17404)
+ - Fix typo in Install the MXNet Package for R (#17340)
+
+#### Clojure
+
+#### Julia
+ - [MXNET-1440] julia: porting `current_context` (#17142)
+ - julia: porting `context.empty_cache` (#17172)
+ - pin Markdown version to 3.1 in Julia doc build (#17549)
+
+#### Perl
+ - [Perl] - ndarray operator overloading enhancements (#16779)
+ - MXNET-1447 [Perl] Runtime features and large tensor support. (#17610)
+
+#### Scala
+ - Fix scala publish & nvidia-docker cublas issue (#16968)
+ - Fix publishing scala gpu with cpu instance (#16987)
+ - swap wget to curl in Scala scripts (#17041)
+ - [Scala/Java] Remove unnecessary data slicing (#17544)
+ - quantile_scalar (#17572)
+ - Fix get_started scala gpu (#17434)
+ - Fix MBCC & scala publish pipeline (#17643)
+ - Bump up additional scala 1.x branch to 1.7.0 (#17765)
+
+### Performance improvements
+ - Build.py improvement (#16976)
+ - Improvements to config.cmake (#17639)
+ - [Done] BilinearResize2D optimized (#16292)
+ - Speed fused_op compilation by caching ptx and jit-compiled functions (#16783)
+ - Improve the speed of the pointwise fusion graph pass (#17114)
+ - broadcast_axis optimization (#17091)
+ - Optimize AddTakeGrad Tensor Sum (#17906) (#18045)
+
+### Example and tutorials
+ - Add CustomOp tutorial doc (#17241)
+ - Correct the grammar in 1-ndarray tutorial (#17513)
+
+### Website and documentation
+ - Website edits (#17050)
+ - [Website 2.0] Nightly Build for v1.x (#17956)
+ - [docs] Fix runtime feature detection documentation (#16746)
+ - Adding user guidelines for using MXNet built with Large Tensor Support (#16894)
+ - fix typo and doc (#16921)
+ - large tensor faq doc fix (#16953)
+ - [DOC] Add a few tips for running horovod (#17235)
+ - Update NOTICE to fix copyright years (#17330)
+ - [DOC] Fix tutorial link, and better error msg (#17057)
+ - doc fix for argmax & argmin (#17604)
+
+### CI/CD
+ - support mixed-precision true_divide (#16711)
+ - Try to fix CI (#16908)
+ - mixed precision for power (#16859)
+ - Fix desired precision for test_ndarray.py:test_reduce (#16992)
+ - [reproducibility] multi_sum_sq review, AtomicAdd removal (#17002)
+ - fix precision problem in linalg_solve, linalg_tensorinv, linalg_cholesky op test (#16981)
+ - grouping large array tests based on type and updating nightly CI function (#17305)
+ - [LICENSE] fix cpp predcit license (#17377)
+ - [CI] Fix static build pipeline (#17474)
+ - skipping tests that cannot fit in nightly CI machine corrected imports (#17450)
+ - Update Windows CI scripts to use syntax compatible with Win 2019 server powershell. (#17526)
+ - Fix Non-ASCII character in docstring (#17600)
+ - [CI] Follow redirects when downloading apache-maven-3.3.9-bin.tar.gz (#17608)
+ - [CI] Upgrade sphinx and autodocsumm (#17594)
+ - Reduce load on CI due to excessive log flood (#17629)
+ - Enable users to specify BLAS (#17648)
+ - [CI] Add AMI id to instance info on builds (#17649)
+ - [v1.7.x] Backport staggered CI builds (#17999 & #18119) (#18142)
+ - [v1.7.x] Backport #17177 to 1.7.x (Fix incorrect calculation results when the C locale is set to a locale that uses commas as the decimal separator) (#18147)
+ - Fix formatting and typos in CD README.md (#16703)
+ - [CD] dynamic libmxet pipeline fix + small fixes (#16966)
+ - [CD] enable s3 publish for nightly builds in cd (#17112)
+ - [CD] fix CD pipeline (#17259)
+ - [CD] update publish path (#17453)
+ - fix CD and remove leftover from #15990 (#17551)
+ - Fix nightly build (#16773)
+ - Update pypi_publish.py to disable nighlty build upload to Pypi (#17082)
+ - [v1.7.x] update jetson dockerfile to support CUDA 10.0 (#18339)
+ - Remove manually created symbolic link to ninja-build (#18437) (#18456)
+ - Increase staggered build timeout to 180 min (#18568) (#18585)
+
+### License
+ - Don't relicense FindCUDAToolkit.cmake (#17334)
+ - fix license and copyright issues (#17364)
+ - Update ps-lite LICENSE (#17351)
+ - remove unused file with license issue (#17371)
+ - Update LICENSE for fonts (#17365)
+ - license np_einsum file under bsd (#17367)
+ - Update Apache License for mshadow (#18109) (#18134)
+ - Julia: remove downloading of the non-ASF binary build (#18489) (#18502)
+ - Add missing license header for md files (#18541)
+ - [v1.7.x]License checker enhancement (#18478)
+
+### Miscellaneous changes
+ - Link fixes4 (#16764)
+ - Refactoring names for mxnet version of nnvm to avoid conflicting with the original tvm/nnvm. (#15303)
+ - minor typo fix (#17008)
+ - Add micro averaging strategy to pearsonr metric (#16878)
+ - introduce gradient update handler to the base estimator (#16900)
+ - fix latency calculation and print issue (#17217)
+ - add inference benchmark script (#16978)
+ - change the wording and log level to be more in line with the general use (#16626)
+ - Updated logos. (#16719)
+ - Pinning rvm version to satisfy Jekyll build (#18016)
+ - Workaround gnu_tls handshake error on Ubuntu 14.04 Nvidia Docker (#18044)
+
+## 1.6.0
+
+### Deprecation of Python 2
+
+The MXNet community [voted](https://lists.apache.org/thread.html/r3a2db0f22a1680cc56804191446fef2289595798ca19fd17de1ff03e%40%3Cdev.mxnet.apache.org%3E) to no longer support Python 2 in future releases of MXNet. Therefore, MXNet 1.6 is the last MXNet release to support Python 2.
+
+### New features
+
+#### NumPy compatible interface and using TVM to generate operators
+
+NumPy has long been established as the standard math library in Python, the most prevalent language in the deep learning community. With this library as its cornerstone, Python now has the largest ecosystem and community for scientific computing. NumPy's popularity comes from its flexibility and generality.
+
+In #14253, the MXNet community reached consensus on moving towards a NumPy-compatible programming experience and committed to a major endeavor of providing NumPy-compatible operators.
+
+The primary goal of the projects below is to provide the usability and expressiveness of NumPy in MXNet to facilitate deep learning model development. This not only helps existing deep learning practitioners but also gives the existing NumPy community a shortcut for getting started in deep learning. The effort also serves a secondary goal: enabling the existing NumPy ecosystem to utilize GPUs and accelerators to speed up large-scale computation.
+
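+A short sketch of the NumPy-compatible front end exposed as `mxnet.numpy` / `mxnet.numpy_extension`:
+
+```python
+from mxnet import np, npx
+
+npx.set_np()                    # enable NumPy-compatible array semantics in MXNet
+a = np.arange(6).reshape(2, 3)  # mxnet.numpy ndarray with a NumPy-style API
+b = np.ones((2, 3))
+print((a * b).sum(axis=0))      # operators and reductions follow NumPy behaviour
+```
+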
+ - Infra to use tvm write op kernels (#15550)
+ - fix boolean_mask for 0-size output (#15731)
+ - fix tvm cmake (#15781)
+ - Numpy-compatible Infra (#15581)
+ - [MXNET-1206] Support NDArray indexing with None and Ellipsis (#13143)
+ - numpy-compatible sum (#15810)
+ - [Numpy] Numpy compatible slicing (#15798)
+ - Numpy Tensordot and Dot Operator (#15820)
+ - numpy linspace (#15852)
+ - tvm infra for op attrs (#15854)
+ - Port several np ops to master (#15867)
+ - numpy-compatible split upstream (#15841)
+ - Numpy-compatible concatenate upstream (#15894)
+ - Numpy-compatible stack upstream (#15842)
+ - [Numpy] Numpy behavior random.uniform() (#15858)
+ - Tvm broadcast backward (#15938)
+ - np elemwise unary ops upstream (#15831)
+ - [Numpy] random.randint() implemented (#15956)
+ - Refines NDArray indexing and adds numpy ndarray indexing [READY FOR REVIEW] (#15942)
+ - Port ops from np branch (#16018)
+ - numpy-compatible cumsum upstream (#15924)
+ - NumPy-compatible infrastructure on Gluon (#16024)
+ - [OP] Support range as advanced index for ndarrays (#16047)
+ - Numpy compatible max min (#16046)
+ - NumPy-compatible Mean, Std and Var (#16014)
+ - Add fluent methods mean, std, var for ndarray (#16077)
+ - numpy multinomial op (#15878)
+ - add numpy operator remainder (#16080)
+ - [Numpy] Random.choice implemented (#16089)
+ - Fix sample.normal shape inference
+ - Numpy add numpy op indices (#15837)
+ - [Numpy] Numpy copysign (#15851)
+ - numpy operator ravel, derive from reshape (#16016)
+ - Add __array_function__
+ - Improved error messages
+ - Fix np.choice
+ - add exception check for numpy reshape (#16180)
+ - [Numpy] Numpy behavior normal distribution (#16109)
+ - fix multinomial bug on gpu (#16204)
+ - [Numpy] Differentiable svd (#15795)
+ - add epsilon to sum(pvalue) upperbound (#16211)
+ - np compatible vstack (#15850)
+ - Numpy add numpy op roll (#15902)
+ - add numpy compatible trace (#16008)
+ - add numpy op hanning, hamming, blackman (#15815)
+ - [Numpy]flip (#15819)
+ - numpy operator around (#16126)
+ - numpy operator arctan2 (#15890)
+ - numpy operator nonzero (#15838)
+ - numpy operator hypot (#15901)
+ - tvm numpy operator deg2rad && rad2deg (#16015)
+ - numpy op unique
+ - try to fix bug
+ - fix memory bug and disable some test
+ - fix according to review
+ - Numpy operators: `lcm`, `tril`, `identity` and `take` (#16264)
+ - [numpy] Cosmetic improvement on mxnet.numpy builtin op signature in documentation (#16305)
+ - Disable Pylint false error in numpy_op_signature (#16370)
+ - boolean_mask_assign operator for future boolean indexing (#16361)
+ - Implements ldexp. (#15845)
+ - Numpy Operators: Inner, Outer, vdot (#15846)
+ - Numpy det and slogdet operators (#15861)
+ - Fix random op signature
+ - fix choice signature
+ - add raise test for shape
+ - Add boolean ndarray (#15940)
+ - global numpy shape flag (#16335)
+ - numpy-compatible histogram (#16266)
+ - [Numpy] Numpy compatible dstack (#15871)
+ - numpy eye op (#16132)
+ - Numpy compatible vsplit; minor changes to split (#15983)
+ - add numpy op logspace (#15825)
+ - add numpy op bitwise_xor, hsplit, moveaxis, rot90 (#16257)
+ - Fix optimizer bug for np attribute (#16494)
+ - Tests of NumPy interoperability (#16469)
+ - improve unary and binary operator handling and refactor tests (#16423)
+ - [DOC] Fix numpy op doc (#16504)
+ - [Numpy] More numpy dispatch tests (#16426)
+ - [Numpy] einsum (#15911)
+ - Add test pipeline for USE_TVM_OP=OFF on Unix (#16450)
+ - Numpy dispatch test of ...... (#16422)
+ - setup and concatenate, copy, expand_dims, expm1 (#16493)
+ - add sum for boolean type in mainline (#16436)
+ - [Numpy] SVD outputs tuple (#16530)
+ - numpy op doc: max, min, prod (#16506)
+ - add interface for rand
+ - Fix numpy bugs (#16537)
+ - pickler override for np ndarrays (#16561)
+ - [numpy]op test in new pattern (#16556)
+ - Enforce adding documentation for builtin numpy operators (#16575)
+ - [Numpy] Support N_D(N>=3) batch_dot (#16586)
+ - [Numpy] Loading numpy-incompatible NDArray in numpy-compatible mode (#16597)
+ - Fix index overflow bug in einsum (#16589)
+ - add npx reshape (#16640)
+ - add type switch to weight tensor (#16543)
+ - numpy doc enhancement (#16637)
+ - Infra for tvm op runtime dispatch (#16100)
+ - [NumPy][Operator] NumPy operator `may_share_memory` and `shares_memory` (#16533)
+ - [Numpy] Numpy operator diff (#15906)
+ - Miscellaneous fix for several numpy issues (#16664)
+ - [Numpy] implement np.column_stack (#16594)
+ - [numpy] add numpy operator : append (#16564)
+ - Backport of #16711, #16737, #16408 to 1.6 branch (#16763)
+ - Backport to 1.6 (#16773, #16781, #16783, #16716, #16699, #16728, #16769, #16792) (#16832)
+ - [Backport][v1.6.x] Fix the wrong result of sum, mean, argmin, argmax when inputs contain inf or nan (#16884)
+ - Backport of #16827, #16791 and #16888 to 1.6 branch (#16901)
+ - port shape op to 1.6.x (#16912)
+ - [Numpy] Fix imperative basic indexing in numpy (#16902) (#16919)
+ - Backport #16895, #16922, #16878, #16979 and #16900 to 1.6 (#17029)
+
+
+#### Graph optimizations
+
+##### Pointwise fusion for GPU
+
+Besides compute-intensive operations such as convolutions and fully connected layers, deep learning models contain many simple pointwise (elementwise) operations, like elementwise addition. These operations are entirely memory-bandwidth bound, so they limit the speedups obtainable from newer GPU hardware, which typically has a high compute-to-memory-bandwidth ratio. When several such operations are chained one after another, the result is a series of unnecessary stores and loads, as well as potentially increased memory usage to hold the intermediate results. Pointwise fusion alleviates these problems by generating fused operators just in time; the fused operators avoid storing intermediate results in memory, improving both performance and memory usage.
+
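+Fusion operates on the hybridized (symbolic) graph and is controlled by the `MXNET_USE_FUSION` environment variable; a minimal sketch on GPU, where the toy block below is purely illustrative:
+
+```python
+import mxnet as mx
+from mxnet.gluon import HybridBlock
+
+class PointwiseChain(HybridBlock):
+    def hybrid_forward(self, F, x):
+        # A chain of simple elementwise ops that the pointwise fusion pass
+        # can compile into a single GPU kernel.
+        return F.relu(x * 2 + 1) - F.sigmoid(x)
+
+net = PointwiseChain()
+net.hybridize()                                    # fusion only applies to hybridized graphs
+y = net(mx.nd.ones((1024, 1024), ctx=mx.gpu(0)))   # requires a CUDA build and a GPU
+```
+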
+ - Pointwise fusion for GPU (#15167)
+ - Backport #16798, #16836 and #16838 to 1.6 (#16874)
+ - Add support for boolean inputs to FusedOp (#16796) (#16892)
+ - Workaround problem with fusion in CUDA 9 (#17028) (#17035)
+
+##### Eliminate common subexpressions
+
+ - Eliminate common expressions (#15657)
+
+##### Default MKLDNN Subgraph fusion
+
+ - [MKLDNN] Enable subgraph backend mkldnn by default. (#15518)
+
+#### New operators
+
+ - [OP] Add a new arange_like operator to contrib (#15400)
+ - PDF operators for each distribution for which we have a random sampler (plus also the PDF of the Dirichlet). Supports probabilities and log-probabilities, as well as gradients. (#14617)
+ - Group Normalization (#14959)
+ - Add RROIAlign (#16017)
+ - Add fast implementation of LARS (#16122)
+ - Round and sign straight-through-estimators C operators. (#16373)
+ - New ops for RCNN + old ops improvements for RCNN (#16215)
+ - Comparison ops implemented using mshadow (#16414)
+ - Add mask target generator operator for Mask-RCNN (#16268)
+ - Move MRCNNMaskTarget op to contrib (#16486)
+ - Mxnet allclose (#14443)
+ - Aggregated adamw update (#16398)
+ - Make mrcnn_mask_target arg mask_size a 2d tuple (#16567)
+ - Dgl ops 2 (#16416)
+ - Lamb optimizer update (#16715)
+ - [OP] changing data type of 't' to int in lamb_update_phase1 (#16903)
+ - Multi Precision Lamb Update operator (#16885)
+ - Interleaved MHA for CPU path (#17138) (#17211)
+
+### Feature improvements
+
+#### Automatic Mixed Precision
+
+ - [AMP] Move topk from FP16_FP32_FUNCS to FP32_FUNCS (#15342)
+ - Conversion from FP32 model to Mixed Precision model (#15118)
+ - Update fp16 docs: Block.cast is inplace (#15458)
+ - FP16 Support for C Predict API (#15245)
+ - Add AMP Conversion support for BucketingModule (#15528)
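+
+Typical imperative (Gluon) usage of AMP looks roughly like the sketch below; the tiny `Dense` network and shapes are illustrative only, and a CUDA build with a GPU is assumed:
+
+```python
+import mxnet as mx
+from mxnet import autograd, gluon
+from mxnet.contrib import amp
+
+amp.init()                              # patch operators to run in float16 where safe
+
+net = gluon.nn.Dense(10)
+net.initialize(ctx=mx.gpu(0))
+trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01})
+amp.init_trainer(trainer)               # enable dynamic loss scaling on the trainer
+
+data = mx.nd.ones((32, 64), ctx=mx.gpu(0))
+with autograd.record():
+    loss = net(data).sum()
+    with amp.scale_loss(loss, trainer) as scaled_loss:
+        autograd.backward(scaled_loss)
+trainer.step(32)
+```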
+
+#### Gluon Fit API
+
+ - Fixing build for gluon estimator test, including libtvm in pack libs (#16148)
+ - [Estimator] handle composite metrics in estimator (#16676)
+ - [Estimator] refactor estimator to allow overriding evaluate/fit of a batch (#16678)
+ - [Estimator] refactor estimator and clarify docs (#16694)
+ - [Gluon] Improve estimator usability and fix logging logic (#16810) (#16846)
+ - Backport Gluon estimator changes to 1.6 (#17048)
+ - fix parameter names in the estimator api (#17051) (#17162)
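+
+The Fit API wraps the usual Gluon training loop behind an `Estimator`; a minimal sketch, where `train_loader` is a `DataLoader` assumed to be defined elsewhere and the exact constructor arguments may vary between releases:
+
+```python
+from mxnet import gluon
+from mxnet.gluon.contrib.estimator import Estimator
+
+net = gluon.nn.Dense(10)
+net.initialize()
+loss = gluon.loss.SoftmaxCrossEntropyLoss()
+trainer = gluon.Trainer(net.collect_params(), 'adam')
+
+# Hand the model, loss and trainer to the estimator and let it drive the loop.
+est = Estimator(net=net, loss=loss, trainer=trainer)
+est.fit(train_data=train_loader, epochs=2)  # train_loader: a gluon DataLoader (assumed)
+```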
+
+
+#### MKLDNN
+
+ - Upgrade MKL-DNN submodule to v0.20 release (#15422)
+ - Fix quantized concat when inputs are mixed int8 and uint8 (#15693)
+ - [MKLDNN]Enhance Quantization APIs and Tutorial (#15448)
+ - Add quantization support for GluonCV (#15754)
+ - add int8 bn mkldnn implementation and test (#15664)
+ - [Quantization]support exclude operators while quantization (#15910)
+ - [MKLDNN]Support fullyconnected and element-wise ops fusion (#15950)
+ - Disable test coverage for Clang MKLDNN (#15977)
+ - update support MKLDNN BN conditions (#15870)
+ - [MKLDNN] Fix out of bound access of req vector (#16000)
+ - add uint8 bn mkldnn implementation (#16003)
+ - Improve quantization flow (#15961)
+ - [MKLDNN] fix uint8 batch norm memory misuse (#16034)
+ - MKL-DNN RNN checks NDArray version (#16071)
+ - Float64 fallback for mkldnn subgraph and rnn op (#15853)
+ - Update MKL-DNN dependency (#16073)
+ - Integrate MKL-DNN leakyrelu (#16075)
+ - [MKLDNN] NDArray reorder in C API and deconv (#16265)
+ - Fix mkldnn reshape (#16455)
+ - [MKLDNN] Fix uint quantized fc when not fusing with requantize (#16523)
+ - [MKLDNN]Fix reorder2default (#16602)
+ - Upgrade MKL-DNN dependency to v1.0 (#16555)
+ - Revert "[MKLDNN]Fix reorder2default (#16602)" (#16697)
+ - [v1.6.x] Backport #16837 into v1.6.x (#16847)
+ - Initial checkin (#16856) (#16872)
+
+#### Large tensor support
+
+ - [MXNET-1413] Adding Large Tensor support for sort operators (#15170)
+ - Large Index Support for Slice (#15593)
+ - Add large tensor support binary arithmetic (#15785)
+ - Large tensor support for random ops (#15783)
+ - Add Large Tensor Support for Sequence, NN Ops (#15807)
+ - Add power, exponent, log ops large tensor support (#15794)
+ - removing unnecessary int64 C apis that were added to support Large Tensors and Vectors (#15944)
+ - creating ndarray directly using mxnet ndarray primitives to reduce memory footprint of tests for topk, sort and argsort (#15900)
+ - Adding tests to verify support for Large Tensors in additional Ops along with new C_Apis supporting 64bit indexing (#15895)
+ - Added tests to verify Large Vector Support for initial set of ops (#15943)
+ - Added more tests for Large Indices (#15960)
+ - Add Large tensor vector test cases (#15941)
+ - Test large vector mean operator and fix a few bugs (#16079)
+ - Reducing memory footprint of one_hot for Large Array Testing (#16136)
+ - removing MXNDArrayLoadFromBuffer64 and MXNDArrayLoad64 (#16203)
+ - Fix large array tests (#16328)
+ - added more tests to verify support for large vector (#16477)
+ - added support for large tensors for Dropout operator and tests to verify support for more operators (#16409)
+ - adding large tensor support for add_n and tests for more ops (#16476)
+ - adding large tensor support for pad operator (#15126)
+ - Added large tensor support and test for gather_nd (#16371)
+ - Large Vector tests for DGL Ops Part 2 (#16497)
+ - Showing proper error message when an attempt is made to create large tensor but MXNet is not built with it (#16570)
+
+#### TensorRT integration
+
+ - enable TensorRT integration with cpp api (#15335)
+ - Add unit tests for TensorRT integration and fix some bugs (#15399)
+
+#### Higher order gradient support
+
+ - [MXNET-978] Higher order gradient for sigmoid (#15288)
+ - [MXNET-978] Higher Order Gradient Support `reciprocal`, `abs`. (#15413)
+ - [MXNET-978] Add higher order gradient support `tan`, `tanh` (#15253)
+ - [MXNET-978] Higher Order Gradient Support `arctan`, `arctanh`, `radians`. (#15531)
+ - [MXNET-978] Higher Order Gradient Support `sqrt`, `cbrt`. (#15474)
+ - [MXNET-978] Higher Order Gradient Support `clip`, `dropout`. (#15746)
+ - [MXNET-978] Higher Order Gradient Support `sinh`, `cosh`. (#15412)
+ - [MXNET-978] n-th order gradient test support. (#15611)
+ - [MXNET-978] Fully connected, higher order grad (#14779)
+ - [MXNET-978] Higher Order Gradient Support `arcsinh`, `arccosh`. (#15530)
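+
+Higher order gradients are obtained by differentiating through a first-order gradient computed with `create_graph=True`; a small sketch:
+
+```python
+import mxnet as mx
+from mxnet import autograd, nd
+
+x = nd.array([1.0, 2.0, 3.0])
+x.attach_grad()
+with autograd.record():
+    y = nd.sigmoid(x)
+    # First derivative, kept in the graph so it can be differentiated again.
+    dy_dx = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]
+dy_dx.backward()    # x.grad now holds the second derivative of sigmoid at x
+print(x.grad)
+```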
+
+#### Operator improvements
+
+ - broadcast axis is alias to broadcast axes; doc fix (#15546)
+ - Utility to help developers debug operators: Tensor Inspector (#15490)
+ - Softmax with length (#15169)
+ - in-place reshape ops (#14053)
+ - Add missing default axis value to symbol.squeeze op (#15707)
+ - Add matrix determinant operator in linalg (#15007)
+ - Add fp16 support for topk (#15560)
+ - [MXNET-1399] multiclass-mcc metric enhancements (#14874)
+ - new raise mode for nd.take and fix backward for wrap mode (#15887)
+
+#### Profiler
+
+ - Fixing duplication in operator profiling (#15240)
+ - Custom Operator Profiling Enhancement (#15210)
+ - [Opperf] Make module/namespace of the operator parameterized (#15226)
+ - Opperf: Support Python<3.6 (#15487)
+ - Add transpose_conv, sorting and searching operator benchmarks to Opperf (#15475)
+ - Deprecate USE_PROFILER flag (#15595)
+ - Update profiler.md (#15477)
+ - [Opperf] Add array rearrange operators to opperf (#15606)
+ - [OpPerf] PDF Random ops fix (#15661)
+ - [Opperf] Add optimizer update operator benchmarks to opperf (#15522)
+ - fix broadcast op param (#15714)
+ - [OpPerf] Profiler flag for Python, Cpp (#15881)
+ - [Opperf] Filter out deprecated ops (#15541)
+ - [OpPerf] Handle positional arguments (#15761)
+ - [OpPerf] Take care of 4d param (#15736)
+ - Add Median,p50,p99 to python profiler (#15953)
+ - adding "total" (total time) to profiler aggregate stats sorting criteria (#16055)
+
+#### ONNX import/export
+
+ - Correct ONNX documentation (#15914)
+ - [MXNET-895] ONNX import/export: TopK (#13627)
+
+#### Runtime discovery of features
+
+ - Making Features as a singleton for improved caching (#15835)
+
+#### Bug fixes
+
+ - [bug] fix higher grad log (#15120)
+ - Showing proper error when csr array is not 2D in shape. (#15242)
+ - add 'asnumpy' dtype option to check_symbolic_backward (#15186)
+ - point fix the vector declaration in MultiBoxDetection (#15300)
+ - Temporarily Commenting out Flaky Test (#15436)
+ - Fix memory leak in NaiveEngine (#15405)
+ - fix nightly CI failure (#15452)
+ - Small typo fixes in batch_norm-inl.h (#15527)
+ - Bypass cuda/cudnn checks if no driver. (#15551)
+ - Julia path patch (#15561)
+ - Fix AMP Tutorial failures (#15526)
+ - Fix warnings in CLang: (#15270)
+ - Fix dumps for Constant initializer (#15150)
+ - fix normalize mean error bug (#15539)
+ - [fix] print `self` in warning. (#15614)
+ - [MXNET-1411] solve pylint error issue#14851 (#15113)
+ - [Flaky test] Skip test_operator_gpu.test_convolution_independent_gradients (#15631)
+ - Fix subgraph with custom_op (#15671)
+ - Fix USE_BLAS == openblas check (#15691)
+ - update previous flaky naive engine test (#15651)
+ - make TransposeShape infer shape from both sides (#15713)
+ - Skip Flaky Test (#15722)
+ - Revert "Dynamic Library Loading Support" (#15755)
+ - Fix flaky test test_global_metric (#15756)
+ - Fix PR #15489 (Dynamic Library Loading Support) (#15760)
+ - Refactor LibraryInitializer so it's thread safe. Fixes random sporadic concurrency crashes. (#15762)
+ - Fix backward_clip num inputs and type of clip params (#15688)
+ - fixing problem with existing Singleton Caching (#15868)
+ - Allow operators with multiple outputs in get_atomic_symbol (#15740)
+ - Fix ConcatType backward type inference (#15829)
+ - Add disable attr to subgraph property (#15926)
+ - Re-enable flaky test_prelu (#15777)
+ - declare explicitly the tblob default assign operator and copy constructor (#15937)
+ - Discard needless test cases in `test_convolution_independent_gradients` (#15939)
+ - fix naive engine for multi-threaded inference (#15574)
+ - Fix get_rows_per_block (#15979)
+ - Fix a memory misalignment in topk operator (#15948)
+ - Decouple dtype from shape for Random multinomial (#15980)
+ - Fix dtype inference in arange_like operator (#15930)
+ - Disable laop_6 (#15976)
+ - Fix flaky clojure profile test (#16058)
+ - fix test_pick test time is too long (#16066)
+ - [fix] Support nullop in `transpose` (#15865)
+ - fix flaky test (#16074)
+ - fix some test files test time is too long (#16067)
+ - Fix gradient tensor mutate in `{adam/ftrl/rmsprop/rmspropalex}_update`. (#15768)
+ - Fix unary operator ceil/floor/trunc when data type is integer (#14251)
+ - Fix failing tests (#16117)
+ - Fixes NAG optimizer #15543 (#16053)
+ - avoid test relu at the origin due to discontinuous gradient (#16133)
+ - Fix remaining errors reported by D2L (#16157)
+ - use 1E-4 in groupnorm test (#16169)
+ - Sequence last fix (#16156)
+ - fixing test for model compatibility checker (#16159)
+ - assert_allclose -> rtol=1e-10 (#16198)
+ - [MEMORY] retry GPU memory allocation if fragmented (#16194)
+ - improve dataloader signals and messages (#16114)
+ - Update ndarray.py (#16205)
+ - fix flaky test (#16191)
+ - Solve #14116, #15143 (#15144)
+ - [MXNET-1422] Fix wrong results of min([inf, inf]) and max([-inf,-inf]) (#16226)
+ - Fix inconsistent interpolation method values (#16212)
+ - set fixed seed for profiler (#16155)
+ - Fix MXNDArrayGetData (#16289)
+ - fix atol for test_preloaded_multi_sgd (#16356)
+ - Fix windows flakiness (#16415)
+ - cuDNN non-persistant bidirectional RNN dgrad sync fix (#16391)
+ - [BUGFIX] Minor type issues in Squeeze (#16448)
+ - Fix Nightly Tests for Binaries (#16451)
+ - Fix dtype bug (#16467)
+ - Fix flakey pylint CI failures (#16462)
+ - Load NDArray only to GPU if GPU is present (#16432)
+ - Bug fix for the input of same axes of the swapaxes operator (#16513)
+ - Fix learning rate scheduler being unexpectedly overwritten by optimizer's default value (#16487)
+ - disable tests (#16536)
+ - fix pylint in CI (#16540)
+ - image crop gpu (#16464)
+ - Build dmlc-core with old thread_local implementation (#16526)
+ - fix doc for topk (#16571)
+ - RNNOp to call cudaEventCreate lazily (#16584)
+ - add encoding to the stub files for potential utf8 char in doc strings (#16580)
+ - Suppress subgraph log in CI (#16607)
+ - Fix dequantize memory corruption (#16606)
+ - Fix for wrong reqs set after switching from training to inference (#16553)
+ - Disables test_bulking_operator_gpu due to flakiness (#16611)
+ - Imagenet inference to nightly fix (#16599)
+ - Move some subgraph verbose to MXNET_SUBGRAPH_VERBOSE=2 (#16622)
+ - RNNOp only call cuda/cudnn if GPU ctx is requested (#16632)
+ - fix bad encode (#16641)
+ - Disable float16 test (#16643)
+ - Fix GetMKLDNNData for delay alloc (#16618)
+ - Move ops which don't support FP16 dtype to FP32 list (#16668)
+ - no such method => modified function args (#16610)
+ - fix cuDNN RNN dtype_with_fallback_ bug (#16671)
+ - Add check if scipy is imported in sparse.py (#16574)
+ - Added launch bounds to the reduce kernels (#16397)
+ - fix install dir (#16690)
+ - fix binary dependencies in CD and nightly (#16693)
+ - Fix SliceChannel Type inference (#16748) (#16797)
+ - fix flakiness of test_np_mixed_precision_binary_funcs (#16873)
+ - Fix test_gluon.py:test_sync_batchnorm when number of GPUS > 4 (#16835)
+ - Omp fork numthreads fix 1.6 (#17000)
+ - [BUGFIX] Fix race condition in kvstore.pushpull (#17007) (#17052)
+ - Backport #17002, #17068 and #17114 to 1.6 branch (#17137)
+ - Backport 3rdparty/openmp fixes (#17193)
+ - fix norm sparse fallback (#17149)
+
+### Front end API
+
+ - Expose get_all_registered_operators and get_operator_arguments in the… (#15364)
+ - Add magic method `abs` to NDArray and Symbol. (#15680)
+ - Dynamic Library Loading Support (#15489)
+ - [MXNET-1294] Add KVSTORE PushPull API (#15559)
+
+#### Gluon
+
+ - [Dataset] Add take, filter, sample API to dataset (#16078)
+ - Add register_op_hook for gluon (#15839)
+ - [Dataset] add shard API (#16175)
+ - Add list_ctx to ParameterDict (#16185)
+ - [Gluon] Support None argument in HybridBlock (#16280)
+ - Aggregated zero grad (#16446)
+ - try to fix block (#16465)
+ - [Gluon] Don't serialize shared parameters twice (#16582)
+ - Initializer.__eq__ (#16680)
+
+#### Symbol
+
+ - Add symbol api for randn and fix shape issue for randn ndarray and symbol api (#15772)
+ - Graph Partition API (#15886)
+
+### Language Bindings
+
+#### Python
+
+The MXNet community [voted](https://lists.apache.org/thread.html/r3a2db0f22a1680cc56804191446fef2289595798ca19fd17de1ff03e%40%3Cdev.mxnet.apache.org%3E) to no longer support Python 2 in future releases of MXNet. Therefore, the 1.6 release will be the last MXNet release to support Python 2.
+
+#### C/C++
+
+ - [C++] Improve inference script to support benchmark on Imagenet (#15164)
+ - C Api for simplebind, fix comment for trigoops, add atol to assert (#16585)
+
+#### Clojure
+
+ - Extend Clojure BERT example (#15023)
+ - [Clojure] Add fastText example (#15340)
+ - make clojure api generator tests less brittle (#15579)
+
+#### Julia
+
+ - add julia env settings (#15523)
+ - julia: bump Windows prebuilt binary version to v1.5.0 (#15608)
+ - julia: remove Travis CI related files (#15616)
+ - julia: bump binding version to v1.6.0 (#15607)
+ - julia: rename build env var `MXNET_HOME` to `MXNET_ROOT` (#15568)
+ - Revert "julia: rename build env var `MXNET_HOME` to `MXNET_ROOT` (#15568)" (#16147)
+ - julia: fix `mx.forward` kwargs checking (#16138)
+ - julia: implement `context.num_gpus` (#16236)
+ - julia: add `AbstractMXError` as parent type (#16235)
+ - [MXNET-1430] julia: implement context.gpu_memory_info (#16324)
+ - julia/docs: more DRY on page rendering (#16396)
+
+#### Perl
+
+ - [Perl] - simplify aliasing strategy (#15395)
+ - [Perl] - ndarray to native array conversion fix (#16635)
+
+#### Scala
+
+ - Add Sparse NDArray support for Scala (#15378)
+ - fix the bug on Scala Sparse (#15500)
+ - fix heap-use-after-free in scala (#15503)
+ - Bump Scala version to 1.6 (#15660)
+ - Fix Scala Symbolic API some/Some typo (#15687)
+ - Faster Scala NDArray to BufferedImage function (#16219)
+
+### Performance improvements
+
+ - Proper bulking of ops not using FCompute (#15272)
+ - improve layernorm CPU performance (#15313)
+ - Efficient MXNet sampling in the multinomial distribution (#15311)
+ - Revert default return type for indices in argsort() and topk() back to float32 (#15360)
+ - Use omp threads for cpu data loader (#15379)
+ - Accelerate ROIPooling layer (#14894)
+ - Avoid memory copy for dropout inference (#15521)
+ - Add omp parallel optimization for _contrib_BilinearResize2D (#15584)
+ - Softmax optimization for GPU (#15545)
+ - Speed up group executor (#16069)
+ - FullyConnected Bias performance improvement on GPU (#16039)
+ - Embedding gradient performance optimization on GPU (#16355)
+ - Faster Transpose 2D (#16104)
+ - Pseudo 2D transpose kernel (#16229)
+ - Faster general take (#16615)
+
+### Examples and tutorials
+
+ - [TUTORIAL] Gluon performance tips and tricks (#15427)
+ - Updating profiler tutorial to include new custom operator profiling (#15403)
+ - [TUTORIAL] Gluon and Sparse NDArray (#15396)
+ - [TUTORIAL] Revise Naming tutorial (#15365)
+ - Revise Symbol tutorial (#15343)
+ - Two fixes for info_gan.md example Code (#15323)
+ - Rebase #13757 to master (#15189)
+ - Tensor Inspector Tutorial (#15517)
+ - logging (#15106)
+ - update profiler tutorial (#15580)
+ - [MXNET-1358] Fit api tutorial (#15353)
+ - Tutorials nightly fix (#16179)
+ - Update add_op_in_backend.md (#16403)
+ - typo fix in r doc lstm tutorial (#16546)
+ - [MKL-DNN] Add mxnet mkldnn cmake tutorial (#16688)
+
+### Website and documentation
+
+ - [DOC] Clarify that global pooling is going to reset padding (#15269)
+ - Update sparse_retain Documentation (#15394)
+ - nano instructions (#15117)
+ - remove comments from nano instructions (#15433)
+ - README MTCNN Link URL Error in original website (#15020)
+ - Update Horovod docs links in README (#15366)
+ - fix doc for sort and argsort (#15317)
+ - fix comment (#15481)
+ - Improve docs for AMP (#15455)
+ - [Doc] Add MKL install method apt/yum into tutorial (#15491)
+ - Julia docs (#15454)
+ - Docs: Fix misprints (#15505)
+ - website build for julia: fix path to be static (#15554)
+ - some minor typos/clarifications (#15538)
+ - refine Nano setup directions (#15524)
+ - [Doc] add squeeze to Array change shape (#15549)
+ - fix typo (#15648)
+ - Fix url (404 error) (#15683)
+ - update julia install doc (#15609)
+ - [DOC] refine autograd docs (#15109)
+ - [DOC] Fix many arguments in the doc: reshape_like, arange_like, shape_array (#15752)
+ - Add Gather_nd Scatter_nd to NDArray API category doc (#15689)
+ - [Dependency Update] [Doc] move the general prerequisite software to the top (#15896)
+ - typo in docs (#16094)
+ - [WIP] New Website: New Docs [1/3] (#15884)
+ - [DOC] Fix doc for nn.Embedding, nn.Dense and nd.Embedding (#15869)
+ - [DOC] Consistent capitalization: mxnet -> MXNet, scala -> Scala (#16041)
+ - New Website: Remove Old Content [2/3] (#15885)
+ - New Website: New Pipeline [3/3] (#15883)
+ - Update KL Divergence formula (#16170)
+ - fix broken links (#16255)
+ - redirect to the 404 page (#16287)
+ - add google-analytics config (#16271)
+ - Fixing links for website + Fixing search (#16284)
+ - Minor fix in ToTensor documentation. (#16299)
+ - adding redirects so that old website API links surfaced from searches (#16342)
+ - Fix code block formatting in Why MXNet doc page (#16334)
+ - Julia: add API docs back (#16363)
+ - Change mailing list url in footer to point to instructions about how to subscribe instead (#16384)
+ - Add instructions to report a security vulnerability (#16383)
+ - [DOC] fix installation selector wrong history (#16381)
+ - Beta build (#16411)
+ - [WIP] Improving Python Docs API (#16392)
+ - fix autodoc for spurious toggles (#16452)
+ - [Doc] Update the download page with 1.5.1 release (#16442)
+ - Fixing broken links (#16500)
+ - add binary and docs build command options (#16514)
+ - add option to remove indexes (#16525)
+ - Correct Google Analytics Tracker (#16490)
+ - [Doc] Use mirror link in the download page (#16501)
+ - checking broken link fixes work (#16538)
+ - detect number of procs during sphinx build (#16512)
+ - fixed broken links across multiple files (#16581)
+ - fix missing docs due to git add issues (#16496)
+ - second round of fixing broken links in multiple files (#16598)
+ - Python Docstring Convention (#16550)
+ - [MXNET-1434] Fix a broken link for basic C++ tutorial (#16461)
+ - Fix python doc build issue (#16630)
+ - fixing broken links in multiple files - round 3 (#16634)
+
+### CI/CD
+
+ - Fix build_ccache_wrappers: (#14631)
+ - Remove mhard-float option. This is already deprecated by Google. (#15435)
+ - CI: upgrade Julia version from 1.0.3 to 1.0.4 (#15502)
+ - Add -R option to ci/build.py to avoid rebuilding containers (#15426)
+ - [Dependency Update] Bump up the CI Nvidia docker to CUDA 10.1 (#14986)
+ - fixed config.mk and Makefile bugs for installing mkl (#15424)
+ - Add -DMXNET_USE_OPENMP to Makefiles so libinfo gets updated accordingly (#15498)
+ - [Dependency Update] Dependency update doc (#15045)
+ - Remove Scala package test on build (#15915)
+ - Refactor for windows CI 'out of heap space' errors (#15922)
+ - Fix Nightly Maven GPU (#15989)
+ - Windows cmake flags cleanup (#16013)
+ - Disable flaky test in test_amp_conversion (#16031)
+ - Updates git_init Jenkins utility function to support checking out a particular commit id
+ - Adds artifact repository scripts
+ - Adds CD pipeline framework
+ - Adds static libmxnet release pipeline
+ - Updates CD pipeline
+ - Adds documentation
+ - Updates kvstore functions to use pushd and popd
+ - Throws exceptions instead of magic numbers
+ - Updates artifact repository cli to use --libtype instead of --static or --dynamic
+ - Clarifies ci_utils and cd_utils origin remark
+ - Adds clarifying note on why ubuntu 14.04 is being used for compilation
+ - Removes MXNET_SHA
+ - Removes set_release_job_name
+ - Adds license headers
+ - Updates artifact repository to expect licenses
+ - Moves ci/cd to cd directory
+ - Takes downstream job name from environment
+ - Updates order of parameters
+ - Updates job type parameter to dropdown
+ - Adds libmxnet feature extraction code comments
+ - Removes ccache setup from static build
+ - Disable test coverage of C++ codebase on CI (#15981)
+ - Update readme and project.clj comment (#16084)
+ - Enable tvm_op for ci (#15889)
+ - Not to search for coverage files when none exist (#16107)
+ - Fixes openblas installation for static build
+ - Update python dependencies (#16105)
+ - CD Fixes (#16127)
+ - Adds dynamic libmxnet to CD pipeline (#16163)
+ - Fix README Build Status (#16183)
+ - subscribe to build and CD changes (#16192)
+ - [CD] Add COMMIT_ID param to release job (#16202)
+ - Fix lack of dylib support in Makefile when using lapack (#15813)
+ - Removes git status update stop gap solution (#16285)
+ - add mkl installation temp fix (#16304)
+ - add 'Release' cmake flag (#16294)
+ - S3 upload artifacts (#16336)
+ - Fix nightly scala pipeline (#16362)
+ - remove redundant branch name (#16372)
+ - Skipping installing nightly test (#16418)
+ - Adds PyPI CD Pipeline (#16190)
+ - upgrade the pytest version (#16429)
+ - Revert "add mkl installation temp fix (#16304)" (#16369)
+ - increase docker cache timeout (#16430)
+ - Adds pip requirements file to nightly gpu ci image (#16472)
+ - [CD] Adds python docker pipeline (#16547)
+ - Move imagenet inference to nightly (#16577)
+ - Backport #16980 #17031 #17018 #17019 to 1.6 branch (#17213)
+
+### Misc
+
+ - update committer info (#15289)
+ - Typo fix in plan_memory relase -> release. (#15299)
+ - indent changes (#15321)
+ - Had a few PRs merged. Hope to become an official contributor and potentially a committer. (#15451)
+ - cuda/cuDNN lib version checking. Force cuDNN v7 usage. (#15449)
+ - Improve diagnose.py, adding build features info and binary library path. (#15499)
+ - update ratcheck for apache-rat 0.13 release (#15417)
+ - add myself to interested modules (#15590)
+ - 1.5.0 news (#15137)
+ - bump up version from 1.5.0 to 1.6.0 on master (#15072)
+ - Remove myself from CODEOWNERS (#15617)
+ - remove mshadow submodule
+ - import mshadow source tree
+ - cuDNN support cleanup (#15812)
+ - Remove requests_failed_to_import handling
+ - Update CODEOWNERS. (#15972)
+ - Improve diagnose.py to display environment variables (#15715)
+ - Update README.md (#16035)
+ - [Dev] update ps-lite dependency (#15936)
+ - Typedef cleanup (#15899)
+ - add KEY for Tao Lv (#16081)
+ - remove 'foo' and other print msg from test (#16088)
+ - Revert accidental change to CMakelists (#16040)
+ - Update env_var.md (#16145)
+ - Update dmlc-core (#16149)
+ - adding codeowners (#16165)
+ - Factorize CUDA_KERNEL_LOOP used in CUDA kernels (#16197)
+ - add code of conduct and conflict resolution (#16343)
+ - simple typo error in NEWS.md (#16344)
+ - update NEWS.md and README.md (#16385)
+ - split issue templates (#16558)
+ - Create SECURITY.md (#16573)
## 1.5.1
Apache MXNet (incubating) 1.5.1 is a maintenance release incorporating important bug fixes and important performance improvements. All users of Apache MXNet (incubating) 1.5.0 are advised to upgrade. You can install Apache MXNet (incubating) 1.5.1 at the usual place. Please review these Release Notes to learn the bug fixes.
@@ -228,9 +1424,9 @@ Apache MXNet (incubating) 1.5.1 is a maintenance release incorporating important
### New Features
#### Automatic Mixed Precision(experimental)
-Training Deep Learning networks is a very computationally intensive task. Novel model architectures tend to have increasing numbers of layers and parameters, which slow down training. Fortunately, software optimizations and new generations of training hardware make it a feasible task.
+Training Deep Learning networks is a very computationally intensive task. Novel model architectures tend to have increasing numbers of layers and parameters, which slow down training. Fortunately, software optimizations and new generations of training hardware make it a feasible task.
However, most of the hardware and software optimization opportunities exist in exploiting lower precision (e.g. FP16) to, for example, utilize Tensor Cores available on new Volta and Turing GPUs. While training in FP16 showed great success in image classification tasks, other more complicated neural networks typically stayed in FP32 due to difficulties in applying the FP16 training guidelines.
-That is where AMP (Automatic Mixed Precision) comes into play. It automatically applies the guidelines of FP16 training, using FP16 precision where it provides the most benefit, while conservatively keeping in full FP32 precision operations unsafe to do in FP16. To learn more about AMP, check out this [tutorial](https://github.com/apache/incubator-mxnet/blob/master/docs/tutorials/amp/amp_tutorial.md).
+That is where AMP (Automatic Mixed Precision) comes into play. It automatically applies the guidelines of FP16 training, using FP16 precision where it provides the most benefit, while conservatively keeping in full FP32 precision operations unsafe to do in FP16. To learn more about AMP, check out this [tutorial](https://github.com/apache/incubator-mxnet/blob/master/docs/tutorials/amp/amp_tutorial.md).
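+
+For readers who want to see what enabling AMP looks like, below is a minimal sketch, assuming the experimental `mxnet.contrib.amp` module of this release and a GPU context; the network, trainer, and random data are placeholders, not part of the release notes.
+
+```python
+from mxnet import autograd, gluon
+import mxnet as mx
+from mxnet.contrib import amp
+
+amp.init()  # patch eligible operators to run in FP16; call before building the network
+
+ctx = mx.gpu()
+net = gluon.model_zoo.vision.resnet50_v1(pretrained=False)
+net.initialize(mx.init.Xavier(), ctx=ctx)
+trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01})
+amp.init_trainer(trainer)  # enable dynamic loss scaling for this trainer
+
+loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
+data = mx.nd.random.uniform(shape=(1, 3, 224, 224), ctx=ctx)
+label = mx.nd.array([0], ctx=ctx)
+
+with autograd.record():
+    loss = loss_fn(net(data), label)
+    with amp.scale_loss(loss, trainer) as scaled_loss:  # scale the loss before backward
+        autograd.backward(scaled_loss)
+trainer.step(1)
+```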
#### MKL-DNN Reduced precision inference and RNN API support
Two advanced features, fused computation and reduced-precision kernels, are introduced by MKL-DNN in the recent version. These features can significantly speed up the inference performance on CPU for a broad range of deep learning topologies. MXNet MKL-DNN backend provides optimized implementations for various operators covering a broad range of applications including image classification, object detection, and natural language processing. Refer to the [MKL-DNN operator documentation](https://github.com/apache/incubator-mxnet/blob/v1.5.x/docs/tutorials/mkldnn/operator_list.md) for more information.
@@ -245,13 +1441,13 @@ Note: Currently dynamic shape does not work with Gluon deferred initialization.
#### Large Tensor Support
Currently, MXNet supports maximal tensor size of around 4 billon (2^32). This is due to uint32_t being used as the default data type for tensor size, as well as variable indexing.
-This limitation has created many problems when larger tensors are used in the model.
+This limitation has created many problems when larger tensors are used in the model.
A naive solution to this problem is to replace all uint32_t in the MXNet backend source code to int64_t.
This solution is not viable, however, because many data structures use uint32_t as the data type for its members.
-Unnecessarily replacing these variables to int64_t will increase the memory consumption causing another limitation. Second, MXNet has many submodule dependencies.
-Updating the variable types in the MXNet repository is not enough. We also need to make sure different libraries, such as MKLDNN, MShadow etc. supports the int64_t integer data type.
+Unnecessarily replacing these variables to int64_t will increase the memory consumption causing another limitation. Second, MXNet has many submodule dependencies.
+Updating the variable types in the MXNet repository is not enough. We also need to make sure different libraries, such as MKLDNN, MShadow etc. supports the int64_t integer data type.
Third, many front end APIs assume unsigned 32-bit integer interface. Only updating the interface in C/C++ will cause all the language bindings to fail.
-Therefore, we need a systematic approach to enhance MXNet to support large tensors.
+Therefore, we need a systematic approach to enhance MXNet to support large tensors.
Now you can enable large tensor support by changing the following build flag to 1: `USE_INT64_TENSOR_SIZE = 1`. Note this is set to 0 by default.
For more details please refer to the [design document](https://cwiki.apache.org/confluence/display/MXNET/Large+Tensor+Support).
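+
+A quick way to confirm whether a given binary was built with this flag is the runtime feature discovery API; here is a minimal sketch, assuming the flag is exposed under the feature name `INT64_TENSOR_SIZE`:
+
+```python
+import mxnet as mx
+
+# List the compile-time features of the loaded libmxnet binary.
+features = mx.runtime.Features()
+if features.is_enabled('INT64_TENSOR_SIZE'):
+    print('Large tensor support is enabled (USE_INT64_TENSOR_SIZE=1).')
+else:
+    print('This binary was built without large tensor support.')
+```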
@@ -260,7 +1456,7 @@ MXNet has added support for CUDA 10, CUDA 10.1, cudnn7.5, NCCL 2.4.2, and numpy
These updates are available through PyPI packages and build from source, refer to [installation guide](https://mxnet.apache.org/versions/master/install/index.html) for more details.
#### Gluon Fit API(experimental)
-Training a model in Gluon requires users to write the training loop. This is useful because of its imperative nature, however repeating the same code across multiple models can become tedious and repetitive with boilerplate code.
+Training a model in Gluon requires users to write the training loop. This is useful because of its imperative nature, however repeating the same code across multiple models can become tedious and repetitive with boilerplate code.
The training loop can also be overwhelming to some users new to deep learning. We have introduced an Estimator and Fit API to help facilitate training loop.
Note: this feature is still experimental, for more details, refer to [design document](https://cwiki.apache.org/confluence/display/MXNET/Gluon+Fit+API+-+Tech+Design).
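+
+Below is a minimal sketch of a training loop expressed with the Estimator, assuming the experimental `mxnet.gluon.contrib.estimator` module layout of this release; the tiny network and random data stand in for a real model and DataLoader.
+
+```python
+import mxnet as mx
+from mxnet import gluon
+from mxnet.gluon.contrib.estimator import estimator
+
+net = gluon.nn.Dense(10)
+net.initialize(mx.init.Xavier(), ctx=mx.cpu())
+loss = gluon.loss.SoftmaxCrossEntropyLoss()
+trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': 1e-3})
+
+# Random data standing in for a real DataLoader.
+x = mx.nd.random.uniform(shape=(100, 20))
+y = mx.nd.random.randint(0, 10, shape=(100,)).astype('float32')
+train_data = gluon.data.DataLoader(gluon.data.ArrayDataset(x, y), batch_size=10)
+
+# The Estimator wraps the usual forward/backward/step boilerplate.
+est = estimator.Estimator(net=net, loss=loss, trainer=trainer, context=mx.cpu())
+est.fit(train_data=train_data, epochs=2)
+```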
@@ -521,7 +1717,7 @@ Note: this feature is still experimental, for more details, refer to [design doc
* Update inception_inference.cpp (#14674)
* Optimize C++ API (#13496)
-#### Clojure
+#### Clojure
* [Clojure] - Add Spec Validations to the Optimizer namespace (#13499)
* [Clojure] Add Spec Validations for the Random namespace (#13523)
* [Clojure] Correct the versions in the README so they correspond to the latest maven.org release ([#13507)
@@ -1093,38 +2289,38 @@ Apache MXNet (incubating) 1.4.1 is a maintenance release incorporating important
## 1.4.0
-- [New Features](#new-features)
+- [New Features](#new-features-2)
* [Java Inference API](#java-inference-api)
* [Julia API](#julia-api)
- * [Control Flow Operators (experimental)](#control-flow-operators--experimental-)
+ * [Control Flow Operators (experimental)](#control-flow-operators-experimental)
* [SVRG Optimization](#svrg-optimization)
- * [Subgraph API (experimental)](#subgraph-api--experimental-)
+ * [Subgraph API (experimental)](#subgraph-api-experimental)
* [JVM Memory Management](#jvm-memory-management)
- * [Topology-aware AllReduce (experimental)](#topology-aware-allreduce--experimental-)
- * [MKLDNN backend: Graph optimization and Quantization (experimental)](#mkldnn-backend--graph-optimization-and-quantization--experimental-)
+ * [Topology-aware AllReduce (experimental)](#topology-aware-allreduce-experimental)
+ * [MKLDNN backend: Graph optimization and Quantization (experimental)](#mkldnn-backend--graph-optimization-and-quantization-experimental)
+ [Graph Optimization](#graph-optimization)
+ [Quantization](#quantization)
-- [New Operators](#new-operators)
-- [Feature improvements](#feature-improvements)
+- [New Operators](#new-operators-3)
+- [Feature improvements](#feature-improvements-3)
* [Operator](#operator)
* [Optimizer](#optimizer)
* [Sparse](#sparse)
* [ONNX](#onnx)
- * [MKLDNN](#mkldnn)
+ * [MKLDNN](#mkldnn-2)
* [Inference](#inference)
* [Other](#other)
- [Frontend API updates](#frontend-api-updates)
- * [Gluon](#gluon)
- * [Symbol](#symbol)
+ * [Gluon](#gluon-2)
+ * [Symbol](#symbol-1)
- [Language API updates](#language-api-updates)
* [Java](#java)
* [R](#r)
- * [Scala](#scala)
- * [Clojure](#clojure)
- * [Perl](#perl)
- * [Julia](#julia)
+ * [Scala](#scala-2)
+ * [Clojure](#clojure-2)
+ * [Perl](#perl-2)
+ * [Julia](#julia-2)
- [Performance benchmarks and improvements](#performance-benchmarks-and-improvements)
-- [Bug fixes](#bug-fixes)
+- [Bug fixes](#bug-fixes-4)
- [Licensing updates](#licensing-updates)
- [Improvements](#improvements)
* [Tutorial](#tutorial)
@@ -1135,9 +2331,9 @@ Apache MXNet (incubating) 1.4.1 is a maintenance release incorporating important
* [Installation](#installation)
* [Build and CI](#build-and-ci)
* [3rd party](#3rd-party)
- + [TVM:](#tvm-)
- + [CUDNN:](#cudnn-)
- + [Horovod:](#horovod-)
+ + [TVM:](#tvm)
+ + [CUDNN:](#cudnn)
+ + [Horovod:](#horovod)
- [Deprications](#deprications)
- [Other](#other-1)
- [How to build MXNet](#how-to-build-mxnet)
@@ -1668,20 +2864,20 @@ Submodule@commit ID::Last updated by MXNet:: Last update in submodule
### Bug fixes
-* [MXNET-953] Fix oob memory read (v1.3.x) / [#13118](https://github.com/apache/incubator-mxnet/pull/13118)
+* [MXNET-953] Fix oob memory read (v1.3.x) / [#13118](https://github.com/apache/incubator-mxnet/pull/13118)
Simple bugfix addressing an out-of-bounds memory read.
-* [MXNET-969] Fix buffer overflow in RNNOp (v1.3.x) / [#13119](https://github.com/apache/incubator-mxnet/pull/13119)
+* [MXNET-969] Fix buffer overflow in RNNOp (v1.3.x) / [#13119](https://github.com/apache/incubator-mxnet/pull/13119)
This fixes an buffer overflow detected by ASAN.
-* CudnnFind() usage improvements (v1.3.x) / [#13123](https://github.com/apache/incubator-mxnet/pull/13123)
+* CudnnFind() usage improvements (v1.3.x) / [#13123](https://github.com/apache/incubator-mxnet/pull/13123)
This PR improves the MXNet's use of cudnnFind() to address a few issues:
1. With the gluon imperative style, cudnnFind() is called during forward(), and so might have its timings perturbed by other GPU activity (including potentially other cudnnFind() calls).
2. With some cuda drivers versions, care is needed to ensure that the large I/O and workspace cudaMallocs() performed by cudnnFind() are immediately released and available to MXNet.
- 3. cudnnFind() makes both conv I/O and workspace allocations that must be covered by the GPU global memory headroom defined by MXNET_GPU_MEM_POOL_RESERVE. Per issue #12662, large convolutions can result in out-of-memory errors, even when MXNet's storage allocator has free memory in its pool.
-
+ 3. cudnnFind() makes both conv I/O and workspace allocations that must be covered by the GPU global memory headroom defined by MXNET_GPU_MEM_POOL_RESERVE. Per issue #12662, large convolutions can result in out-of-memory errors, even when MXNet's storage allocator has free memory in its pool.
+
This PR addresses these issues, providing the following benefits:
1. Consistent algo choice for a given convolution type in a model, both for instances in the same GPU and in other GPUs in a multi-GPU training setting.
2. Consistent algo choice from run to run, based on eliminating sources of interference of the cudnnFind() timing process.
@@ -1689,38 +2885,38 @@ This fixes an buffer overflow detected by ASAN.
4. Increased training performance based on being able to consistently run with models that approach the GPU's full global memory footprint.
5. Adds a unittest for and solves issue #12662.
-* [MXNET-922] Fix memleak in profiler (v1.3.x) / [#13120](https://github.com/apache/incubator-mxnet/pull/13120)
+* [MXNET-922] Fix memleak in profiler (v1.3.x) / [#13120](https://github.com/apache/incubator-mxnet/pull/13120)
Fix a memleak reported locally by ASAN during a normal inference test.
-* Fix lazy record io when used with dataloader and multi_worker > 0 (v1.3.x) / [#13124](https://github.com/apache/incubator-mxnet/pull/13124)
+* Fix lazy record io when used with dataloader and multi_worker > 0 (v1.3.x) / [#13124](https://github.com/apache/incubator-mxnet/pull/13124)
Fixes multi_worker data loader when record file is used. The MXRecordIO instance needs to require a new file handler after fork to be safely manipulated simultaneously.
This fix also safely voids the previous temporary fixes #12093 #11370.
-* fixed symbols naming in RNNCell, LSTMCell, GRUCell (v1.3.x) / [#13158](https://github.com/apache/incubator-mxnet/pull/13158)
+* fixed symbols naming in RNNCell, LSTMCell, GRUCell (v1.3.x) / [#13158](https://github.com/apache/incubator-mxnet/pull/13158)
This fixes #12783, by assigning all nodes in hybrid_forward a unique name. Some operations were in fact performed without attaching the appropriate (time) prefix to the name, which makes serialized graphs non-deserializable.
-* Fixed `__setattr__` method of `_MXClassPropertyMetaClass` (v1.3.x) / [#13157](https://github.com/apache/incubator-mxnet/pull/13157)
+* Fixed `__setattr__` method of `_MXClassPropertyMetaClass` (v1.3.x) / [#13157](https://github.com/apache/incubator-mxnet/pull/13157)
Fixed `__setattr__` method
-* allow foreach on input with 0 length (v1.3.x) / [#13151](https://github.com/apache/incubator-mxnet/pull/13151)
+* allow foreach on input with 0 length (v1.3.x) / [#13151](https://github.com/apache/incubator-mxnet/pull/13151)
Fix #12470. With this change, outs shape can be inferred correctly.
-* Infer dtype in SymbolBlock import from input symbol (v1.3.x) / [#13117](https://github.com/apache/incubator-mxnet/pull/13117)
- Fix for the issue - #11849
- Currently, Gluon symbol block cannot import any symbol with type other than fp32. All the parameters are created as FP32 leading to failure in importing the params when it is of type fp16, fp64 etc,
- In this PR, we infer the type of the symbol being imported and create the Symbol Block Parameters with that inferred type.
+* Infer dtype in SymbolBlock import from input symbol (v1.3.x) / [#13117](https://github.com/apache/incubator-mxnet/pull/13117)
+ Fix for the issue - #11849
+ Currently, Gluon symbol block cannot import any symbol with type other than fp32. All the parameters are created as FP32 leading to failure in importing the params when it is of type fp16, fp64 etc,
+ In this PR, we infer the type of the symbol being imported and create the Symbol Block Parameters with that inferred type.
Added the tests
### Documentation fixes
-* Document the newly added env variable (v1.3.x) / [#13156](https://github.com/apache/incubator-mxnet/pull/13156)
+* Document the newly added env variable (v1.3.x) / [#13156](https://github.com/apache/incubator-mxnet/pull/13156)
Document the env variable: MXNET_ENFORCE_DETERMINISM added in PR: [#12992](https://github.com/apache/incubator-mxnet/pull/12992)
-* fix broken links (v1.3.x) / [#13155](https://github.com/apache/incubator-mxnet/pull/13155)
+* fix broken links (v1.3.x) / [#13155](https://github.com/apache/incubator-mxnet/pull/13155)
This PR fixes broken links on the website.
-* fix broken Python IO API docs (v1.3.x) / [#13154](https://github.com/apache/incubator-mxnet/pull/13154)
+* fix broken Python IO API docs (v1.3.x) / [#13154](https://github.com/apache/incubator-mxnet/pull/13154)
Fixes [#12854: Data Iterators documentation is broken](https://github.com/apache/incubator-mxnet/issues/12854)
This PR manually specifies members of the IO module so that the docs will render as expected. This is workaround in the docs to deal with a bug introduced in the Python code/structure since v1.3.0. See the comments for more info.
@@ -1729,7 +2925,7 @@ This fixes an buffer overflow detected by ASAN.
This is important for any future modules - that they recognize this issue and make efforts to map the params and other elements.
-* add/update infer_range docs (v1.3.x) / [#13153](https://github.com/apache/incubator-mxnet/pull/13153)
+* add/update infer_range docs (v1.3.x) / [#13153](https://github.com/apache/incubator-mxnet/pull/13153)
This PR adds or updates the docs for the infer_range feature.
Clarifies the param in the C op docs
@@ -1740,13 +2936,13 @@ This fixes an buffer overflow detected by ASAN.
### Other Improvements
-* [MXNET-1179] Enforce deterministic algorithms in convolution layers (v1.3.x) / [#13152](https://github.com/apache/incubator-mxnet/pull/13152)
+* [MXNET-1179] Enforce deterministic algorithms in convolution layers (v1.3.x) / [#13152](https://github.com/apache/incubator-mxnet/pull/13152)
Some of the CUDNN convolution algorithms are non-deterministic (see issue #11341). This PR adds an env variable to enforce determinism in the convolution operators. If set to true, only deterministic CUDNN algorithms will be used. If no deterministic algorithm is available, MXNet will error out.
### Submodule updates
-* update mshadow (v1.3.x) / [#13122](https://github.com/apache/incubator-mxnet/pull/13122)
+* update mshadow (v1.3.x) / [#13122](https://github.com/apache/incubator-mxnet/pull/13122)
Update mshadow for omp acceleration when nvcc is not present
### Known issues
@@ -1871,7 +3067,7 @@ For more information and examples, see [full release notes](https://cwiki.apache
- CTC operator performance improvement from HawkAaron/MXNet-CTC (#11834)
- Improve performance of broadcast ops backward pass (#11252)
- Improved numerical stability as a result of using stable L2 norm (#11573)
-- Accelerate the performance of topk for GPU and CPU side (#12085 #10997 ; This changes the behavior of topk when nan values occur in the input)
+- Accelerate the performance of topk for GPU and CPU side (#12085 #10997 ; This changes the behavior of topk when nan values occur in the input)
- Support for dot(dns, csr) = dns and dot(dns, csr.T) = dns on CPU ([#11113](https://github.com/apache/incubator-mxnet/pull/11113))
- Performance improvement for Batch Dot on CPU from mshadow ([mshadow PR#342](https://github.com/dmlc/mshadow/pull/342))
diff --git a/ci/docker/Dockerfile.build.ubuntu_cpu_jekyll b/ci/docker/Dockerfile.build.ubuntu_cpu_jekyll
index 88815d783f18..bc91286ecf21 100644
--- a/ci/docker/Dockerfile.build.ubuntu_cpu_jekyll
+++ b/ci/docker/Dockerfile.build.ubuntu_cpu_jekyll
@@ -28,6 +28,7 @@ RUN apt-get update && apt-get install -y \
build-essential \
git \
zlib1g-dev \
+ wget \
gnupg2 \
curl
diff --git a/docs/python_docs/_static/feedback.css b/docs/python_docs/_static/feedback.css
new file mode 100644
index 000000000000..b4a64ec5c280
--- /dev/null
+++ b/docs/python_docs/_static/feedback.css
@@ -0,0 +1,37 @@
+.feedback-container {
+ text-align: center;
+}
+
+.feedback-answer-container {
+ display: inline-block;
+}
+
+.feedback-question {
+ display: inline-block;
+ padding: 0.5em 1em 0.5em 1em;
+}
+
+.feedback-answer {
+ display: inline-block;
+ padding: 0.5em 1em 0.5em 1em;
+ color: #048ccc;
+ cursor: pointer;
+}
+
+.feedback-answer:hover {
+ color: #ffffff;
+ background-color: #048ccc;
+}
+
+.feedback-thank-you {
+ display: none;
+ padding: 0.5em 1em 0.5em 1em;
+}
+
+.feedback-hr-top {
+ margin-top: 50px;
+}
+
+.feedback-hr-bottom {
+ margin-bottom: 30px;
+}
diff --git a/docs/python_docs/_static/feedback.js b/docs/python_docs/_static/feedback.js
new file mode 100644
index 000000000000..f45423765b74
--- /dev/null
+++ b/docs/python_docs/_static/feedback.js
@@ -0,0 +1,33 @@
+/*!
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+$(document).ready(function() {
+ $(".feedback-answer").on("click", function () {
+ $(".feedback-question").remove();
+ $(".feedback-answer-container").remove();
+ $(".feedback-thank-you").show();
+ ga("send", {
+ hitType: "event",
+ eventCategory: "Did this page help you?",
+ eventAction: $(this).attr("data-response"),
+ eventLabel: window.location.pathname || "unknown",
+ eventValue: $(this).attr("data-response") === "yes" ? 1 : 0
+ });
+ });
+});
diff --git a/docs/python_docs/_static/mxnet.css b/docs/python_docs/_static/mxnet.css
index 7d4f7f115424..08415c83222b 100644
--- a/docs/python_docs/_static/mxnet.css
+++ b/docs/python_docs/_static/mxnet.css
@@ -53,7 +53,7 @@
}
.mdl-layout__header-row {
- height: 84px !important;
+ height: 80px !important;
}
.mdl-shadow--2dp {
@@ -203,4 +203,8 @@ p {
float: right;
margin: 4px;
cursor: pointer;
-}
\ No newline at end of file
+}
+
+.scrollUp {
+ transform: translateY(-80px);
+}
diff --git a/docs/python_docs/python/api/gluon/index.rst b/docs/python_docs/python/api/gluon/index.rst
index c08e8aa73627..cf76ef42f5c2 100644
--- a/docs/python_docs/python/api/gluon/index.rst
+++ b/docs/python_docs/python/api/gluon/index.rst
@@ -49,7 +49,7 @@ Tutorials
.. card::
:title: Gluon Guide
- :link: ../tutorials/packages/gluon/index.html
+ :link: ../../tutorials/packages/gluon/index.html
The Gluon guide. Start here!
diff --git a/docs/python_docs/python/tutorials/deploy/inference/image_classification_jetson.md b/docs/python_docs/python/tutorials/deploy/inference/image_classification_jetson.md
new file mode 100644
index 000000000000..299a40f6a807
--- /dev/null
+++ b/docs/python_docs/python/tutorials/deploy/inference/image_classification_jetson.md
@@ -0,0 +1,117 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+<!--- -->
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+<!--- -->
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+# Image Classification using pretrained ResNet-50 model on Jetson module
+
+This tutorial shows how to install MXNet v1.x with Jetson support and use it to deploy a pre-trained MXNet model for image classification on a Jetson module.
+
+## What's in this tutorial?
+
+This tutorial shows how to:
+
+1. Install MXNet v1.x along with its dependencies on a Jetson module (This tutorial has been tested on Jetson Xavier AGX and Jetson Nano modules)
+
+2. Deploy a pre-trained MXNet model for image classification on the module
+
+## Who's this tutorial for?
+
+This tutorial is intended for developers working on Jetson modules who are implementing deep learning applications. It assumes that readers have a Jetson module set up with JetPack installed, are familiar with the Jetson working environment, and are somewhat familiar with deep learning using MXNet.
+
+## Prerequisites
+
+To complete this tutorial, you need:
+
+* A [Jetson module](https://developer.nvidia.com/embedded/develop/hardware) setup with [Jetpack 4.4](https://docs.nvidia.com/jetson/jetpack/release-notes/) installed using NVIDIA [SDK Manager](https://developer.nvidia.com/nvidia-sdk-manager)
+
+* An SSH connection to the module OR display and keyboard setup to directly open shell on the module
+
+* [Swapfile](https://help.ubuntu.com/community/SwapFaq) installed, especially on Jetson Nano for additional memory (increase memory if the inference script terminates with a `Killed` message)
+
+## Installing MXNet v1.x with Jetson support
+
+To install MXNet with Jetson support, you can follow the [installation guide](https://mxnet.apache.org/get_started/jetson_setup) on the MXNet official website.
+
+Alternatively, you can directly install the MXNet v1.6 wheel with Jetson support, hosted on a public S3 bucket. Here are the steps to install this wheel:
+
+*WARNING: this MXNet wheel is provided for your convenience but it contains packages that are not provided nor endorsed by the Apache Software Foundation.
+As such, they might contain software components with more restrictive licenses than the Apache License and you'll need to decide whether they are appropriate for your usage. Like all Apache Releases, the
+official Apache MXNet (incubating) releases consist of source code only and are found at https://mxnet.apache.org/get_started/download .*
+
+We start by installing the MXNet dependencies:
+```bash
+sudo apt-get update
+sudo apt-get install -y git build-essential libopenblas-dev libopencv-dev python3-pip
+sudo pip3 install -U pip
+```
+
+Then we download and install the MXNet v1.6 wheel with Jetson support:
+```bash
+wget https://mxnet-public.s3.us-east-2.amazonaws.com/install/jetson/1.6.0/mxnet_cu102-1.6.0-py2.py3-none-linux_aarch64.whl
+sudo pip3 install mxnet_cu102-1.6.0-py2.py3-none-linux_aarch64.whl
+```
+
+And we are done. You can now test the installation by importing MXNet from Python 3:
+```bash
+python3 -c 'import mxnet'
+```
+
+## Running a pre-trained ResNet-50 model on Jetson
+
+We are now ready to load a pre-trained model and run inference on a Jetson module. In this tutorial we use a ResNet-50 model trained on the ImageNet dataset. We run the following classification script with either a CPU or GPU context using Python 3.
+
+```python
+from mxnet import gluon
+import mxnet as mx
+
+# set context
+ctx = mx.gpu()
+
+# load pre-trained model
+net = gluon.model_zoo.vision.resnet50_v1(pretrained=True, ctx=ctx)
+net.hybridize(static_alloc=True, static_shape=True)
+
+# load labels
+lbl_path = gluon.utils.download('http://data.mxnet.io/models/imagenet/synset.txt')
+with open(lbl_path, 'r') as f:
+ labels = [l.rstrip() for l in f]
+
+# download and format image as (batch, RGB, width, height)
+img_path = gluon.utils.download('https://github.com/dmlc/web-data/blob/master/mxnet/doc/tutorials/python/predict_image/cat.jpg?raw=true')
+img = mx.image.imread(img_path)
+img = mx.image.imresize(img, 224, 224) # resize
+img = mx.image.color_normalize(img.astype(dtype='float32')/255,
+ mean=mx.nd.array([0.485, 0.456, 0.406]),
+ std=mx.nd.array([0.229, 0.224, 0.225])) # normalize
+img = img.transpose((2, 0, 1)) # channel first
+img = img.expand_dims(axis=0) # batchify
+img = img.as_in_context(ctx)
+
+prob = net(img).softmax() # predict and normalize output
+idx = prob.topk(k=5)[0] # get top 5 result
+for i in idx:
+ i = int(i.asscalar())
+ print('With prob = %.5f, it contains %s' % (prob[0,i].asscalar(), labels[i]))
+```
+
+After running the above script, you should get the following output, showing the five classes the image most likely belongs to along with their probabilities:
+```bash
+With prob = 0.41940, it contains n02119789 kit fox, Vulpes macrotis
+With prob = 0.28096, it contains n02119022 red fox, Vulpes vulpes
+With prob = 0.06857, it contains n02124075 Egyptian cat
+With prob = 0.03046, it contains n02120505 grey fox, gray fox, Urocyon cinereoargenteus
+With prob = 0.02770, it contains n02441942 weasel
+```
diff --git a/docs/python_docs/themes/mx-theme/mxtheme/feedback.html b/docs/python_docs/themes/mx-theme/mxtheme/feedback.html
new file mode 100644
index 000000000000..8a9b1b53029a
--- /dev/null
+++ b/docs/python_docs/themes/mx-theme/mxtheme/feedback.html
@@ -0,0 +1,10 @@
+
- {% markdown %}{% include /get_started/windows/r/cpu.md %}{% endmarkdown %}
-
+ {% markdown %}{% include /get_started/windows/r/build-from-source.md %}{% endmarkdown %}
+
-
- {% markdown %}{% include /get_started/windows/r/gpu.md %}{% endmarkdown %}
-
-
-
- {% markdown %}{% include /get_started/windows/scala/scala.md %}{% endmarkdown %}
-
+ {% markdown %}{% include /get_started/windows/scala/build-from-source.md %}{% endmarkdown %}
+
-
- {% markdown %}{% include /get_started/windows/clojure/clojure.md %}{% endmarkdown %}
-
+ {% markdown %}{% include /get_started/windows/clojure/build-from-source.md %}{% endmarkdown %}
+
-
- {% markdown %}{% include /get_started/windows/java/java.md %}{% endmarkdown %}
-
+ {% markdown %}{% include /get_started/windows/java/build-from-source.md %}{% endmarkdown %}
-
-
- {% markdown %}{% include /get_started/windows/julia/pkg.md %}{% endmarkdown %}
-
-
- {% markdown %}{% include /get_started/windows/julia/build-from-source.md %}{% endmarkdown %}
-
-
+ {% markdown %}{% include /get_started/windows/julia/build-from-source.md %}{% endmarkdown %}
-
- {% markdown %}{% include /get_started/windows/perl/perl.md %}{% endmarkdown %}
-
+ {% markdown %}{% include /get_started/windows/perl/build-from-source.md %}{% endmarkdown %}
-
- {% markdown %}{% include /get_started/windows/cpp/cpp.md %}{% endmarkdown %}
-
-
-
- For more installation options, refer to the MXNet Windows installation guide.
-
+ {% markdown %}{% include /get_started/windows/cpp/build-from-source.md %}{% endmarkdown %}
+
+
diff --git a/docs/static_site/src/_includes/get_started/linux/clojure/cpu.md b/docs/static_site/src/_includes/get_started/linux/clojure/build-from-source.md
similarity index 100%
rename from docs/static_site/src/_includes/get_started/linux/clojure/cpu.md
rename to docs/static_site/src/_includes/get_started/linux/clojure/build-from-source.md
diff --git a/docs/static_site/src/_includes/get_started/linux/clojure/gpu.md b/docs/static_site/src/_includes/get_started/linux/clojure/gpu.md
deleted file mode 100644
index 26293f6e077f..000000000000
--- a/docs/static_site/src/_includes/get_started/linux/clojure/gpu.md
+++ /dev/null
@@ -1,15 +0,0 @@
-You can use the Maven packages defined in the following dependency to include MXNet in your Clojure
-project. To maximize leverage, the Clojure package has been built on the existing Scala package. Please
-refer to the [MXNet-Scala setup guide]({{'/get_started/scala_setup'|relative_url}}) for a detailed set of instructions
-to help you with the setup process that is required to use the Clojure dependency.
-
-
-
-{% highlight html %}
-
- org.apache.mxnet.contrib.clojure
- clojure-mxnet-linux-gpu
-
-{% endhighlight %}
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/linux/cpp/build-from-source.md b/docs/static_site/src/_includes/get_started/linux/cpp/build-from-source.md
new file mode 100644
index 000000000000..d37cf77a3afc
--- /dev/null
+++ b/docs/static_site/src/_includes/get_started/linux/cpp/build-from-source.md
@@ -0,0 +1,2 @@
+To use the C++ package, build from source with the `USE_CPP_PACKAGE=1` option. Please
+refer to the build from source instructions linked above.
diff --git a/docs/static_site/src/_includes/get_started/linux/cpp/cpp.md b/docs/static_site/src/_includes/get_started/linux/cpp/cpp.md
deleted file mode 100644
index dc492d801262..000000000000
--- a/docs/static_site/src/_includes/get_started/linux/cpp/cpp.md
+++ /dev/null
@@ -1,4 +0,0 @@
-To enable the C++ package, build from source using `make USE_CPP_PACKAGE=1`.
-Refer to the [MXNet C++ setup guide](/get_started/cpp_setup.html)
-for full instructions.
-
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/linux/java/build-from-source.md b/docs/static_site/src/_includes/get_started/linux/java/build-from-source.md
new file mode 100644
index 000000000000..585cc3e41890
--- /dev/null
+++ b/docs/static_site/src/_includes/get_started/linux/java/build-from-source.md
@@ -0,0 +1,6 @@
+Previously available binaries distributed via Maven have been removed as they
+redistributed Category-X binaries in violation of Apache Software Foundation
+(ASF) policies.
+
+At this point in time, no third-party binary Java packages are available. Please
+follow the build from source instructions linked above.
diff --git a/docs/static_site/src/_includes/get_started/linux/java/gpu.md b/docs/static_site/src/_includes/get_started/linux/java/gpu.md
deleted file mode 100644
index 6f6757f6e2ea..000000000000
--- a/docs/static_site/src/_includes/get_started/linux/java/gpu.md
+++ /dev/null
@@ -1,17 +0,0 @@
-You can use the Maven packages defined in the following dependency to include MXNet in your Java
-project. The Java API is provided as a subset of the Scala API and is intended for inference only.
-Please refer to the MXNet-Java setup guide for a detailed set of
-instructions to help you with the setup process.
-
-
-
-
-
-{% highlight html %}
-
- org.apache.mxnet
- mxnet-full_2.11-linux-x86_64-gpu
- [1.5.0, )
-
-{% endhighlight %}
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/linux/julia/build-from-source.md b/docs/static_site/src/_includes/get_started/linux/julia/build-from-source.md
index 018aca9d7387..cef507527bfe 100644
--- a/docs/static_site/src/_includes/get_started/linux/julia/build-from-source.md
+++ b/docs/static_site/src/_includes/get_started/linux/julia/build-from-source.md
@@ -1,2 +1 @@
-Refer to the [Julia section of the MXNet Ubuntu installation guide](/get_started/ubuntu_setup#install-the-mxnet-package-for-julia).
-
+Please follow the build from source instructions linked above.
diff --git a/docs/static_site/src/_includes/get_started/linux/julia/pkg.md b/docs/static_site/src/_includes/get_started/linux/julia/pkg.md
deleted file mode 100644
index 35971305ee2d..000000000000
--- a/docs/static_site/src/_includes/get_started/linux/julia/pkg.md
+++ /dev/null
@@ -1,10 +0,0 @@
-Install a pinned version of MXNet like this:
-
-{% highlight julia %}
-]add MXNet#v1.5.0
-{% endhighlight %}
-
-Or directly install the latest release:
-{% highlight julia %}
-]add MXNet
-{% endhighlight %}
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/linux/perl/build-from-source.md b/docs/static_site/src/_includes/get_started/linux/perl/build-from-source.md
new file mode 100644
index 000000000000..cef507527bfe
--- /dev/null
+++ b/docs/static_site/src/_includes/get_started/linux/perl/build-from-source.md
@@ -0,0 +1 @@
+Please follow the build from source instructions linked above.
diff --git a/docs/static_site/src/_includes/get_started/linux/perl/perl.md b/docs/static_site/src/_includes/get_started/linux/perl/perl.md
deleted file mode 100644
index 02d82e457726..000000000000
--- a/docs/static_site/src/_includes/get_started/linux/perl/perl.md
+++ /dev/null
@@ -1 +0,0 @@
-Refer to the [Perl section of the MXNet Ubuntu installation guide](get_started/ubuntu_setup.html#install-the-mxnet-package-for-perl).
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/linux/python/cpu/build-from-source.md b/docs/static_site/src/_includes/get_started/linux/python/cpu/build-from-source.md
index 4adf039739d5..cef507527bfe 100644
--- a/docs/static_site/src/_includes/get_started/linux/python/cpu/build-from-source.md
+++ b/docs/static_site/src/_includes/get_started/linux/python/cpu/build-from-source.md
@@ -1 +1 @@
-To build from source, refer to the [MXNet Ubuntu installation guide]({{'/get_started/ubuntu_setup' | relative_url}}).
+Please follow the build from source instructions linked above.
diff --git a/docs/static_site/src/_includes/get_started/linux/python/cpu/docker.md b/docs/static_site/src/_includes/get_started/linux/python/cpu/docker.md
index 8f2c8eb143a4..7eab6d36dec2 100644
--- a/docs/static_site/src/_includes/get_started/linux/python/cpu/docker.md
+++ b/docs/static_site/src/_includes/get_started/linux/python/cpu/docker.md
@@ -1,22 +1,18 @@
-Docker images with *MXNet* are available at [DockerHub](https://hub.docker.com/r/mxnet/).
-
-**Step 1** Install Docker on your machine by following the [docker installation
-instructions](https://docs.docker.com/engine/installation/linux/ubuntu/#install-using-the-repository).
-
-*Note* - You can install Community Edition (CE) to get started with *MXNet*.
-
-**Step 2** [Optional] Post installation steps to manage Docker as a non-root user.
+WARNING: the following links and names of binary distributions are provided for
+your convenience but they point to packages that are *not* provided nor endorsed
+by the Apache Software Foundation. As such, they might contain software
+components with more restrictive licenses than the Apache License and you'll
+need to decide whether they are appropriate for your usage. Like all Apache
+Releases, the official Apache MXNet (incubating) releases consist of source code
+only and are found at
+the [Download page](https://mxnet.apache.org/get_started/download).
+
-Follow the four steps in this [docker
-documentation](https://docs.docker.com/engine/installation/linux/linux-postinstall/#manage-docker-as-a-non-root-user)
-to allow managing docker containers without *sudo*.
-
-If you skip this step, you need to use *sudo* each time you invoke Docker.
-
-**Step 3** Pull the MXNet docker image.
+Docker images with *MXNet* are available at [DockerHub](https://hub.docker.com/r/mxnet/).
+After you have installed Docker on your machine, you can use them via:
{% highlight bash %}
-$ docker pull mxnet/python # Use sudo if you skip Step 2
+$ docker pull mxnet/python
{% endhighlight %}
You can list docker images to see if mxnet/python docker image pull was successful.
@@ -28,16 +24,4 @@ REPOSITORY TAG IMAGE ID CREATED SIZE
mxnet/python latest 00d026968b3c 3 weeks ago 1.41 GB
{% endhighlight %}
-Using the latest MXNet with [Intel MKL-DNN](https://github.com/intel/mkl-dnn) is
-recommended for the
-fastest inference speeds with MXNet.
-
-{% highlight bash %}
-$ docker pull mxnet/python:1.3.0_cpu_mkl # Use sudo if you skip Step 2
-$ docker images # Use sudo if you skip Step 2
-
-REPOSITORY TAG IMAGE ID CREATED SIZE
-mxnet/python 1.3.0_cpu_mkl deaf9bf61d29 4 days ago 678 MB
-{% endhighlight %}
-
-**Step 4** Validate the installation.
\ No newline at end of file
+You can then validate the installation.
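+For example, a minimal sketch of such a check, run from a Python shell inside the container (the image tag and entry point may differ on your setup):
+
+{% highlight python %}
+import mxnet as mx
+
+# Create a small array and run one operation to confirm the runtime works.
+a = mx.nd.ones((2, 3))
+print((a * 2).asnumpy())
+{% endhighlight %}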
diff --git a/docs/static_site/src/_includes/get_started/linux/python/cpu/pip.md b/docs/static_site/src/_includes/get_started/linux/python/cpu/pip.md
index 81db75ea9038..8f9d4e04b188 100644
--- a/docs/static_site/src/_includes/get_started/linux/python/cpu/pip.md
+++ b/docs/static_site/src/_includes/get_started/linux/python/cpu/pip.md
@@ -1,62 +1,122 @@
+WARNING: the following PyPI package names are provided for your convenience but
+they point to packages that are *not* provided nor endorsed by the Apache
+Software Foundation. As such, they might contain software components with more
+restrictive licenses than the Apache License and you'll need to decide whether
+they are appropriate for your usage. The packages linked here contain GPL GCC
+Runtime Library components. Like all Apache Releases, the official Apache MXNet
+(incubating) releases consist of source code only and are found at the [Download
+page](https://mxnet.apache.org/get_started/download).
+
Run the following command:
-
+
+{% highlight bash %}
+pip install mxnet
+{% endhighlight %}
+
+Starting from the 1.7.0 release, oneDNN (previously known as MKL-DNN/DNNL) is enabled
+in pip packages by default.
+
+oneAPI Deep Neural Network Library (oneDNN) is an open-source cross-platform
+performance library of basic building blocks for deep learning applications.
+The library is optimized for Intel Architecture Processors, Intel Processor
+Graphics and Xe architecture-based Graphics. Support for other architectures
+such as Arm* 64-bit Architecture (AArch64) and OpenPOWER* Power ISA (PPC64) is
+experimental.
+
+oneDNN is intended for deep learning applications and framework developers
+interested in improving application performance on Intel CPUs and GPUs. More
+details can be found here.
+
+You can find performance numbers in the
+
+MXNet tuning guide.
+
+To install native MXNet without oneDNN, run the following command:
+
+{% highlight bash %}
+pip install mxnet-native
+{% endhighlight %}
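+
+If you want to check whether the package you installed was actually built with
+oneDNN (MKL-DNN) support, one option is to query the runtime feature flags.
+This is a minimal sketch, assuming the `mxnet.runtime` module that ships with
+MXNet 1.5 and later:
+
+{% highlight python %}
+import mxnet as mx
+from mxnet.runtime import Features
+
+# compile-time features baked into the installed binary
+features = Features()
+print(features.is_enabled('MKLDNN'))  # True if the package was built with oneDNN/MKL-DNN
+{% endhighlight %}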
+
+
+
+
{% highlight bash %}
-$ pip install mxnet
+pip install mxnet==1.6.0
{% endhighlight %}
MKL-DNN enabled pip packages are optimized for Intel hardware. You can find
-performance numbers
-in the MXNet tuning guide.
+performance numbers in the
+
+MXNet tuning guide.
{% highlight bash %}
-$ pip install mxnet-mkl
+pip install mxnet-mkl==1.6.0
{% endhighlight %}
-
+
+
+
+{% highlight bash %}
+pip install mxnet==1.5.1
+{% endhighlight %}
+
+MKL-DNN enabled pip packages are optimized for Intel hardware. You can find
+performance numbers in the
+
+MXNet tuning guide.
+
+{% highlight bash %}
+pip install mxnet-mkl==1.5.1
+{% endhighlight %}
+
+
{% highlight bash %}
-$ pip install mxnet==1.4.1
+pip install mxnet==1.4.1
{% endhighlight %}
MKL-DNN enabled pip packages are optimized for Intel hardware. You can find
-performance numbers
-in the MXNet tuning guide.
+performance numbers in the
+
+MXNet tuning guide.
{% highlight bash %}
-$ pip install mxnet-mkl==1.4.1
+pip install mxnet-mkl==1.4.1
{% endhighlight %}
{% highlight bash %}
-$ pip install mxnet==1.3.1
+pip install mxnet==1.3.1
{% endhighlight %}
MKL-DNN enabled pip packages are optimized for Intel hardware. You can find
-performance numbers
-in the MXNet tuning guide.
+performance numbers in the
+
+MXNet tuning guide.
{% highlight bash %}
-$ pip install mxnet-mkl==1.3.1
+pip install mxnet-mkl==1.3.1
{% endhighlight %}
{% highlight bash %}
-$ pip install mxnet==1.2.1
+pip install mxnet==1.2.1
{% endhighlight %}
MKL-DNN enabled pip packages are optimized for Intel hardware. You can find
-performance numbers
-in the MXNet tuning guide.
+performance numbers in the
+
+MXNet tuning guide.
{% highlight bash %}
-$ pip install mxnet-mkl==1.2.1
+pip install mxnet-mkl==1.2.1
{% endhighlight %}
-
-{% highlight bash %}
-$ pip install mxnet --pre
-{% endhighlight %}
-
-MKL-DNN enabled pip packages are optimized for Intel hardware. You can find
-performance numbers
-in the MXNet tuning guide.
-
-{% highlight bash %}
-$ pip install mxnet-mkl --pre
-{% endhighlight %}
-
-
-{% include /get_started/pip_snippet.md %}
\ No newline at end of file
+{% include /get_started/pip_snippet.md %}
diff --git a/docs/static_site/src/_includes/get_started/linux/python/gpu/build-from-source.md b/docs/static_site/src/_includes/get_started/linux/python/gpu/build-from-source.md
index 4adf039739d5..cef507527bfe 100644
--- a/docs/static_site/src/_includes/get_started/linux/python/gpu/build-from-source.md
+++ b/docs/static_site/src/_includes/get_started/linux/python/gpu/build-from-source.md
@@ -1 +1 @@
-To build from source, refer to the [MXNet Ubuntu installation guide]({{'/get_started/ubuntu_setup' | relative_url}}).
+Please follow the build from source instructions linked above.
diff --git a/docs/static_site/src/_includes/get_started/linux/python/gpu/docker.md b/docs/static_site/src/_includes/get_started/linux/python/gpu/docker.md
index dc21cf10b4ef..f963bc9c58de 100644
--- a/docs/static_site/src/_includes/get_started/linux/python/gpu/docker.md
+++ b/docs/static_site/src/_includes/get_started/linux/python/gpu/docker.md
@@ -1,24 +1,19 @@
-Docker images with *MXNet* are available at [DockerHub](https://hub.docker.com/r/mxnet/).
-
-**Step 1** Install Docker on your machine by following the [docker installation
-instructions](https://docs.docker.com/engine/installation/linux/ubuntu/#install-using-the-repository).
-
-*Note* - You can install Community Edition (CE) to get started with *MXNet*.
-
-**Step 2** [Optional] Post installation steps to manage Docker as a non-root user.
+WARNING: the following links and names of binary distributions are provided for
+your convenience but they point to packages that are *not* provided nor endorsed
+by the Apache Software Foundation. As such, they might contain software
+components with more restrictive licenses than the Apache License and you'll
+need to decide whether they are appropriate for your usage. Like all Apache
+Releases, the official Apache MXNet (incubating) releases consist of source code
+only and are found at
+the [Download page](https://mxnet.apache.org/get_started/download).
-Follow the four steps in this [docker
-documentation](https://docs.docker.com/engine/installation/linux/linux-postinstall/#manage-docker-as-a-non-root-user)
-to allow managing docker containers without *sudo*.
-
-If you skip this step, you need to use *sudo* each time you invoke Docker.
+Docker images with *MXNet* are available at [DockerHub](https://hub.docker.com/r/mxnet/).
-**Step 3** Install *nvidia-docker-plugin* following the [installation
-instructions](https://github.com/NVIDIA/nvidia-docker/wiki). *nvidia-docker-plugin*
-is required to
-enable the usage of GPUs from the docker containers.
+Please follow the [NVidia Docker installation
+instructions](https://github.com/NVIDIA/nvidia-docker/wiki) to enable the usage
+of GPUs from the docker containers.
-**Step 4** Pull the MXNet docker image.
+After you have installed Docker on your machine, you can use the images via:
{% highlight bash %}
-$ docker pull mxnet/python:gpu # Use sudo if you skip Step 2
+$ docker pull mxnet/python:gpu
@@ -33,16 +28,4 @@ REPOSITORY TAG IMAGE ID CREATED SIZE
mxnet/python gpu 493b2683c269 3 weeks ago 4.77 GB
{% endhighlight %}
-Using the latest MXNet with [Intel MKL-DNN](https://github.com/intel/mkl-dnn) is
-recommended for the
-fastest inference speeds with MXNet.
-
-{% highlight bash %}
-$ docker pull mxnet/python:1.3.0_cpu_mkl # Use sudo if you skip Step 2
-$ docker images # Use sudo if you skip Step 2
-
-REPOSITORY TAG IMAGE ID CREATED SIZE
-mxnet/python 1.3.0_gpu_cu92_mkl adcb3ab19f50 4 days ago 4.23 GB
-{% endhighlight %}
-
-**Step 5** Validate the installation.
\ No newline at end of file
+You can then validate the installation.
diff --git a/docs/static_site/src/_includes/get_started/linux/python/gpu/pip.md b/docs/static_site/src/_includes/get_started/linux/python/gpu/pip.md
index 249cd5b54052..91d32aed1ccd 100644
--- a/docs/static_site/src/_includes/get_started/linux/python/gpu/pip.md
+++ b/docs/static_site/src/_includes/get_started/linux/python/gpu/pip.md
@@ -1,11 +1,35 @@
+WARNING: the following PyPI package names are provided for your convenience but
+they point to packages that are *not* provided nor endorsed by the Apache
+Software Foundation. As such, they might contain software components with more
+restrictive licenses than the Apache License and you'll need to decide whether
+they are appropriate for your usage. The packages linked here contain
+proprietary parts of the NVidia CUDA SDK and GPL GCC Runtime Library components.
+Like all Apache Releases, the official Apache MXNet (incubating) releases
+consist of source code only and are found at the [Download
+page](https://mxnet.apache.org/get_started/download).
+
Run the following command:
-
-
{% include /get_started/pip_snippet.md %}
-{% include /get_started/gpu_snippet.md %}
\ No newline at end of file
+{% include /get_started/gpu_snippet.md %}
diff --git a/docs/static_site/src/_includes/get_started/linux/r/build-from-source.md b/docs/static_site/src/_includes/get_started/linux/r/build-from-source.md
new file mode 100644
index 000000000000..1150a309223d
--- /dev/null
+++ b/docs/static_site/src/_includes/get_started/linux/r/build-from-source.md
@@ -0,0 +1,2 @@
+You will need R v3.4.4+ and to build MXNet from source. Please follow the
+instructions linked above.
diff --git a/docs/static_site/src/_includes/get_started/linux/r/cpu.md b/docs/static_site/src/_includes/get_started/linux/r/cpu.md
deleted file mode 100644
index 1077362c9a65..000000000000
--- a/docs/static_site/src/_includes/get_started/linux/r/cpu.md
+++ /dev/null
@@ -1,10 +0,0 @@
-The default version of R that is installed with `apt-get` is insufficient. You will need
-to first [install R v3.4.4+ and build MXNet from source](/get_started/ubuntu_setup.html#install-the-mxnet-package-for-r).
-
-After you have setup R v3.4.4+ and MXNet, you can build and install the MXNet R bindings with the following, assuming that `incubator-mxnet` is the source directory you used to build MXNet as follows:
-
-{% highlight bash %}
-$ cd incubator-mxnet
-$ mkdir build; cd build; cmake -DUSE_CUDA=OFF ..; make -j $(nproc); cd ..
-$ make -f R-package/Makefile rpkg
-{% endhighlight %}
diff --git a/docs/static_site/src/_includes/get_started/linux/r/gpu.md b/docs/static_site/src/_includes/get_started/linux/r/gpu.md
deleted file mode 100644
index 1ef16df25da5..000000000000
--- a/docs/static_site/src/_includes/get_started/linux/r/gpu.md
+++ /dev/null
@@ -1,17 +0,0 @@
-The default version of R that is installed with `apt-get` is insufficient. You will need
-to first
-[install R v3.4.4+ and build MXNet from
-source](/get_started/ubuntu_setup.html#install-the-mxnet-package-for-r).
-
-After you have setup R v3.4.4+ and MXNet, you can build and install the MXNet R bindings
-with the
-following, assuming that `incubator-mxnet` is the source directory you used to build
-MXNet as follows:
-
-{% highlight bash %}
-$ cd incubator-mxnet
-$ mkdir build; cd build; cmake ..; make -j $(nproc); cd ..
-$ make -f R-package/Makefile rpkg
-{% endhighlight %}
-
-{% include /get_started/gpu_snippet.md %}
diff --git a/docs/static_site/src/_includes/get_started/linux/java/cpu.md b/docs/static_site/src/_includes/get_started/linux/scala/build-from-source.md
similarity index 100%
rename from docs/static_site/src/_includes/get_started/linux/java/cpu.md
rename to docs/static_site/src/_includes/get_started/linux/scala/build-from-source.md
diff --git a/docs/static_site/src/_includes/get_started/linux/scala/cpu.md b/docs/static_site/src/_includes/get_started/linux/scala/cpu.md
deleted file mode 100644
index 3cc96bade7df..000000000000
--- a/docs/static_site/src/_includes/get_started/linux/scala/cpu.md
+++ /dev/null
@@ -1,14 +0,0 @@
-You can use the Maven packages defined in the following dependency to include MXNet in your Scala
-project. Please refer to the [MXNet-Scala setup guide]({{'/get_started/scala_setup'|relative_url}}) for
-a detailed set of instructions to help you with the setup process.
-
-
-
-{% highlight html %}
-
-org.apache.mxnet
-mxnet-full_2.11-linux-x86_64-cpu
-
-{% endhighlight %}
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/linux/scala/gpu.md b/docs/static_site/src/_includes/get_started/linux/scala/gpu.md
deleted file mode 100644
index 5e2f9b179205..000000000000
--- a/docs/static_site/src/_includes/get_started/linux/scala/gpu.md
+++ /dev/null
@@ -1,16 +0,0 @@
-You can use the Maven packages defined in the following dependency to include MXNet in
-your Scala
-project. Please refer to the MXNet-Scala setup guide for
-a detailed set
-of instructions to help you with the setup process.
-
-
-
-{% highlight html %}
-
-org.apache.mxnet
-mxnet-full_2.11-linux-x86_64-gpu
-
-{% endhighlight %}
diff --git a/docs/static_site/src/_includes/get_started/macos b/docs/static_site/src/_includes/get_started/macos
new file mode 120000
index 000000000000..9c52cb36f47e
--- /dev/null
+++ b/docs/static_site/src/_includes/get_started/macos
@@ -0,0 +1 @@
+linux
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/macos/clojure/cpu.md b/docs/static_site/src/_includes/get_started/macos/clojure/cpu.md
deleted file mode 100644
index 1e27af5e5dd7..000000000000
--- a/docs/static_site/src/_includes/get_started/macos/clojure/cpu.md
+++ /dev/null
@@ -1,17 +0,0 @@
-
-You can use the Maven packages defined in the following dependency to include MXNet in
-your Clojure project. To maximize leverage, the Clojure package has been built on the existing Scala
-package. Please refer to the [MXNet-Scala setup guide](scala_setup.html) for a detailed set
-of instructions to help you with the setup process that is required to use the Clojure dependency.
-
-
-
-{% highlight html %}
-
- org.apache.mxnet.contrib.clojure
- clojure-mxnet-osx-cpu
-
-{% endhighlight %}
-
diff --git a/docs/static_site/src/_includes/get_started/macos/clojure/gpu.md b/docs/static_site/src/_includes/get_started/macos/clojure/gpu.md
deleted file mode 100644
index ccbc24db96e7..000000000000
--- a/docs/static_site/src/_includes/get_started/macos/clojure/gpu.md
+++ /dev/null
@@ -1 +0,0 @@
-Not available at this time.
diff --git a/docs/static_site/src/_includes/get_started/macos/cpp/cpp.md b/docs/static_site/src/_includes/get_started/macos/cpp/cpp.md
deleted file mode 100644
index cb9e613928d9..000000000000
--- a/docs/static_site/src/_includes/get_started/macos/cpp/cpp.md
+++ /dev/null
@@ -1,3 +0,0 @@
-To enable the C++ package, build from source using `make USE_CPP_PACKAGE=1`.
-Refer to the [MXNet C++ setup guide](/get_started/cpp_setup.html) for full instructions.
-
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/macos/java/cpu.md b/docs/static_site/src/_includes/get_started/macos/java/cpu.md
deleted file mode 100644
index 002037a15771..000000000000
--- a/docs/static_site/src/_includes/get_started/macos/java/cpu.md
+++ /dev/null
@@ -1,16 +0,0 @@
-You can use the Maven packages defined in the following dependency to include MXNet in
-your Java project. The Java API is provided as a subset of the Scala API and is intended for
-inference only.
-Please refer to the [MXNet-Java setup guide](/get_started/java_setup.html) for a detailed set of instructions to help you with the setup process.
-
-
-
-{% highlight html %}
-
- org.apache.mxnet
- mxnet-full_2.11-linux-x86_64-cpu
- [1.5.0, )
-
-{% endhighlight %}
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/macos/java/gpu.md b/docs/static_site/src/_includes/get_started/macos/java/gpu.md
deleted file mode 100644
index b17ef33478ea..000000000000
--- a/docs/static_site/src/_includes/get_started/macos/java/gpu.md
+++ /dev/null
@@ -1 +0,0 @@
-Not available at this time.
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/macos/julia/build-from-source.md b/docs/static_site/src/_includes/get_started/macos/julia/build-from-source.md
deleted file mode 100644
index b864d8a49086..000000000000
--- a/docs/static_site/src/_includes/get_started/macos/julia/build-from-source.md
+++ /dev/null
@@ -1 +0,0 @@
-Refer to the [Julia section of the MXNet macOS installation guide](get_started/osx_setup.html#install-the-mxnet-package-for-julia).
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/macos/julia/pkg.md b/docs/static_site/src/_includes/get_started/macos/julia/pkg.md
deleted file mode 100644
index 35971305ee2d..000000000000
--- a/docs/static_site/src/_includes/get_started/macos/julia/pkg.md
+++ /dev/null
@@ -1,10 +0,0 @@
-Install a pinned version of MXNet like this:
-
-{% highlight julia %}
-]add MXNet#v1.5.0
-{% endhighlight %}
-
-Or directly install the latest release:
-{% highlight julia %}
-]add MXNet
-{% endhighlight %}
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/macos/perl/perl.md b/docs/static_site/src/_includes/get_started/macos/perl/perl.md
deleted file mode 100644
index 45d59ddf78a6..000000000000
--- a/docs/static_site/src/_includes/get_started/macos/perl/perl.md
+++ /dev/null
@@ -1 +0,0 @@
-Refer to the [Perl section of installation guide](/get_started/osx_setup.html#install-the-mxnet-package-for-perl).
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/macos/python/cpu/build-from-source.md b/docs/static_site/src/_includes/get_started/macos/python/cpu/build-from-source.md
deleted file mode 100644
index ee8e378ec20e..000000000000
--- a/docs/static_site/src/_includes/get_started/macos/python/cpu/build-from-source.md
+++ /dev/null
@@ -1,2 +0,0 @@
-To build from source, refer to the [MXNet macOS installation guide](/get_started/osx_setup.html).
-MXNet developers should refer to the MXNet wiki's [Developer Setup on Mac](https://cwiki.apache.org/confluence/display/MXNET/MXNet+Developer+Setup+on+Mac).
diff --git a/docs/static_site/src/_includes/get_started/macos/python/cpu/docker.md b/docs/static_site/src/_includes/get_started/macos/python/cpu/docker.md
deleted file mode 100644
index c8631ec26ab3..000000000000
--- a/docs/static_site/src/_includes/get_started/macos/python/cpu/docker.md
+++ /dev/null
@@ -1,35 +0,0 @@
-Docker images with *MXNet* are available at [DockerHub](https://hub.docker.com/r/mxnet/).
-
-**Step 1** Install Docker on your machine by following the [docker installation
-instructions](https://docs.docker.com/docker-for-mac/install/#install-and-run-docker-for-mac).
-
-*Note* - You can install Community Edition (CE) to get started with *MXNet*.
-
-**Step 2** Pull the MXNet docker image.
-
-{% highlight bash %}
-$ docker pull mxnet/python
-{% endhighlight %}
-
-You can list docker images to see if mxnet/python docker image pull was successful.
-
-{% highlight bash %}
-$ docker images
-
-REPOSITORY TAG IMAGE ID CREATED SIZE
-mxnet/python latest 00d026968b3c 3 weeks ago 1.41 GB
-{% endhighlight %}
-
-Using the latest MXNet with [Intel MKL-DNN](https://github.com/intel/mkl-dnn) is
-recommended for the
-fastest inference speeds with MXNet.
-
-{% highlight bash %}
-$ docker pull mxnet/python:1.3.0_cpu_mkl # Use sudo if you skip Step 2
-$ docker images # Use sudo if you skip Step 2
-
-REPOSITORY TAG IMAGE ID CREATED SIZE
-mxnet/python 1.3.0_cpu_mkl deaf9bf61d29 4 days ago 678 MB
-{% endhighlight %}
-
-**Step 4** Validate the installation.
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/macos/python/cpu/pip.md b/docs/static_site/src/_includes/get_started/macos/python/cpu/pip.md
deleted file mode 100644
index beb5eb4fb797..000000000000
--- a/docs/static_site/src/_includes/get_started/macos/python/cpu/pip.md
+++ /dev/null
@@ -1,73 +0,0 @@
-Run the following command:
-
-
-
-{% include /get_started/pip_snippet.md %}
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/macos/python/gpu/build-from-source.md b/docs/static_site/src/_includes/get_started/macos/python/gpu/build-from-source.md
deleted file mode 100644
index ffcf88ecdd25..000000000000
--- a/docs/static_site/src/_includes/get_started/macos/python/gpu/build-from-source.md
+++ /dev/null
@@ -1,2 +0,0 @@
-Refer to the [MXNet macOS installation guide](get_started/osx_setup.html).
-MXNet developers should refer to the MXNet wiki's [Developer Setup on Mac](https://cwiki.apache.org/confluence/display/MXNET/MXNet+Developer+Setup+on+Mac).
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/macos/python/gpu/pip_docker.md b/docs/static_site/src/_includes/get_started/macos/python/gpu/pip_docker.md
deleted file mode 100644
index 87281fe85e38..000000000000
--- a/docs/static_site/src/_includes/get_started/macos/python/gpu/pip_docker.md
+++ /dev/null
@@ -1 +0,0 @@
-This option is only available by building from source. Refer to the [MXNet macOS installation guide](get_started/osx_setup.html).
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/macos/r/cpu.md b/docs/static_site/src/_includes/get_started/macos/r/cpu.md
deleted file mode 100644
index 1e0d96b91ed7..000000000000
--- a/docs/static_site/src/_includes/get_started/macos/r/cpu.md
+++ /dev/null
@@ -1,28 +0,0 @@
-To run MXNet you also should have OpenCV and OpenBLAS installed. You may install them with `brew` as follows:
-
-{% highlight bash %}
-brew install opencv
-brew install openblas
-{% endhighlight %}
-
-To ensure MXNet R package runs with the version of OpenBLAS installed, create a symbolic link as follows:
-
-{% highlight bash %}
-ln -sf /usr/local/opt/openblas/lib/libopenblas.dylib
-/usr/local/opt/openblas/lib/libopenblasp-r0.3.1.dylib
-{% endhighlight %}
-
-Note: packages for 3.6.x are not yet available.
-
-Install 3.5.x of R from [CRAN](https://cran.r-project.org/bin/macosx/). The latest is
-[v3.5.3](https://cran.r-project.org/bin/macosx/R-3.5.3.pkg).
-
-You can [build MXNet-R from source](get_started/osx_setup.html#install-the-mxnet-package-for-r), or
-you can use a pre-built binary:
-
-{% highlight r %}
-cran <- getOption("repos")
-cran["dmlc"] <- "https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/R/CRAN/"
-options(repos = cran)
-install.packages("mxnet")
-{% endhighlight %}
diff --git a/docs/static_site/src/_includes/get_started/macos/r/gpu.md b/docs/static_site/src/_includes/get_started/macos/r/gpu.md
deleted file mode 100644
index 3fc556a8dd91..000000000000
--- a/docs/static_site/src/_includes/get_started/macos/r/gpu.md
+++ /dev/null
@@ -1 +0,0 @@
-Be the first one to contribute this guide!
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/macos/scala/cpu.md b/docs/static_site/src/_includes/get_started/macos/scala/cpu.md
deleted file mode 100644
index 0bc69eb2e63e..000000000000
--- a/docs/static_site/src/_includes/get_started/macos/scala/cpu.md
+++ /dev/null
@@ -1,14 +0,0 @@
-You can use the Maven packages defined in the following dependency to include MXNet in your Scala
-project. Please refer to the [MXNet-Scala setup guide](/get_started/scala_setup.html) for a detailed set
-of instructions to help you with the setup process.
-
-
-
-{% highlight html %}
-
- org.apache.mxnet
- mxnet-full_2.11-osx-x86_64-cpu
-
-{% endhighlight %}
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/macos/scala/gpu.md b/docs/static_site/src/_includes/get_started/macos/scala/gpu.md
deleted file mode 100644
index b17ef33478ea..000000000000
--- a/docs/static_site/src/_includes/get_started/macos/scala/gpu.md
+++ /dev/null
@@ -1 +0,0 @@
-Not available at this time.
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/pip_snippet.md b/docs/static_site/src/_includes/get_started/pip_snippet.md
index 2c4d932fc816..9e67acce5eda 100644
--- a/docs/static_site/src/_includes/get_started/pip_snippet.md
+++ b/docs/static_site/src/_includes/get_started/pip_snippet.md
@@ -1,14 +1,13 @@
-MXNet offers MKL pip packages that will be much faster when running on Intel hardware.
-Check the chart below for other options, refer to PyPI for
-other MXNet pip packages, or validate your MXNet installation.
+You can then validate your MXNet installation.
-
**NOTES:**
-*mxnet-cu101mkl* means the package is built with CUDA/cuDNN and MKL-DNN enabled and the CUDA version is 10.1.
+*mxnet-cu101* means the package is built with CUDA/cuDNN and the CUDA version is
+10.1.
All MKL pip packages are experimental prior to version 1.3.0.
diff --git a/docs/static_site/src/_includes/get_started/windows b/docs/static_site/src/_includes/get_started/windows
new file mode 120000
index 000000000000..9c52cb36f47e
--- /dev/null
+++ b/docs/static_site/src/_includes/get_started/windows
@@ -0,0 +1 @@
+linux
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/windows/clojure/clojure.md b/docs/static_site/src/_includes/get_started/windows/clojure/clojure.md
deleted file mode 100644
index 0b25ab9018d3..000000000000
--- a/docs/static_site/src/_includes/get_started/windows/clojure/clojure.md
+++ /dev/null
@@ -1 +0,0 @@
-MXNet-Clojure for Windows is not yet available.
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/windows/cpp/cpp.md b/docs/static_site/src/_includes/get_started/windows/cpp/cpp.md
deleted file mode 100644
index 023735b6c3b1..000000000000
--- a/docs/static_site/src/_includes/get_started/windows/cpp/cpp.md
+++ /dev/null
@@ -1,3 +0,0 @@
-To enable the C++ package, build from source using `make USE_CPP_PACKAGE=1`.
-Refer to the [MXNet C++ setup guide](/get_started/cpp_setup.html) for full instructions.
-
diff --git a/docs/static_site/src/_includes/get_started/windows/java/java.md b/docs/static_site/src/_includes/get_started/windows/java/java.md
deleted file mode 100644
index 0db1f50590a2..000000000000
--- a/docs/static_site/src/_includes/get_started/windows/java/java.md
+++ /dev/null
@@ -1 +0,0 @@
-MXNet-Java for Windows is not yet available.
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/windows/julia/build-from-source.md b/docs/static_site/src/_includes/get_started/windows/julia/build-from-source.md
deleted file mode 100644
index 4fc600468ad1..000000000000
--- a/docs/static_site/src/_includes/get_started/windows/julia/build-from-source.md
+++ /dev/null
@@ -1 +0,0 @@
-Refer to the [Julia section of the MXNet Windows installation guide](/get_started/windows_setup.html#install-the-mxnet-package-for-julia).
diff --git a/docs/static_site/src/_includes/get_started/windows/julia/pkg.md b/docs/static_site/src/_includes/get_started/windows/julia/pkg.md
deleted file mode 100644
index cb79177e5bbe..000000000000
--- a/docs/static_site/src/_includes/get_started/windows/julia/pkg.md
+++ /dev/null
@@ -1,10 +0,0 @@
-Install a pinned version of MXNet like this:
-
-{% highlight julia %}
-]add MXNet#v1.5.0
-{% endhighlight %}
-
-Or directly install the latest release:
-{% highlight julia %}
-]add MXNet
-{% endhighlight %}
diff --git a/docs/static_site/src/_includes/get_started/windows/perl/perl.md b/docs/static_site/src/_includes/get_started/windows/perl/perl.md
deleted file mode 100644
index 1a8eea5261ba..000000000000
--- a/docs/static_site/src/_includes/get_started/windows/perl/perl.md
+++ /dev/null
@@ -1 +0,0 @@
-Refer to the [Perl section of the MXNet Windows installation guide](/get_started/windows_setup.html#install-the-mxnet-package-for-perl).
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/windows/python/cpu/build-from-source.md b/docs/static_site/src/_includes/get_started/windows/python/cpu/build-from-source.md
deleted file mode 100644
index af36205337d2..000000000000
--- a/docs/static_site/src/_includes/get_started/windows/python/cpu/build-from-source.md
+++ /dev/null
@@ -1 +0,0 @@
-Refer to the [MXNet Windows installation guide](/get_started/windows_setup.html)
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/windows/python/cpu/docker.md b/docs/static_site/src/_includes/get_started/windows/python/cpu/docker.md
deleted file mode 100644
index a061b8bbab65..000000000000
--- a/docs/static_site/src/_includes/get_started/windows/python/cpu/docker.md
+++ /dev/null
@@ -1,34 +0,0 @@
-Docker images with *MXNet* are available at [Docker Hub](https://hub.docker.com/r/mxnet/).
-
-**Step 1** Install Docker on your machine by following the docker installation instructions
-
-*Note* - You can install Community Edition (CE) to get started with *MXNet*.
-
-**Step 2** Pull the MXNet docker image.
-
-{% highlight bash %}
-$ docker pull mxnet/python # Use sudo if you skip Step 2
-{% endhighlight %}
-
-You can list docker images to see if mxnet/python docker image pull was successful.
-
-{% highlight bash %}
-$ docker images # Use sudo if you skip Step 2
-
-REPOSITORY TAG IMAGE ID CREATED SIZE
-mxnet/python latest 00d026968b3c 3 weeks ago 1.41 GB
-{% endhighlight %}
-
-Using the latest MXNet with [Intel MKL-DNN](https://github.com/intel/mkl-dnn) is
-recommended for the
-fastest inference speeds with MXNet.
-
-{% highlight bash %}
-$ docker pull mxnet/python:1.3.0_cpu_mkl # Use sudo if you skip Step 2
-$ docker images # Use sudo if you skip Step 2
-
-REPOSITORY TAG IMAGE ID CREATED SIZE
-mxnet/python 1.3.0_cpu_mkl deaf9bf61d29 4 days ago 678 MB
-{% endhighlight %}
-
-**Step 4** Validate the installation.
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/windows/python/cpu/pip.md b/docs/static_site/src/_includes/get_started/windows/python/cpu/pip.md
deleted file mode 100644
index d5c7f1fd08f0..000000000000
--- a/docs/static_site/src/_includes/get_started/windows/python/cpu/pip.md
+++ /dev/null
@@ -1,73 +0,0 @@
-Run the following command:
-
-
-
-{% include /get_started/pip_snippet.md %}
-{% include /get_started/gpu_snippet.md %}
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/windows/python/gpu/build-from-source.md b/docs/static_site/src/_includes/get_started/windows/python/gpu/build-from-source.md
deleted file mode 100644
index 55bca3a129d8..000000000000
--- a/docs/static_site/src/_includes/get_started/windows/python/gpu/build-from-source.md
+++ /dev/null
@@ -1 +0,0 @@
-To build from source, refer to the [MXNet Windows installation guide](/get_started/windows_setup.html).
diff --git a/docs/static_site/src/_includes/get_started/windows/python/gpu/pip.md b/docs/static_site/src/_includes/get_started/windows/python/gpu/pip.md
deleted file mode 100644
index cbcd9d44d6af..000000000000
--- a/docs/static_site/src/_includes/get_started/windows/python/gpu/pip.md
+++ /dev/null
@@ -1,74 +0,0 @@
-Run the following command:
-
-
-
-
-{% include /get_started/pip_snippet.md %}
-{% include /get_started/gpu_snippet.md %}
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/windows/r/cpu.md b/docs/static_site/src/_includes/get_started/windows/r/cpu.md
deleted file mode 100644
index 926b8355c984..000000000000
--- a/docs/static_site/src/_includes/get_started/windows/r/cpu.md
+++ /dev/null
@@ -1,15 +0,0 @@
-Note: packages for 3.6.x are not yet available.
-Install 3.5.x of R from [CRAN](https://cran.r-project.org/bin/windows/base/old/).
-
-You can [build MXNet-R from source](/get_started/windows_setup.html#install-mxnet-package-for-r), or
-you can use a
-pre-built binary:
-
-{% highlight r %}
-cran <- getOption("repos")
-cran["dmlc"] <- "https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/R/CRAN/"
-options(repos = cran)
-install.packages("mxnet")
-{% endhighlight %}
-
-To run MXNet you also should have OpenCV and OpenBLAS installed.
diff --git a/docs/static_site/src/_includes/get_started/windows/r/gpu.md b/docs/static_site/src/_includes/get_started/windows/r/gpu.md
deleted file mode 100644
index 084f1a5a4012..000000000000
--- a/docs/static_site/src/_includes/get_started/windows/r/gpu.md
+++ /dev/null
@@ -1,16 +0,0 @@
-You can [build MXNet-R from source](/get_started/windows_setup.html#install-mxnet-package-for-r), or
-you can use a
-pre-built binary:
-
-{% highlight r %}
-cran <- getOption("repos")
-cran["dmlc"] <-
-"https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/R/CRAN/GPU/cu92"
-options(repos = cran)
-install.packages("mxnet")
-{% endhighlight %}
-
-Change cu92 to cu90, cu91 or cuda100 based on your CUDA toolkit version. Currently, MXNet supports these versions of CUDA.
-Note : You also need to have cuDNN installed on Windows. Check out this
-[guide](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#installwindows)
-on the steps for installation.
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/get_started/windows/scala/scala.md b/docs/static_site/src/_includes/get_started/windows/scala/scala.md
deleted file mode 100644
index 74b7d45c6d79..000000000000
--- a/docs/static_site/src/_includes/get_started/windows/scala/scala.md
+++ /dev/null
@@ -1 +0,0 @@
-MXNet-Scala for Windows is not yet available.
\ No newline at end of file
diff --git a/docs/static_site/src/_includes/head.html b/docs/static_site/src/_includes/head.html
index 9a565c756c07..fb7822caaae9 100644
--- a/docs/static_site/src/_includes/head.html
+++ b/docs/static_site/src/_includes/head.html
@@ -5,14 +5,22 @@
{%- seo -%}
+
{%- feed_meta -%}
{%- if jekyll.environment == 'production' and site.google_analytics -%}
{%- include google-analytics.html -%}
{%- endif -%}
-
-
-
-
-
+ {%- if jekyll.environment == 'production' -%}
+
+ {%- else -%}
+
+ {%- endif -%}
+
+
+
+
+ {%- if page.feedback == true and jekyll.environment == "production" -%}
+
+ {%- endif -%}
diff --git a/docs/static_site/src/_includes/header.html b/docs/static_site/src/_includes/header.html
index 314506476985..d4e81c9da6b1 100644
--- a/docs/static_site/src/_includes/header.html
+++ b/docs/static_site/src/_includes/header.html
@@ -35,14 +35,75 @@
-
+
+
+
+
+
+
+
+
+
+
+ {% for version in site.versions %}
+ {% if version == site.versions[1] %}
+
diff --git a/docs/static_site/src/pages/api/developer_guide/1_github_contribution_and_PR_verification_tips.md b/docs/static_site/src/pages/api/developer_guide/1_github_contribution_and_PR_verification_tips.md
new file mode 100644
index 000000000000..93cc916f7b0f
--- /dev/null
+++ b/docs/static_site/src/pages/api/developer_guide/1_github_contribution_and_PR_verification_tips.md
@@ -0,0 +1,193 @@
+---
+layout: page_category
+title: GitHub contribution and PR verification tips
+category: Developer Guide
+permalink: /api/dev-guide/github_contribution_and_PR_verification_tips
+---
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+# GitHub contribution and PR verification tips
+
+Use this page for general git workflow tips.
+
+## Setup and configure
+
+It is recommended that you fork the MXNet repo, and then set the original repo as an upstream remote repo.
+
+Fork [https://github.com/apache/incubator-mxnet](https://github.com/apache/incubator-mxnet) then:
+
+```
+git clone --recursive https://github.com/your_username/incubator-mxnet
+cd incubator-mxnet
+git remote add upstream https://github.com/apache/incubator-mxnet
+```
+
+Once `upstream` has been added, create a branch for your contribution.
+
+
+```
+git branch your-contribution-branch
+```
+
+Note that you can incorporate the changes from `upstream` into any of your local branches during or after development via:
+
+```
+git fetch upstream
+git rebase upstream/master
+```
+
+See [this stackoverflow discussion](https://stackoverflow.com/questions/3357122/git-pull-vs-git-fetch-vs-git-rebase) for more details about difference between `git pull`, `git rebase` and `git merge`.
+
+Since Apache MXNet uses 3rd party git submodules, to update them on your branch after a rebase, you can run:
+
+```
+git submodule update --recursive
+```
+
+## Save your local changes for later
+
+During development, you can save your current changes in your branch before committing anything, for example when you need to switch to another branch to do something else:
+
+
+```
+git stash save
+```
+
+To restore the changes so that they can be added to a commit, use:
+
+
+```
+git stash pop
+```
+
+
+To drop the changes, use:
+
+```
+git stash drop
+```
+
+## Reset
+
+Sometimes you may want to wipe out the changes you have made; you can use:
+
+```
+git reset --hard
+```
+
+Be very careful, since a hard reset removes all of your changes and you’ll be back at the HEAD commit. To remove all changes made after a given commit, use `git reset --hard commit-SHA`, or reset relative to HEAD, e.g. `git reset --hard HEAD~2` to discard the last two commits.
+
+However, sometimes it’s useful to keep the files/changes staged when moving HEAD, which can be done via
+`git reset --soft`. All of the files changed between the original HEAD and the target commit will be left staged.
+
+In [summary](https://stackoverflow.com/a/50022436),
+
+
+* **`--soft`**: **uncommit** changes, changes are left staged (*index*).
+* **`--mixed`** *(default)*: **uncommit + unstage** changes, changes are left in *working tree*.
+* **`--hard`**: **uncommit + unstage + delete** changes, nothing left.
+
+
+
+## Recover a previous commit after reset
+
+Sometimes you might mistakenly reset a branch to a wrong commit. When that happens, you can use the following command to show the list of recent commits:
+
+
+```
+git reflog
+```
+
+Once you find the right commit hash, you can use `git reset` again to move HEAD to that commit.
+
+
+## How to resolve conflicts with master
+
+Sometimes when rebasing onto the most recent master as explained above, git may report conflicts that it cannot resolve automatically, and the conflicting changes will not be merged. For example, suppose your file `conflict.py` has some conflicts with the master branch. Here you need to:
+
+* Manually modify the file to resolve the conflict.
+* After you have resolved the conflict, mark it as resolved by:
+
+```
+git add conflict.py
+```
+
+* Then you can continue rebase by:
+
+```
+git rebase --continue
+```
+
+* Finally, push to your fork; you may need to **force push** here:
+
+```
+git push --force
+```
+
+**Note** that a force push is okay when it’s your own branch and you are the only one using it. Otherwise, it can have bad consequences as it rewrites history.
+
+
+## How to group multiple commits into one
+
+Sometimes your PR accumulates a number of related commits that are better grouped/combined into one meaningful atomic commit, for example when later commits are only fixes to previous ones.
+If you haven’t configured your default git editor, do the following once:
+
+```
+git config core.editor the-editor-you-like
+```
+
+Assume we want to merge the last 3 commits.
+
+```
+git rebase -i HEAD~3
+```
+
+1. A text editor will pop up. Keep the **first commit as pick** and **change the later ones to squash**.
+2. After you save the file, another text editor will pop up asking you to modify the combined commit message.
+3. Push the changes to your fork; you need to force push:
+
+```
+git push --force
+```
+
+**Note** that a force push is okay when it’s your own branch and you are the only one using it. Otherwise, it can have bad consequences as it rewrites history.
+
+
+## Apply only the k latest commits onto the master
+
+Sometimes it is useful to apply only your k latest commits on top of the master. This usually happens when you have m other commits that were already merged before these k commits. Directly rebasing against the master might cause merge conflicts on those first m commits (which can be safely discarded).
+
+You can instead use the following command:
+
+
+```
+# k is the concrete number of commits to keep, e.g. HEAD~2 to keep the last 2 commits
+git rebase --onto upstream/master HEAD~k
+```
+
+You can then force push to your fork with `git push --force`. Note that the above command discards all the commits on your branch other than the last k.
+
+
+## What is the consequence of force push
+
+The last three tips require a force push because we have altered the commit history. **It is fine to force push to your own fork, as long as the commits changed are only yours.** In case there are multiple collaborators using your branch, there is a safer option: `git push --force-with-lease`.
+
+
+## PR verification
+
+When sending a pull request, remember to add some tests. During development, you can set `MXNET_TEST_COUNT` (e.g. to 1000 or 10000) to test on some randomly selected test cases, which makes the testing and development cycle faster. Moreover, some test results might change due to the seed of the pseudo-random number generator. To fix the seed during testing, set `MXNET_TEST_SEED=your_seed_number`.
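+
+For illustration, here is a minimal sketch of setting these variables from
+Python before running a test module; the values are examples only, and
+exporting the same variables in your shell before invoking the test runner
+works equally well:
+
+```
+import os
+
+# example values only: reduce the number of random cases and pin the RNG seed
+os.environ['MXNET_TEST_COUNT'] = '1000'
+os.environ['MXNET_TEST_SEED'] = '42'
+```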
diff --git a/docs/static_site/src/pages/api/developer_guide/debugging_and_performance_optimization_tips.md b/docs/static_site/src/pages/api/developer_guide/debugging_and_performance_optimization_tips.md
new file mode 100644
index 000000000000..7f53bdb73bbc
--- /dev/null
+++ b/docs/static_site/src/pages/api/developer_guide/debugging_and_performance_optimization_tips.md
@@ -0,0 +1,59 @@
+---
+layout: page_category
+title: Debugging and performance optimization tips
+category: Developer Guide
+permalink: /api/dev-guide/debugging_and_performance_optimization_tips
+---
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+# Debugging and performance optimization tips
+
+The general workflow when defining your network with the Gluon API is one of the following (a short sketch of both styles follows this list):
+
+* build sequentially using `nn.Sequential` or `nn.HybridSequential`
+
+* inherit from `nn.Block` or `nn.HybridBlock`
+
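+A small sketch of the two styles, assuming the MXNet 1.x Gluon API (layer sizes
+and names here are arbitrary):
+
+```
+import mxnet as mx
+from mxnet.gluon import nn
+
+# sequential style: stack layers in order
+net_seq = nn.HybridSequential()
+with net_seq.name_scope():
+    net_seq.add(nn.Dense(64, activation='relu'),
+                nn.Dense(10))
+
+# subclassing style: inherit from HybridBlock and define hybrid_forward
+class MyNet(nn.HybridBlock):
+    def __init__(self, **kwargs):
+        super(MyNet, self).__init__(**kwargs)
+        with self.name_scope():
+            self.fc1 = nn.Dense(64, activation='relu')
+            self.fc2 = nn.Dense(10)
+
+    def hybrid_forward(self, F, x):
+        return self.fc2(self.fc1(x))
+
+net_seq.initialize()
+net = MyNet()
+net.initialize()
+out_seq = net_seq(mx.nd.ones((2, 20)))  # imperative (define-by-run) call
+out = net(mx.nd.ones((2, 20)))
+```
+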
+## Debugging
+
+When debugging your MXNet code, remember the following:
+
+**Do NOT hybridize for debugging**
+
+The difference between [imperative style (Gluon non-hybridized) and symbolic style (Gluon hybridized)]({{ "/versions/1.2.1/architecture/program_model.html" | relative_url }}) is:
+
+* *imperative style* is _define-by-run_
+* *symbolic style* is _define-then-run_
+
+
+Basically, that means the execution path changes when calling `hybridize` on a network inherited from `HybridBlock` or `HybridSequential` (note that inheriting directly from `Block` is the same as not hybridizing your network). For efficiency, symbolic execution does not keep intermediate results, so it is hard to debug and examine intermediate outputs. Therefore, if you want to *examine intermediate results for debugging, do NOT hybridize*. Once everything is working as expected, you can `hybridize` and enjoy the speed-up, as illustrated in the sketch below.
+
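+As a small illustration (assuming the MXNet 1.x Gluon API and an arbitrary toy
+block), a `print` inside `hybrid_forward` shows intermediate values only while
+the network is not hybridized:
+
+```
+import mxnet as mx
+from mxnet.gluon import nn
+
+class ToyNet(nn.HybridBlock):
+    def __init__(self, **kwargs):
+        super(ToyNet, self).__init__(**kwargs)
+        with self.name_scope():
+            self.fc = nn.Dense(4)
+
+    def hybrid_forward(self, F, x):
+        y = self.fc(x)
+        print(y)   # intermediate result, visible for debugging
+        return F.relu(y)
+
+net = ToyNet()
+net.initialize()
+x = mx.nd.ones((2, 3))
+net(x)           # not hybridized: prints the actual NDArray values
+net.hybridize()
+net(x)           # first call after hybridize: prints a Symbol while the graph is built
+net(x)           # later calls reuse the cached graph; nothing is printed
+```
+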
+Please check out the [d2l book](http://d2l.ai/chapter_computational-performance/hybridize.html?highlight=hybridize#hybrid-programming) for more details about the hybrid programming model.
+
+## Use naive engine
+
+It is also useful to set the environment variable `MXNET_ENGINE_TYPE='NaiveEngine'` prior to running your (end-to-end) code. This setting disables multi-threading and makes the execution engine synchronous, so you can examine the backtrace more easily. Remember to change it back to the default `'ThreadedEnginePerDevice'`, or to `'ThreadedEngine'`, afterwards.
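+
+A short sketch (you can equally export the variable in your shell); set it
+before importing MXNet so it is picked up when the engine is created:
+
+```
+import os
+
+# switch to the synchronous, single-threaded engine for easier debugging
+os.environ['MXNET_ENGINE_TYPE'] = 'NaiveEngine'
+
+import mxnet as mx
+# ... run the code you want to debug ...
+```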
+
+For more details, here is a comprehensive tutorial on interactive debugging on [YouTube](https://www.youtube.com/watch?v=6-dOoJVw9_0).
+
+## Performance optimization
+
+Following up on using the environment variable `MXNET_ENGINE_TYPE` for debugging, here are the [available environment variables]({{ "/api/faq/env_var" | relative_url }}) that affect the performance of your code.
+
+Please refer to [this presentation](https://www.slideshare.net/ThomasDelteil1/debugging-and-performance-tricks-for-mxnet-gluon) for more information on debugging and performance optimization.
+
diff --git a/docs/static_site/src/pages/api/developer_guide/examine_forward_results_with_hooks.md b/docs/static_site/src/pages/api/developer_guide/examine_forward_results_with_hooks.md
new file mode 100644
index 000000000000..cc468037de16
--- /dev/null
+++ b/docs/static_site/src/pages/api/developer_guide/examine_forward_results_with_hooks.md
@@ -0,0 +1,163 @@
+---
+layout: page_category
+title: Examine forward results with hooks
+category: Developer Guide
+permalink: /api/dev-guide/examine_forward_results_with_hooks
+---
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+# Examine forward results with hooks
+
+There are currently three ways to register a function in an MXNet Gluon Block for execution:
+
+* before `forward` via [register_forward_pre_hook]({{"/api/python/docs/api/gluon/block.html#mxnet.gluon.Block.register_forward_pre_hook" | relative_url }})
+* after `forward` via [register_forward_hook]({{"/api/python/docs/api/gluon/block.html#mxnet.gluon.Block.register_forward_hook" | relative_url }})
+* as a callback via [register_op_hook]({{"/api/python/docs/api/gluon/block.html#mxnet.gluon.Block.register_op_hook" | relative_url }})
+
+## Pre-forward hook
+
+To register a hook prior to forward execution, the requirement is that the registered function **should not modify the input or output**. Its signature is `hook(block, input) -> None`. This is useful to get a summary before execution.
+
+```
+import mxnet as mx
+from mxnet.gluon import nn
+
+block = nn.Dense(10)
+block.initialize()
+print("{}".format(block))
+# Dense(None -> 10, linear)
+
+def pre_hook(block, input) -> None: # notice it has two arguments, one block and one input
+ print("{}".format(block))
+ return
+
+# register
+pre_handle = block.register_forward_pre_hook(pre_hook)
+input = mx.nd.ones((3, 5))
+print(block(input))
+
+# Dense(None -> 10, linear)
+# [[ 0.11254273 0.11162187 0.02200389 -0.04842059 0.09531345 0.00880495
+# -0.07610667 0.1562067 0.14192852 0.04463106]
+# [ 0.11254273 0.11162187 0.02200389 -0.04842059 0.09531345 0.00880495
+# -0.07610667 0.1562067 0.14192852 0.04463106]
+# [ 0.11254273 0.11162187 0.02200389 -0.04842059 0.09531345 0.00880495
+# -0.07610667 0.1562067 0.14192852 0.04463106]]
+#
+```
+
+We can `detach` a hook from a block:
+
+
+```
+pre_handle.detach()
+print(block(input))
+
+# [[ 0.11254273 0.11162187 0.02200389 -0.04842059 0.09531345 0.00880495
+# -0.07610667 0.1562067 0.14192852 0.04463106]
+# [ 0.11254273 0.11162187 0.02200389 -0.04842059 0.09531345 0.00880495
+# -0.07610667 0.1562067 0.14192852 0.04463106]
+# [ 0.11254273 0.11162187 0.02200389 -0.04842059 0.09531345 0.00880495
+# -0.07610667 0.1562067 0.14192852 0.04463106]]
+#
+```
+
+Notice `Dense(None -> 10, linear)` is not displayed anymore.
+
+## Post-forward hook
+
+Registering a hook after forward execution is very similar to the pre-forward hook (explained above), with the difference that the hook signature should be `hook(block, input, output) -> None`, where again **the hook should not modify the input or output.** Continuing from the above example:
+
+
+```
+def post_hook(block, input, output) -> None:
+ print("{}".format(block))
+ return
+
+post_handle = block.register_forward_hook(post_hook)
+print(block(input))
+
+# Dense(5 -> 10, linear)
+# [[ 0.11254273 0.11162187 0.02200389 -0.04842059 0.09531345 0.00880495
+# -0.07610667 0.1562067 0.14192852 0.04463106]
+# [ 0.11254273 0.11162187 0.02200389 -0.04842059 0.09531345 0.00880495
+# -0.07610667 0.1562067 0.14192852 0.04463106]
+# [ 0.11254273 0.11162187 0.02200389 -0.04842059 0.09531345 0.00880495
+# -0.07610667 0.1562067 0.14192852 0.04463106]]
+#
+```
+
+
+Notice the difference between `pre_hook` and `post_hook` results due to shape inference after `forward` is done executing.
+
+## Callback hook
+
+We can register a callback to monitor all operators called by the `HybridBlock` **after hybridization** with `register_op_hook(callback, monitor_all=False)`, where the callback signature should be:
+
+
+```
+callback(node_name: str, opr_name: str, arr: NDArray) -> None
+```
+
+where `node_name` is the name of the tensor being inspected (str), `opr_name` is the name of the operator producing or consuming that tensor (str) and `arr` the tensor being inspected (NDArray).
+
+
+```
+import mxnet as mx
+from mxnet.gluon import nn
+
+def mon_callback(node_name, opr_name, arr):
+ print("{}".format(node_name))
+ print("{}".format(opr_name))
+ return
+
+model = nn.HybridSequential(prefix="dense_")
+with model.name_scope():
+ model.add(mx.gluon.nn.Dense(2))
+
+model.initialize()
+model.hybridize()
+model.register_op_hook(mon_callback, monitor_all=True)
+print(model(mx.nd.ones((2, 3, 4))))
+
+# b'dense_dense0_fwd_data'
+# b'FullyConnected'
+# b'dense_dense0_fwd_weight'
+# b'FullyConnected'
+# b'dense_dense0_fwd_bias'
+# b'FullyConnected'
+# b'dense_dense0_fwd_output'
+# b'FullyConnected'
+# [[-0.05979988 -0.16349721]
+# [-0.05979988 -0.16349721]]
+#
+```
+
+
+Setting `monitor_all=False` will print only the output:
+
+
+```
+# b'dense_dense0_fwd_output'
+# b'FullyConnected'
+# [[-0.05979988 -0.16349721]
+# [-0.05979988 -0.16349721]]
+#
+```
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+# Exception handling and custom error types
+
+
+Apache MXNet v1.7 added custom error type support, and as a result `MXNetError` now inherits from `RuntimeError`, so it is possible to register a custom error type in the backend and prepend its name to an error message. In the frontend, an exception of the registered error type is then raised.
+
+For example, suppose we want the `transpose` operator defined in the C++ backend to raise a `ValueError` in the Python frontend. In the C++ backend we can add this check:
+
+```
+CHECK_EQ(axes_set.size(), axes.ndim()) << "ValueError: Repeated axis in transpose."
+ << " param.axes = "
+ << param.axes;
+```
+
+so that on the frontend, when a problematic `transpose` call is made such as:
+
+```
+from mxnet import np
+
+dat = np.random.normal(0, 1, (3, 4, 5))
+dat.transpose((0, 0, 1))
+```
+
+the following traceback will be produced:
+
+
+```
+ValueError Traceback (most recent call last)
+ in
+----> 1 dat.transpose((0, 0, 1))
+
+~/mxnet-distro/mxnet-build/python/mxnet/numpy/multiarray.py in transpose(self, *axes)
+ 1460 elif axes[0] is None:
+ 1461 axes = None
+-> 1462 return _mx_np_op.transpose(self, axes=axes)
+ 1463
+ 1464 def flip(self, *args, **kwargs):
+~/mxnet-distro/mxnet-build/python/mxnet/ndarray/register.py in transpose(a, axes, out, name, **kwargs)
+
+~/mxnet-distro/mxnet-build/python/mxnet/_ctypes/ndarray.py in _imperative_invoke(handle, ndargs, keys, vals, out, is_np_op, output_is_list)
+ 105 c_str_array(keys),
+ 106 c_str_array([str(s) for s in vals]),
+--> 107 ctypes.byref(out_stypes)))
+ 108
+ 109 create_ndarray_fn = _np_ndarray_cls if is_np_op else _ndarray_cls
+
+~/mxnet-distro/mxnet-build/python/mxnet/base.py in check_call(ret)
+ 271 """
+ 272 if ret != 0:
+--> 273 raise get_last_ffi_error()
+ 274
+ 275
+ValueError: Traceback (most recent call last):
+ File "src/operator/numpy/np_matrix_op.cc", line 77
+
+ValueError: Check failed: axes_set.size() == axes.ndim() (2 vs. 3) : Repeated axis in transpose. param.axes = [0,0,1]
+```
+
+
+Note that as of writing this document, the following Python error types are supported:
+
+
+* `ValueError`
+* `TypeError`
+* `AttributeError`
+* `IndexError`
+* `NotImplementedError`
+
+Check [this](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/error.py) resource for more details
+about the Python error types that MXNet supports.
+
+## How to register a custom error type
+
+Here is the way to register a custom error type in Python frontend:
+
+
+```
+import mxnet as mx
+
+@mx.error.register
+class MyError(mx.MXNetError):
+ def __init__(self, msg):
+ super().__init__(msg)
+```
+
+Then in the C++ backend, you can refer to `MyError` via:
+
+`LOG(FATAL) << "MyError: this is a custom error message"`
diff --git a/docs/static_site/src/pages/api/developer_guide/profiling.md b/docs/static_site/src/pages/api/developer_guide/profiling.md
new file mode 100644
index 000000000000..841c00891b6b
--- /dev/null
+++ b/docs/static_site/src/pages/api/developer_guide/profiling.md
@@ -0,0 +1,279 @@
+---
+layout: page_category
+title: Profiling
+category: Developer Guide
+permalink: /api/dev-guide/profiling
+---
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+# Profiling
+
+Apache MXNet provides a [profiler]({{"/api/python/docs/api/mxnet/profiler/index.html" | relative_url }}) that lets you see what is happening under the hood at runtime. A common scenario is to use the profiler on your hybridized model and visualize the output via `chrome://tracing`. Here are the steps to follow:
+
+1. Configure the profiler
+2. `set_state('run')` before the model is defined
+3. Add `mx.nd.waitall()` to enforce synchronization after you are done with some computation (maybe as part of training)
+4. Then add `set_state('stop')`
+5. Finally `dump` the profiling results
+
+
+Here is a simple example:
+
+```
+import mxnet as mx
+from mxnet.gluon import nn
+from mxnet import profiler
+
+def enable_profiler(profile_filename, run=True, continuous_dump=False, aggregate_stats=False):
+ profiler.set_config(profile_symbolic=True,
+ profile_imperative=True,
+ profile_memory=True,
+ profile_api=True,
+ filename=profile_filename,
+ continuous_dump=continuous_dump,
+ aggregate_stats=aggregate_stats)
+ if run:
+ profiler.set_state('run')
+
+enable_profiler(profile_filename='test_profiler.json', run=True, continuous_dump=True)
+profiler.set_state('run')
+
+model = nn.HybridSequential(prefix='net_')
+with model.name_scope():
+ model.add(nn.Dense(128, activation='tanh'))
+ model.add(nn.Dropout(0.5))
+ model.add(nn.Dense(64, activation='tanh'),
+ nn.Dense(32, in_units=64))
+ model.add(nn.Activation('relu'))
+model.initialize(ctx=mx.cpu())
+model.hybridize()
+
+inputs = mx.sym.var('data')
+
+with mx.autograd.record():
+ out = model(mx.nd.zeros((16, 10), ctx=mx.cpu()))
+out.backward()
+mx.nd.waitall()
+profiler.set_state('stop')
+profiler.dump(True)
+```
+
+Then, in `chrome://tracing`, use `Load` and select `test_profiler.json`; you will see something like this:
+![dev_guide_profilling_1](/assets/img/dev_guide_profilling_1.png)
+
+To understand what is going on, we need to dive deep into the MXNet runtime.
+
+## Dive deep into MXNet runtime with the profiler
+
+Let's start with a simple example and explain as we go. The following code creates a 3x3 tensor, computes the diagonal and then sums along the diagonal (to compute the “trace”). Using the MXNet profiler, we capture internal MXNet behavior, dump it to a string and print it (`dumps()`), and also dump it to a file (`dump()`). We can then load that file in `chrome://tracing` and view it graphically.
+
+```
+import mxnet as mx
+import numpy as np
+
+from mxnet import profiler
+
+#configure the profiler
+profiler.set_config(profile_all=True, aggregate_stats=True, filename='trace_profile.json')
+#start the profiler collecting data
+profiler.set_state('run')
+
+###########################################################
+#1. create our data
+data = np.linspace(1,9,9).reshape((3,3))
+
+#2. create an MXNet ndarray
+a = mx.nd.array(data)
+
+#3. compute on our data and produce results
+b = mx.nd.diag(a)
+c = mx.nd.sum(b,-1)
+
+#4. wait for computation to finish
+mx.nd.waitall()
+###########################################################
+
+#stop the profiler
+profiler.set_state('stop')
+
+#dump the profiling data as a string
+print(profiler.dumps())
+#dump the profiling data as a json file that can be viewed graphically
+profiler.dump()
+```
+
+When running this code, the `dumps` function dumps the profiling data to a string and returns it (which we promptly print). This statistical info is shown below.
+
+```
+Profile Statistics:
+ Note the difference in units for different entries.
+Device Storage
+=================
+Name Total Count Min Use (kB) Max Use (kB) Avg Use (kB)
+---- ----------- ------------- ------------- -------------
+Memory: cpu/0 3 96.0600 96.0760 0.0080
+
+MXNET_C_API
+=================
+Name Total Count Time (ms) Min Time (ms) Max Time (ms) Avg Time (ms)
+---- ----------- --------- ------------- ------------- -------------
+MXImperativeInvokeEx 2 0.3360 0.0990 0.2370 0.1680
+MXNet C API Calls 17 0.2320 0.2160 0.2320 0.0080
+MXNDArraySyncCopyFromCPU 1 0.1750 0.1750 0.1750 0.1750
+MXNDArrayCreateEx 1 0.1050 0.1050 0.1050 0.1050
+MXNDArrayGetShapeEx 11 0.0210 0.0000 0.0160 0.0019
+MXNDArrayWaitAll 1 0.0200 0.0200 0.0200 0.0200
+MXNDArrayGetDType 1 0.0010 0.0010 0.0010 0.0010
+MXNet C API Concurrency 34 0.0000 0.0000 0.0010 0.0000
+
+operator
+=================
+Name Total Count Time (ms) Min Time (ms) Max Time (ms) Avg Time (ms)
+---- ----------- --------- ------------- ------------- -------------
+sum 1 0.0520 0.0520 0.0520 0.0520
+diag 1 0.0410 0.0410 0.0410 0.0410
+WaitForVar 1 0.0220 0.0220 0.0220 0.0220
+```
+
+The dump function writes out the same data in a format that can be opened in `chrome://tracing` and displayed visually. This can be seen in the diagram below.
+
+![dev_guide_profilling_2.png](/assets/img/dev_guide_profilling_2.png)
+The profiling data has captured info about interesting functions that have executed while your program was running. Here are some explanations about what each one does.
+
+### **The functions in the C_API are:**
+
+|**Function Name** |**Description** |
+|--- |--- |
+|**MXImperativeInvokeEx** | invokes an operator to perform the computation |
+|**MXNDArrayCreateEx** | creates an ndarray |
+| **MXNDArrayGetDType** | returns the data type of the ndarray |
+| **MXNDArrayGetShape** | returns the shape of the ndarray (as a tuple where each element is the size of a dimension) |
+| **MXNDArraySyncCopyFromCPU** | called when data initially resides outside of an MXNet data structure (i.e. a numpy.ndarray rather than an mxnet.numpy.ndarray). The data is copied into the MXNet data structure |
+| **MXNDArrayWaitAll** | waits for all asynchronous operations in MXNet to finish. This function is only used in benchmarking to wait for work to happen. In a real program, there is no waiting; data dependencies are evaluated and computation executed as needed in an As-Late-As-Possible (ALAP) way |
+
+### **The function in the Engine API are:**
+
+| **Function Name** | **Description** |
+|--- |--- |
+| **WaitForVar** | Takes a variable reference as input and waits until that variable has been computed before returning |
+
+### **Other API functions:**
+
+| **Function Name** | **Description** |
+|--- |--- |
+| **ResourceParallelRandomSetSeed** | sets the random number generator seed |
+
+### **Operators we intended to call in the code:**
+
+| **Operator Name** | **Description** |
+|--- |--- |
+| **sum** | sum a tensor along a particular axis |
+| **diag** | compute the diagonal of the tensor |
+
+
+
+## Closer look
+
+From the code, we can identify the major events in our test application:
+
+1. Initialize our input data
+2. Create a new MXNet ndarray using our existing data values
+3. Compute on our data
+ 1. produce the diagonal of the input data
+ 2. sum along the diagonal to compute the “trace” of the matrix
+4. Wait for computation to finish (only needed when profiling)
+
+In the list above, step 1 uses regular NumPy functions to initialize the data; MXNet is not involved in this process. In step 2, we create an MXNet ndarray, and quite a few things happen under the hood. The screenshot below shows a zoomed-in portion of the timeline.
+
+![dev_guide_profilling_3.png](/assets/img/dev_guide_profilling_3.png)
+Here, the four red arrows show the important events in this sequence.
+
+1. First, `MXNDArrayCreateEx` is called to physically allocate space to store the data and the other necessary attributes of the `ndarray` class.
+2. Then some support functions are called (`MXNDArrayGetShapeEx`, `MXNDArrayGetDType`) while initializing the data structure.
+3. Finally, the data is copied from the non-MXNet ndarray into the newly prepared MXNet ndarray by `MXNDArraySyncCopyFromCPU`.
+
+Next, step 3 of our code example begins the computation that produces our output data. The screenshot below shows this behavior.
+
+![dev_guide_profilling_4.png](/assets/img/dev_guide_profilling_4.png)
+Here you can see the following sequence of events:
+
+1. `MXImperativeInvokeEx` is called for the first time to launch the **`diag`** operator from step 3 of our code example.
+2. Soon after that, the actual **`diag`** operator begins executing in another thread.
+3. While that is happening, our main thread moves on and calls `MXImperativeInvokeEx` again to launch the **`sum`** operator. Just like before, this call returns without actually executing the operator.
+4. Lastly, `MXNDArrayWaitAll` is called as the main thread reaches step 4 of our app. It waits here while all the computation finishes.
+
+Next, let's look at the part of the timeline zoomed in on the actual operator execution.
+
+![dev_guide_profilling_5.png](/assets/img/dev_guide_profilling_5.png)
+Here, three main events happen:
+
+1. The **`diag`** operator executes first.
+2. Then `ResourceParallelRandomSetSeed` runs.
+3. Finally, the **`sum`** operator executes (for a very short time, as shown by the big red arrow).
+
+The `diag` operator running makes sense (although it seems to take a little longer than we'd like). At the end, the **`sum`** operator runs (very quickly!). But the odd part in the middle is **`ResourceParallelRandomSetSeed`** running. This is part of the MXNet resource manager, which handles temporary space and random number generators needed by the operators. The **`sum`** operator requests temporary space in order to compute the sum, and therefore launches the resource manager (for the first time) here. As part of its startup sequence, the random number generator is initialized by setting the seed, so this is one-time initialization overhead. Let's run the app again, performing the computation twice, and look at the second run to exclude this initialization from our profiling.
+
+Here is the modified code:
+
+```
+import mxnet as mx
+import numpy as np
+
+from mxnet import profiler
+
+profiler.set_config(profile_all=True, aggregate_stats=True, filename='trace_profile.json')
+profiler.set_state('run')
+
+################
+# first run
+sdata = np.linspace(1,9,9).reshape((3,3))
+
+sa = mx.nd.array(sdata)
+sb = mx.nd.diag(sa)
+sc = mx.nd.sum(sb,-1)
+
+mx.nd.waitall()
+################
+
+################
+# second run
+data = np.linspace(1,9,9).reshape((3,3))
+
+a = mx.nd.array(data)
+b = mx.nd.diag(a)
+c = mx.nd.sum(b,-1)
+
+mx.nd.waitall()
+################
+
+profiler.set_state('stop')
+
+print(profiler.dumps())
+profiler.dump()
+```
+
+Notice that we renamed the variables and made another copy of the computation after the `waitall` call. This is so that MXNet doesn't have to worry about reusing variables, and so that the second run is cleanly separated from the first-time initialization.
+
+Here is an overview of the *new* timeline:
+
+![dev_guide_profilling_6.png](/assets/img/dev_guide_profilling_6.png)
+The first red box is the first run, and the second, smaller box is the second run. Right away, we can see how much smaller the second run is without any of the initialization routines. Here is a zoomed-in view of just the second run.
+
+
+![dev_guide_profilling_7.png](/assets/img/dev_guide_profilling_7.png)
+We still have the same sequence of events at the beginning to initialize the MXNet ndarray (`MXNDArrayCreateEx`, `MXNDArrayGetShapeEx`, `MXNDArrayGetDType`, `MXNDArraySyncCopyFromCPU`). Then the **`diag`** operator runs, followed by the **`sum`** operator, and finally the `waitall`. When you look at this, be careful about the assumptions that you make. In this version of the timeline, it appears that each operator executes right after its `MXImperativeInvokeEx` call, which seems to imply an inherent ordering. But there is no dependency between the **`diag`** operator finishing and the next **`MXImperativeInvokeEx`** launching the **`sum`** operator. In this case, it just so happens that the **`diag`** operator finishes so quickly that it appears that way. In reality, the main thread launches the operators without waiting for them to finish. Lastly, keep in mind that in this case, by the time we hit **`MXNDArrayWaitAll`** everything is already done and we return immediately, but in other circumstances it may sit there waiting for everything to finish (as we saw earlier in the first run).
+
+
diff --git a/docs/static_site/src/pages/api/faq/security.md b/docs/static_site/src/pages/api/faq/security.md
index 54481460348e..9262f48b6590 100644
--- a/docs/static_site/src/pages/api/faq/security.md
+++ b/docs/static_site/src/pages/api/faq/security.md
@@ -32,6 +32,7 @@ Please note that the security mailing list should only be used for reporting und
Questions about:
+
* if a vulnerability applies to your particular application
* obtaining further information on a published vulnerability
* availability of patches and/or new releases
diff --git a/docs/static_site/src/pages/get_started/build_from_source.md b/docs/static_site/src/pages/get_started/build_from_source.md
index 1dfa95a82ade..19f2f1a313ca 100644
--- a/docs/static_site/src/pages/get_started/build_from_source.md
+++ b/docs/static_site/src/pages/get_started/build_from_source.md
@@ -25,307 +25,381 @@ permalink: /get_started/build_from_source
# Build MXNet from Source
-This document explains how to build MXNet from source code.
-
-**For Java/Scala/Clojure, please follow [this guide instead](scala_setup)**
-
-## Overview
-
-Building from source follows this general two-step flow of building the shared library, then installing your preferred language binding. Use the following links to jump to the different sections of this guide.
-
-1. Build the MXNet shared library, `libmxnet.so`.
- * [Clone the repository](#clone-the-mxnet-project)
- * [Prerequisites](#prerequisites)
- * [Math library selection](#math-library-selection)
- * [Install GPU software](#install-gpu-software)
- * [Install optional software](#install-optional-software)
- * [Adjust your build configuration](#build-configurations)
- * [Build MXNet](#build-mxnet)
- * [with NCCL](#build-mxnet-with-nccl) (optional)
- * [for C++](#build-mxnet-with-c++) (optional)
- * [Usage Examples](#usage-examples)
- * [systems with GPUs and Intel CPUs](#recommended-for-Systems-with-NVIDIA-GPUs-and-Intel-CPUs)
- * [GPUs with non-Intel CPUs](#recommended-for-Systems-with-Intel-CPUs)
- * [Intel CPUs](#recommended-for-Systems-with-Intel-CPUs)
- * [non-Intel CPUs](#recommended-for-Systems-with-non-Intel-CPUs)
-2. [Install the language API binding(s)](#installing-mxnet-language-bindings) you would like to use for MXNet.
-MXNet's newest and most popular API is Gluon. Gluon is built into the Python binding. If Python isn't your preference, you still have more options. MXNet supports several other language APIs:
- - [Python (includes Gluon)]({{'/api/python/docs/api/index.html'|relative_url}})
- - [C++]({{'/api/cpp'|relative_url}})
- - [Clojure]({{'/api/clojure'|relative_url}})
- - [Java]({{'/api/java'|relative_url}})
- - [Julia]({{'/api/julia'|relative_url}})
- - [Perl]({{'/api/perl'|relative_url}})
- - [R]({{'/api/r'|relative_url}})
- - [Scala]({{'/api/scala'|relative_url}})
+Building and installing MXNet from source is a three-step process. First, build
+the shared `libmxnet` library, which provides the MXNet backend; then install your
+preferred language binding; and finally validate that MXNet was installed
+correctly by running a small example.
-
+1. [Obtaining the source](#obtaining-the-source-code)
+2. [Installing MXNet's recommended dependencies](#installing-mxnets-recommended-dependencies)
+3. [Overview of optional dependencies and optional features](#overview-of-optional-dependencies-and-optional-features)
+4. [Building MXNet](#building-mxnet)
+5. [Install the language API binding(s)](#installing-mxnet-language-bindings) you would like to use for MXNet.
-## Build Instructions by Operating System
+MXNet's newest and most popular API is Gluon. Gluon is built into the Python
+binding. If Python isn't your preference, you still have more options. MXNet
+supports several other language bindings. Please see the [API Documentation
+page](/api) for an overview of all supported languages and their APIs.
-Detailed instructions are provided per operating system. Each of these guides also covers how to install the specific [Language Bindings](#installing-mxnet-language-bindings) you require.
-You may jump to those, but it is recommended that you continue reading to understand more general "build from source" options.
-* [Amazon Linux / CentOS / RHEL](centos_setup)
-* [macOS](osx_setup)
-* [Devices](index.html?&platform=devices&language=python&environ=pip&processor=cpu)
-* [Ubuntu](ubuntu_setup)
-* [Windows](windows_setup)
+## Obtaining the source code
+To obtain the source code of the latest Apache MXNet (incubating) release,
+please access the [Download page](/get_started/download) and download the
+`.tar.gz` source archive corresponding to the release you wish to build.
-
+Developers can also obtain the unreleased development code from the git
+repository via `git clone --recursive https://github.com/apache/incubator-mxnet mxnet`
+
+Building an MXNet 1.x release from source requires a C++11 compliant compiler.
+
+Building the development version of MXNet or any 2.x release from source
+requires a C++17 compliant compiler. The oldest compiler versions tested during
+MXNet 2 development are GCC 7, Clang 6 and MSVC 2019.
-## Clone the MXNet Project
+## Installing MXNet's recommended dependencies
+To install the build tools and recommended dependencies, please run the
+commands below for your operating system. Please see the
+next section for further explanations of the set of required and optional
+dependencies of MXNet.
-1. Clone or fork the MXNet project.
+### Debian Linux derivatives (Debian, Ubuntu, ...)
```bash
git clone --recursive https://github.com/apache/incubator-mxnet mxnet
cd mxnet
+sudo apt-get update
+sudo apt-get install -y build-essential git ninja-build ccache libopenblas-dev libopencv-dev cmake
```
+### Red Hat Enterprise Linux derivatives (RHEL, CentOS, Fedora, ...)
+```bash
+sudo yum install epel-release centos-release-scl
+sudo yum install git make ninja-build automake autoconf libtool protobuf-compiler protobuf-devel \
+ atlas-devel openblas-devel lapack-devel opencv-devel openssl-devel zeromq-devel python3 \
+ devtoolset-7
+source /opt/rh/devtoolset-7/enable
+```
+Here `devtoolset-7` refers to the [Developer Toolset
+7](https://www.softwarecollections.org/en/scls/rhscl/devtoolset-7/) created by
+Red Hat for developers working on the CentOS or Red Hat Enterprise Linux
+platforms; it provides the GNU Compiler Collection 7.
+
+### macOS
+```bash
+# Install OS X Developer Tools
+xcode-select --install
-## Prerequisites
+# Install Homebrew
+/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
+
+# Install dependencies
+brew install cmake ninja ccache opencv
+```
+
+Note: the compiler provided by Apple on macOS does not support OpenMP. To use
+OpenMP on macOS you need to install another compiler, for example Clang via `brew`:
+
+```bash
+brew install llvm
+```
-The following sections will help you decide which specific prerequisites you need to install.
+### Windows
+You can use the Chocolatey package manager to install some of the dependencies
+on Windows.
-#### Math Library Selection
-It is useful to consider your math library selection prior to your other prerequisites.
+```bash
+choco install python git 7zip cmake ninja opencv
+```
+
+Currently OpenBLAS is not available from Chocolatey. You may download it from
+[the OpenBLAS release page](https://github.com/xianyi/OpenBLAS/releases)
+and compile it from source. Set the `OpenBLAS_HOME` environment variable to point
+to the OpenBLAS directory that contains the `include` and `lib` directories, for
+example by typing `set OpenBLAS_HOME=C:\utils\OpenBLAS`.
+
+If you would like to compile MXNet with the Visual Studio compiler, please install at
+least [VS2019](https://www.visualstudio.com/downloads/).
+
+## Overview of optional dependencies and optional features
+
+### Math Library Selection
MXNet relies on the
[BLAS](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) (Basic
-Linear Algebra Subprograms) library for numerical computations.
-Those can be extended with [LAPACK (Linear Algebra Package)](https://github.com/Reference-LAPACK/lapack), an additional set of mathematical functions.
+Linear Algebra Subprograms) library for numerical computations. In addition to
+BLAS, some operators in MXNet rely on the [LAPACK (Linear Algebra
+Package)](https://github.com/Reference-LAPACK/lapack), an additional set of
+mathematical functions.
+
+Several BLAS and LAPACK implementations exist. Among them, MXNet is tested with:
-MXNet supports multiple mathematical backends for computations on the CPU:
* [Apple Accelerate](https://developer.apple.com/documentation/accelerate)
* [ATLAS](http://math-atlas.sourceforge.net/)
-* [MKL](https://software.intel.com/en-us/intel-mkl) (MKL, MKLML)
-* [MKL-DNN](https://github.com/intel/mkl-dnn)
+* [Intel MKL](https://software.intel.com/en-us/intel-mkl)
* [OpenBLAS](https://www.openblas.net/)
-The default order of choice for the libraries if found follows the path from the most
-(recommended) to less performant backends.
-The following lists show this order by library and `cmake` switch.
+Apple Accelerate and MKL are proprietary. ATLAS and OpenBLAS are Open Source. If
+you don't have any specific requirements, MXNet recommends OpenBLAS as it
+typically outperforms ATLAS, is portable across many platforms, provides a
+LAPACK implementation and has a permissive license.
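+
+When configuring the build with `cmake` (see the [Building MXNet](#building-mxnet)
+section below), the BLAS implementation can be selected explicitly via the `BLAS`
+variable. The sketch below assumes OpenBLAS is installed; see
+`cmake/ChooseBlas.cmake` in the source tree for the accepted values (Atlas, Open,
+MKL, Apple):
+
+```bash
+# Sketch: explicitly select OpenBLAS when configuring the cmake build.
+cmake -DBLAS=open -DCMAKE_BUILD_TYPE=Release -GNinja ..
+```
+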
-For desktop platforms (x86_64):
+### Optional GPU support
-1. MKL-DNN (submodule) | `USE_MKLDNN`
-2. MKL | `USE_MKL_IF_AVAILABLE`
-3. MKLML (downloaded) | `USE_MKLML`
-4. Apple Accelerate | `USE_APPLE_ACCELERATE_IF_AVAILABLE` | Mac only
-5. OpenBLAS | `BLAS` | Options: Atlas, Open, MKL, Apple
+MXNet optionally supports [NVIDIA CUDA and
+cuDNN](https://developer.nvidia.com/cuda-downloads) for better performance on
+NVIDIA devices. MXNet releases in general are tested with the last two major
+CUDA versions available at the time of the release. For example, CUDA 9.2 and
+10.2.
-Note: If `USE_MKL_IF_AVAILABLE` is set to False then MKLML and MKL-DNN will be disabled as well for configuration
-backwards compatibility.
+To compile MXNet with CUDA support, define the `USE_CUDA` option. If you compile
+MXNet on a system with NVIDIA GPUs, the build system will automatically detect
+the CUDA architecture. If you are compiling on a system without NVIDIA GPUs,
+please specify the `MXNET_CUDA_ARCH` option to select the CUDA architecture and
+avoid a lengthy build targeting all common CUDA architectures. Please see the
+MXNet build configuration instructions in the next step.
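+
+As a sketch, a GPU build might be configured as follows; the CUDA path and the
+architecture value are examples and should be adjusted to your system:
+
+```bash
+# Sketch: configure a CUDA build. MXNET_CUDA_ARCH is only needed when
+# building on a machine without an NVIDIA GPU.
+cmake -DUSE_CUDA=1 -DUSE_CUDA_PATH=/usr/local/cuda -DUSE_CUDNN=1 \
+      -DMXNET_CUDA_ARCH="7.0" -DCMAKE_BUILD_TYPE=Release -GNinja ..
+```
+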
-For embedded platforms (all other and if cross compiled):
+MXNet also supports [NCCL](https://developer.nvidia.com/nccl) - NVIDIA's
+Collective Communications Library. NCCL is useful when using MXNet on multiple
+GPUs that require communication. Please follow NVIDIA's installation
+instructions to download and install NCCL.
-1. OpenBLAS | `BLAS` | Options: Atlas, Open, MKL, Apple
+To enable building MXNet with NCCL, install NCCL and define the `USE_NCCL`
+option in the MXNet build configuration in the next step.
-You can set the BLAS library explicitly by setting the BLAS variable to:
+After building with NCCL, you may optionally use the tests in
+`tests/python/gpu/test_nccl.py` to ensure NCCL is enabled correctly. Please
+first delete the line containing `skip(reason="Test requires NCCL library
+installed and enabled during build")` before running the test. In MXNet 2.x
+versions, the test can be run via `pytest --verbose
+tests/python/gpu/test_nccl.py`. In MXNet 1.x it is run via `python
+tests/python/gpu/test_nccl.py`.
-* Atlas
-* Open
-* MKL
-* Apple
+To get the best performance out of NCCL, it is recommended to set the environment
+variable `NCCL_LAUNCH_MODE=PARALLEL` when using NCCL version 2.1 or newer.
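+
+As a sketch, an NCCL-enabled configuration and the recommended launch mode might
+look like this (assuming NCCL is installed in a default location):
+
+```bash
+# Sketch: enable NCCL in the cmake configuration and set the recommended launch mode.
+cmake -DUSE_CUDA=1 -DUSE_NCCL=1 -DCMAKE_BUILD_TYPE=Release -GNinja ..
+export NCCL_LAUNCH_MODE=PARALLEL
+```
+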
-See the [cmake/ChooseBLAS.cmake](https://github.com/apache/incubator-mxnet/blob/master/cmake/ChooseBlas.cmake) file for the options.
+### Optional OpenCV support
-[Intel's MKL (Math Kernel Library)](https://software.intel.com/en-us/mkl) is one of the most powerful math libraries
+MXNet's image loading and augmentation features rely on
+[OpenCV](http://opencv.org/). OpenCV is an optional dependency; building
+without it disables these image features.
-It has following flavors:
+## Building MXNet
-* MKL is a complete math library, containing all the functionality found in ATLAS, OpenBlas and LAPACK. It is free under
-community support licensing (https://software.intel.com/en-us/articles/free-mkl),
-but needs to be downloaded and installed manually.
+MXNet 1.x can be built either with a classic Makefile setup or with the `cmake`
+cross-platform build system. Starting with MXNet 1.7, using `cmake` is
+recommended.
-* MKLML is a subset of MKL. It contains a smaller number of functions to reduce the
-size of the download and reduce the number of dynamic libraries user needs.
+Note: The `cmake` build requires CMake 3.13 or higher. If you are running an
+older version of CMake, you will see an error message like `CMake 3.13 or higher
+is required. You are running version 3.10.2`. Please update CMake on your
+system. You can download and install the latest CMake from https://cmake.org or via
+the Python package manager `pip` with `python3 -m pip install --user --upgrade
+"cmake>=3.13.2"`. After installing cmake with `pip3`, it is usually available at
+`~/.local/bin/cmake` or directly as `cmake`.
-
+Please see the [cmake configuration
+files](https://github.com/apache/incubator-mxnet/tree/v1.x/config) for
+instructions on how to configure and build MXNet with `cmake`.
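+
+As a minimal sketch, a CPU-only build with `cmake` and `ninja` from the source
+root might look like the following; adjust the options or copy them from one of
+the configuration files linked above:
+
+```bash
+# Sketch: configure and build libmxnet in a separate build directory.
+mkdir -p build && cd build
+cmake -DUSE_CUDA=0 -DCMAKE_BUILD_TYPE=Release -GNinja ..
+ninja
+```
+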
-* MKL-DNN is a separate open-source library, it can be used separately from MKL or MKLML. It is
-shipped as a subrepo with MXNet source code (see 3rdparty/mkldnn or the [MKL-DNN project](https://github.com/intel/mkl-dnn))
+Up to the MXNet 1.6 release, please follow the instructions in the
+[`make/config.mk`](https://github.com/apache/incubator-mxnet/blob/v1.x/make/config.mk)
+file for instructions on how to configure and compile MXNet with `make`. This method is supported on all 1.x
+releases.
-Since the full MKL library is almost always faster than any other BLAS library it's turned on by default,
-however it needs to be downloaded and installed manually before doing `cmake` configuration.
-Register and download on the [Intel performance libraries website](https://software.intel.com/en-us/performance-libraries).
-You can also install MKL through [YUM](https://software.intel.com/en-us/articles/installing-intel-free-libs-and-python-yum-repo)
-or [APT](https://software.intel.com/en-us/articles/installing-intel-free-libs-and-python-apt-repo) Repository.
+To enable the optional MXNet C++ package, please set the `USE_CPP_PACKAGE=1`
+option prior to compiling. See the [C++ package
+section](#install-the-mxnet-package-for-c++) below for more information.
-Note: MKL is supported only for desktop builds and the framework itself supports the following
-hardware:
-* Intel® Xeon Phi™ processor
-* Intel® Xeon® processor
-* Intel® Core™ processor family
-* Intel Atom® processor
+## Installing MXNet Language Bindings
+After building MXNet's shared library, you can install other language bindings.
-If you have a different processor you can still try to use MKL, but performance results are
-unpredictable.
+**NOTE:** The C++ API binding must be built when you build MXNet from source. See [Build MXNet with C++]({{'/api/cpp.html'|relative_url}}).
+## Installing Language Packages for MXNet
-#### Install GPU Software
+After you have installed the MXNet core library, you may install MXNet interface
+packages for the programming language of your choice:
+- [Python](#install-mxnet-for-python)
+- [C++](#install-the-mxnet-package-for-c++)
+- [Clojure](#install-the-mxnet-package-for-clojure)
+- [Julia](#install-the-mxnet-package-for-julia)
+- [Perl](#install-the-mxnet-package-for-perl)
+- [R](#install-the-mxnet-package-for-r)
+- [Scala](#install-the-mxnet-package-for-scala)
+- [Java](#install-the-mxnet-package-for-java)
-If you want to run MXNet with GPUs, you must install [NVDIA CUDA and cuDNN](https://developer.nvidia.com/cuda-downloads).
+### Install MXNet for Python
-#### Install Optional Software
+To install the MXNet Python binding, navigate to the root of the MXNet folder and run the following:
-These might be optional, but they're typically desirable as the extend or enhance MXNet's functionality.
+```bash
+python3 -m pip install --user -e ./python
+```
-* [OpenCV](http://opencv.org/) - Image Loading and Augmentation. Each operating system has different packages and build from source options for OpenCV. Refer to your OS's link in the [Build Instructions by Operating System](#build-instructions-by-operating-system) section for further instructions.
-* [NCCL](https://developer.nvidia.com/nccl) - NVIDIA's Collective Communications Library. Instructions for installing NCCL are found in the following [Build MXNet with NCCL](#build-mxnet-with-nccl) section.
+Note that the `-e` flag is optional. It is equivalent to `--editable` and means
+that if you edit the source files, these changes will be reflected in the
+package installed.
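+
+To quickly validate the Python installation, you can run a small example from the
+command line; it should print a 2x3 array of ones:
+
+```bash
+python3 -c "import mxnet as mx; print(mx.nd.ones((2,3)).asnumpy())"
+```
+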
-More information on turning these features on or off are found in the following [build configurations](#build-configurations) section.
+You may optionally install the `graphviz` library, which is used for visualizing
+the network graphs you build with MXNet. You may also install [Jupyter
+Notebook](http://jupyter.readthedocs.io/), which is used for running MXNet
+tutorials and examples.
+```bash
+python3 -m pip install --user graphviz==0.8.4 jupyter
+```
-
+Please also see the [MXNet Python API](/api/python) page.
-## Build Configurations
+### Install the MXNet Package for C++
-There is a configuration file for make,
-[`make/config.mk`](https://github.com/apache/incubator-mxnet/blob/master/make/config.mk), that contains all the compilation options. You can edit it and then run `make` or `cmake`. `cmake` is recommended for building MXNet (and is required to build with MKLDNN), however you may use `make` instead. For building with Java/Scala/Clojure, only `make` is supported.
+To enable the C++ package, add `USE_CPP_PACKAGE=1` as a build option when
+building the MXNet shared library, following the instructions from the previous
+section.
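+
+As a sketch, a configuration enabling the C++ package might look like this;
+combine the flag with whichever other options your build needs:
+
+```bash
+# Sketch: add the C++ package to an otherwise default cmake configuration.
+cmake -DUSE_CPP_PACKAGE=1 -DCMAKE_BUILD_TYPE=Release -GNinja ..
+ninja
+```
+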
-**NOTE:** When certain set of build flags are set, MXNet archive increases to more than 4 GB. Since MXNet uses archive internally archive runs into a bug ("File Truncated": [bugreport](https://sourceware.org/bugzilla/show_bug.cgi?id=14625)) for archives greater than 4 GB. Please use ar version 2.27 or greater to overcome this bug. Please see https://github.com/apache/incubator-mxnet/issues/15084 for more details.
+You can find C++ code examples in the `cpp-package/example` folder of the MXNet
+project. The folder contains a README explaining how to build the examples. The
+`predict-cpp` folder contains an example of image classification using MXNet's C Predict API.
-
+Please also see the [MXNet C++ API](/api/cpp) page.
-## Build MXNet
-
-### Build MXNet with NCCL
-- Download and install the latest NCCL library from NVIDIA.
-- Note the directory path in which NCCL libraries and header files are installed.
-- Ensure that the installation directory contains ```lib``` and ```include``` folders.
-- Ensure that the prerequisites for using NCCL such as Cuda libraries are met.
-- Append the ```config.mk``` file with following, in addition to the CUDA related options.
-- USE_NCCL=1
-- USE_NCCL_PATH=path-to-nccl-installation-folder
-
-``` bash
-echo "USE_NCCL=1" >> make/config.mk
-echo "USE_NCCL_PATH=path-to-nccl-installation-folder" >> make/config.mk
-cp make/config.mk .
-```
-- Run make command
-``` bash
-make -j"$(nproc)"
-```
-#### Validating NCCL
-- Follow the steps to install MXNet Python binding.
-- Comment the following line in ```test_nccl.py``` file at ```incubator-mxnet/tests/python/gpu/test_nccl.py```
-``` bash
-@unittest.skip("Test requires NCCL library installed and enabled during build")
-```
-- Run test_nccl.py script as follows. The test should complete. It does not produce any output.
-``` bash
-nosetests --verbose tests/python/gpu/test_nccl.py
-```
+### Install the MXNet Package for Clojure
-**Recommendation to get the best performance out of NCCL:**
-It is recommended to set environment variable NCCL_LAUNCH_MODE to PARALLEL when using NCCL version 2.1 or newer.
+Refer to the [Clojure setup
+guide](https://github.com/apache/incubator-mxnet/tree/master/contrib/clojure-package).
-
+Please also see the [MXNet Clojure API](/api/clojure) page.
-### Build MXNet with C++
+### Install the MXNet Package for Julia
-* To enable C++ package, just add `USE_CPP_PACKAGE=1` when you run `make` or `cmake` (see examples).
+Make sure to install at least Julia 1.0.3.
-
+To use the Julia binding you need to set the `MXNET_HOME` and `LD_LIBRARY_PATH`
+environment variables. For example,
-### Usage Examples
+```bash
+export MXNET_HOME=$HOME/incubator-mxnet
+export LD_LIBRARY_PATH=$HOME/incubator-mxnet/build:$LD_LIBRARY_PATH
+```
-For example, you can specify using all cores on Linux as follows:
+Then install MXNet with Julia:
```bash
-mkdir build && cd build
-cmake -DCMAKE_BUILD_TYPE=Release -GNinja ..
-ninja -v
+julia --color=yes --project=./ -e \
+ 'using Pkg; \
+ Pkg.develop(PackageSpec(name="MXNet", path = joinpath(ENV["MXNET_HOME"], "julia")))'
```
+Please also see the [MXNet Julia API](/api/julia) page.
+
-#### Recommended for Systems with NVIDIA GPUs and Intel CPUs
-* Build MXNet with `cmake` and install with MKL DNN, GPU, and OpenCV support:
+### Install the MXNet Package for Perl
+#### Installing Perl package dependencies on Debian Linux derivatives (Debian, Ubuntu, ...)
+
+```bash
+sudo apt-get install libmouse-perl pdl cpanminus swig libgraphviz-perl
+cpanm -q -L "${HOME}/perl5" Function::Parameters Hash::Ordered PDL::CCS
+```
+
+#### Installing Perl package dependencies on macOS
```bash
-mkdir build && cd build
-cmake -DUSE_CUDA=1 -DUSE_CUDA_PATH=/usr/local/cuda -DUSE_CUDNN=1 -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release -GNinja ..
-ninja -v
+brew install swig
+sudo sh -c 'curl -L https://cpanmin.us | perl - App::cpanminus'
+sudo cpanm -q -n PDL Mouse Function::Parameters Hash::Ordered PDL::CCS
```
-#### Recommended for Systems with NVIDIA GPUs
-* Build with both OpenBLAS, GPU, and OpenCV support:
+#### Install the MXNet Package for Perl
+After you build the shared library, run the following command from the MXNet
+source root directory to build the MXNet Perl package:
```bash
-mkdir build && cd build
-cmake -DBLAS=open -DUSE_CUDA=1 -DUSE_CUDA_PATH=/usr/local/cuda -DUSE_CUDNN=1 -DCMAKE_BUILD_TYPE=Release -GNinja ..
-ninja -v
+MXNET_HOME=${PWD}
+export LD_LIBRARY_PATH=${MXNET_HOME}/lib
+export PERL5LIB=${HOME}/perl5/lib/perl5
+
+cd ${MXNET_HOME}/perl-package/AI-MXNetCAPI/
+perl Makefile.PL INSTALL_BASE=${HOME}/perl5
+make install
+
+cd ${MXNET_HOME}/perl-package/AI-NNVMCAPI/
+perl Makefile.PL INSTALL_BASE=${HOME}/perl5
+make install
+
+cd ${MXNET_HOME}/perl-package/AI-MXNet/
+perl Makefile.PL INSTALL_BASE=${HOME}/perl5
+make install
```
-#### Recommended for Systems with Intel CPUs
-* Build MXNet with `cmake` and install with MKL DNN, and OpenCV support:
+Please also see the [MXNet Perl API](/api/perl) page.
+
+### Install the MXNet Package for R
+
+To install R and the `devtools` package, run:
```bash
-mkdir build && cd build
-cmake -DUSE_CUDA=0 -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release -GNinja ..
-ninja -v
+sudo apt-get update
+sudo apt-get install -y r-base-core r-cran-devtools libcairo2-dev libxml2-dev
```
-#### Recommended for Systems with non-Intel CPUs
-* Build MXNet with `cmake` and install with OpenBLAS and OpenCV support:
+`libxml2-dev` is required for the `roxygen2` dependency and `libcairo2-dev` is
+required for the suggested `imager` dependency.
+
+To generate the documentation, it is also necessary to install `roxygen2`:
```bash
-mkdir build && cd build
-cmake -DUSE_CUDA=0 -DBLAS=open -DCMAKE_BUILD_TYPE=Release -GNinja ..
-ninja -v
+R
+> install.packages("roxygen2")
+> Would you like to use a personal library instead? (y/n) y
+> Would you like to create a personal library ... to install packages into? (y/n) y
```
-#### Other Examples
+Note: To successfully complete the next step, you need a personal R library. If
+you were able to run `install.packages("roxygen2")` above, you either had one
+already, or you have just created one.
-* Build without using OpenCV:
+To build and install the MXNet-R bindings, run:
```bash
-mkdir build && cd build
-cmake -DUSE_OPENCV=0 -DCMAKE_BUILD_TYPE=Release -GNinja ..
-ninja -v
+make -f R-package/Makefile rpkg
```
-* Build on **macOS** with the default BLAS library (Apple Accelerate) and Clang installed with `xcode` (OPENMP is disabled because it is not supported by the Apple version of Clang):
+Please also see the [MXNet R API](/api/r) page.
+
+### Install the MXNet Package for Scala
+
+After building the MXNet shared library, you may simply run the following from
+the MXNet scala-package folder:
```bash
-mkdir build && cd build
-cmake -DBLAS=apple -DUSE_OPENCV=0 -DUSE_OPENMP=0 -DCMAKE_BUILD_TYPE=Release -GNinja ..
-ninja -v
+mvn install
```
-* To use OpenMP on **macOS** you need to install the Clang compiler, `llvm` (the one provided by Apple does not support OpenMP):
+This will install both the Java Inference API and the required MXNet-Scala package.
+
+Please also see the [MXNet Scala API](/api/scala) page.
+
+### Install the MXNet Package for Java
+
+After building the MXNet shared library, you may simply run the following from
+the MXNet scala-package folder:
```bash
-brew install llvm
-mkdir build && cd build
-cmake -DBLAS=apple -DUSE_OPENMP=1 -DCMAKE_BUILD_TYPE=Release -GNinja ..
-ninja -v
+mvn install
```
-
+This will install both the Java Inference API and the required MXNet-Scala package.
-## Installing MXNet Language Bindings
-After building MXNet's shared library, you can install other language bindings.
+Please also see the [MXNet Java API](/api/java) page.
-**NOTE:** The C++ API binding must be built when you build MXNet from source. See [Build MXNet with C++]({{'/api/cpp'|relative_url}}).
+## Contributions
-The following table provides links to each language binding by operating system:
+You are more than welcome to contribute easy installation scripts for other operating systems and programming languages.
+See the [community contributions page]({{'/community/contribute'|relative_url}}) for further information.
-| Language | [Ubuntu](ubuntu_setup) | [macOS](osx_setup) | [Windows](windows_setup) |
-| --- | ---- | --- | ------- |
-| Python | [Ubuntu guide](ubuntu_setup.html#install-mxnet-for-python) | [OSX guide](osx_setup) | [Windows guide](windows_setup.html#install-mxnet-for-python) |
-| C++ | [C++ guide](cpp_setup) | [C++ guide](cpp_setup) | [C++ guide](cpp_setup) |
-| Clojure | [Clojure guide](https://github.com/apache/incubator-mxnet/tree/master/contrib/clojure-package) | [Clojure guide](https://github.com/apache/incubator-mxnet/tree/master/contrib/clojure-package) | n/a |
-| Julia | [Ubuntu guide](ubuntu_setup.html#install-the-mxnet-package-for-julia) | [OSX guide](osx_setup.html#install-the-mxnet-package-for-julia) | [Windows guide](windows_setup.html#install-the-mxnet-package-for-julia) |
-| Perl | [Ubuntu guide](ubuntu_setup.html#install-the-mxnet-package-for-perl) | [OSX guide](osx_setup.html#install-the-mxnet-package-for-perl) | n/a |
-| R | [Ubuntu guide](ubuntu_setup.html#install-the-mxnet-package-for-r) | [OSX guide](osx_setup.html#install-the-mxnet-package-for-r) | [Windows guide](windows_setup.html#install-the-mxnet-package-for-r) |
-| Scala | [Scala guide](scala_setup.html) | [Scala guide](scala_setup.html) | n/a |
-| Java | [Java guide](java_setup.html) | [Java Guide](java_setup.html) | n/a |
+## Next Steps
+* [Tutorials]({{'/api'|relative_url}})
+* [How To]({{'/api/faq/add_op_in_backend'|relative_url}})
+* [Architecture]({{'/api/architecture/overview'|relative_url}})
diff --git a/docs/static_site/src/pages/get_started/c_plus_plus.md b/docs/static_site/src/pages/get_started/c_plus_plus.md
deleted file mode 100644
index 9f4800cd99bf..000000000000
--- a/docs/static_site/src/pages/get_started/c_plus_plus.md
+++ /dev/null
@@ -1,55 +0,0 @@
----
-layout: page
-title: C++ Setup
-action: Get Started
-action_url: /get_started
-permalink: /get_started/cpp_setup
----
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-## Build the C++ package
-The C++ package has the same prerequisites as the MXNet library.
-
-To enable C++ package, just add `USE_CPP_PACKAGE=1` in the [build from source](build_from_source) options when building the MXNet shared library.
-
-For example to build MXNet with GPU support and the C++ package, OpenCV, and OpenBLAS, from the project root you would run:
-
-```bash
-cmake -DUSE_CUDA=1 -DUSE_CUDA_PATH=/usr/local/cuda -DUSE_CUDNN=1 -DUSE_MKLDNN=1 -DUSE_CPP_PACKAGE=1 -DCMAKE_BUILD_TYPE=Release -GNinja ..
-ninja -v
-```
-
-You may also want to add the MXNet shared library to your `LD_LIBRARY_PATH`:
-
-```bash
-export LD_LIBRARY_PATH=~/incubator-mxnet/lib
-```
-
-Setting the `LD_LIBRARY_PATH` is required to run the examples mentioned in the following section.
-
-## C++ Example Code
-You can find C++ code examples in the `cpp-package/example` folder of the MXNet project. Refer to the [cpp-package's README](https://github.com/apache/incubator-mxnet/tree/master/cpp-package) for instructions on building the examples.
-
-## Tutorials
-
-* [MXNet C++ API Basics]({{'/api/cpp/docs/tutorials/basics'|relative_url}})
-
-## Related Topics
-
-* [Image Classification using MXNet's C Predict API](https://github.com/apache/incubator-mxnet/tree/master/example/image-classification/predict-cpp)
diff --git a/docs/static_site/src/pages/get_started/centos_setup.md b/docs/static_site/src/pages/get_started/centos_setup.md
deleted file mode 100644
index 315787d21f28..000000000000
--- a/docs/static_site/src/pages/get_started/centos_setup.md
+++ /dev/null
@@ -1,115 +0,0 @@
----
-layout: page
-title: CentOS setup
-action: Get Started
-action_url: /get_started
-permalink: /get_started/centos_setup
----
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-# Installing MXNet on CentOS and other non-Ubuntu Linux systems
-
-Step 1. Install build tools and git on `CentOS >= 7` and `Fedora >= 19`:
-
-```bash
-sudo yum groupinstall -y "Development Tools" && sudo yum install -y git
-```
-
-Step 2. Install Atlas:
-
-```bash
-sudo yum install atlas-devel
-```
-
-Installing both `git` and `cmake` or `make` by following instructions on the websites is
-straightforward. Here we provide the instructions to build `gcc-4.8` from source codes.
-
-Step 3. Install the 32-bit `libc` with one of the following system-specific commands:
-
-```bash
-sudo apt-get install libc6-dev-i386 # In Ubuntu
-sudo yum install glibc-devel.i686 # In RHEL (Red Hat Linux)
-sudo yum install glibc-devel.i386 # In CentOS 5.8
-sudo yum install glibc-devel.i686 # In CentOS 6/7
-```
-
-Step 4. Download and extract the `gcc` source code with the prerequisites:
-
-```bash
-wget http://mirrors.concertpass.com/gcc/releases/gcc-4.8.5/gcc-4.8.5.tar.gz
-tar -zxf gcc-4.8.5.tar.gz
-cd gcc-4.8.5
-./contrib/download_prerequisites
-```
-
-Step 5. Build `gcc` by using 10 threads and then install to `/usr/local`
-
-```bash
-mkdir release && cd release
-../configure --prefix=/usr/local --enable-languages=c,c++
-make -j10
-sudo make install
-```
-
-Step 6. Add the lib path to your configure file such as `~/.bashrc`:
-
-```bash
-export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib64
-```
-
-Step 7. Build [OpenBLAS from source](https://github.com/xianyi/OpenBLAS#installation-from-source).
-
-Step 8. Build OpenCV
-
-To build OpenCV from source code, you need the [cmake](https://cmake.org) library.
-
-* If you don't have cmake or if your version of cmake is earlier than 3.6.1, run the following commands to install a newer version of cmake:
-
-```bash
-wget https://cmake.org/files/v3.6/cmake-3.6.1-Linux-x86_64.tar.gz
-tar -zxvf cmake-3.6.1-Linux-x86_64.tar.gz
-alias cmake="cmake-3.6.1-Linux-x86_64/bin/cmake"
-```
-
-* To download and extract the OpenCV source code, run the following commands:
-
-```bash
-wget https://codeload.github.com/opencv/opencv/zip/2.4.13
-unzip 2.4.13
-cd opencv-2.4.13
-mkdir release
-cd release/
-```
-
-* Build OpenCV. The following commands build OpenCV with 10 threads. We
-disabled GPU support, which might significantly slow down an MXNet program
-running on a GPU processor. It also disables 1394 which might generate a
-warning. Then install it on `/usr/local`.
-
-```bash
-cmake -D BUILD_opencv_gpu=OFF -D WITH_CUDA=OFF -D WITH_1394=OFF -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local ..
-make -j10
-sudo make install
-```
-
-* Add the lib path to your configuration such as `~/.bashrc`.
-
-```bash
-export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/pkgconfig/
-```
diff --git a/docs/static_site/src/pages/get_started/download.md b/docs/static_site/src/pages/get_started/download.md
index cf3f8c1ddf48..7444a2e9d8e2 100644
--- a/docs/static_site/src/pages/get_started/download.md
+++ b/docs/static_site/src/pages/get_started/download.md
@@ -25,16 +25,24 @@ permalink: /get_started/download
# Source Download
-These source archives are generated from tagged releases. Updates and patches will not have been applied. For any updates refer to the corresponding branches in the [GitHub repository](https://github.com/apache/incubator-mxnet). Choose your flavor of download from the following links:
+The source archives listed on this page are official MXNet releases following
+the [Apache Software Foundation Release
+Policy](http://www.apache.org/legal/release-policy.html).
+
+If you would like to actively participate in the MXNet development, you are
+encouraged to contribute to our development version on
+[GitHub](https://github.com/apache/incubator-mxnet).
| Version | Source | PGP | SHA |
|---------|-------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------|
-| 1.5.1 | [Download](https://www.apache.org/dyn/closer.cgi/incubator/mxnet/1.5.1/apache-mxnet-src-1.5.1-incubating.tar.gz) | [Download](https://apache.org/dist/incubator/mxnet/1.5.1/apache-mxnet-src-1.5.1-incubating.tar.gz.asc) | [Download](https://apache.org/dist/incubator/mxnet/1.5.1/apache-mxnet-src-1.5.1-incubating.tar.gz.sha512) |
-| 1.5.0 | [Download](https://www.apache.org/dyn/closer.cgi/incubator/mxnet/1.5.0/apache-mxnet-src-1.5.0-incubating.tar.gz) | [Download](https://apache.org/dist/incubator/mxnet/1.5.0/apache-mxnet-src-1.5.0-incubating.tar.gz.asc) | [Download](https://apache.org/dist/incubator/mxnet/1.5.0/apache-mxnet-src-1.5.0-incubating.tar.gz.sha512) |
-| 1.4.1 | [Download](https://www.apache.org/dyn/closer.cgi/incubator/mxnet/1.4.1/apache-mxnet-src-1.4.1-incubating.tar.gz) | [Download](https://apache.org/dist/incubator/mxnet/1.4.1/apache-mxnet-src-1.4.1-incubating.tar.gz.asc) | [Download](https://apache.org/dist/incubator/mxnet/1.4.1/apache-mxnet-src-1.4.1-incubating.tar.gz.sha512) |
-| 1.4.0 | [Download](https://www.apache.org/dyn/closer.cgi/incubator/mxnet/1.4.0/apache-mxnet-src-1.4.0-incubating.tar.gz) | [Download](https://apache.org/dist/incubator/mxnet/1.4.0/apache-mxnet-src-1.4.0-incubating.tar.gz.asc) | [Download](https://apache.org/dist/incubator/mxnet/1.4.0/apache-mxnet-src-1.4.0-incubating.tar.gz.sha512) |
-| 1.3.1 | [Download](https://www.apache.org/dyn/closer.cgi/incubator/mxnet/1.3.1/apache-mxnet-src-1.3.1-incubating.tar.gz) | [Download](https://apache.org/dist/incubator/mxnet/1.3.1/apache-mxnet-src-1.3.1-incubating.tar.gz.asc) | [Download](https://apache.org/dist/incubator/mxnet/1.3.1/apache-mxnet-src-1.3.1-incubating.tar.gz.sha512) |
-| 1.3.0 | [Download](https://www.apache.org/dyn/closer.cgi/incubator/mxnet/1.3.0/apache-mxnet-src-1.3.0-incubating.tar.gz) | [Download](https://apache.org/dist/incubator/mxnet/1.3.0/apache-mxnet-src-1.3.0-incubating.tar.gz.asc) | [Download](https://apache.org/dist/incubator/mxnet/1.3.0/apache-mxnet-src-1.3.0-incubating.tar.gz.sha512) |
+| 1.7.0 | [Download](http://www.apache.org/dyn/closer.lua?filename=incubator/mxnet/1.7.0/apache-mxnet-src-1.7.0-incubating.tar.gz&action=download) | [Download](https://downloads.apache.org/incubator/mxnet/1.7.0/apache-mxnet-src-1.7.0-incubating.tar.gz.asc) | [Download](https://downloads.apache.org/incubator/mxnet/1.7.0/apache-mxnet-src-1.7.0-incubating.tar.gz.sha512) |
+| 1.6.0 | [Download](https://archive.apache.org/dist/incubator/mxnet/1.6.0/apache-mxnet-src-1.6.0-incubating.tar.gz) | [Download](https://archive.apache.org/dist/incubator/mxnet/1.6.0/apache-mxnet-src-1.6.0-incubating.tar.gz.asc) | [Download](https://archive.apache.org/dist/incubator/mxnet/1.6.0/apache-mxnet-src-1.6.0-incubating.tar.gz.sha512) |
+| 1.5.1 | [Download](https://archive.apache.org/dist/incubator/mxnet/1.5.1/apache-mxnet-src-1.5.1-incubating.tar.gz) | [Download](https://archive.apache.org/dist/incubator/mxnet/1.5.1/apache-mxnet-src-1.5.1-incubating.tar.gz.asc) | [Download](https://archive.apache.org/dist/incubator/mxnet/1.5.1/apache-mxnet-src-1.5.1-incubating.tar.gz.sha512) |
+| 1.5.0 | [Download](https://archive.apache.org/dist/incubator/mxnet/1.5.0/apache-mxnet-src-1.5.0-incubating.tar.gz) | [Download](https://archive.apache.org/dist/incubator/mxnet/1.5.0/apache-mxnet-src-1.5.0-incubating.tar.gz.asc) | [Download](https://archive.apache.org/dist/incubator/mxnet/1.5.0/apache-mxnet-src-1.5.0-incubating.tar.gz.sha512) |
+| 1.4.1 | [Download](https://archive.apache.org/dist/incubator/mxnet/1.4.1/apache-mxnet-src-1.4.1-incubating.tar.gz) | [Download](https://archive.apache.org/dist/incubator/mxnet/1.4.1/apache-mxnet-src-1.4.1-incubating.tar.gz.asc) | [Download](https://archive.apache.org/dist/incubator/mxnet/1.4.1/apache-mxnet-src-1.4.1-incubating.tar.gz.sha512) |
+| 1.4.0 | [Download](https://archive.apache.org/dist/incubator/mxnet/1.4.0/apache-mxnet-src-1.4.0-incubating.tar.gz) | [Download](https://archive.apache.org/dist/incubator/mxnet/1.4.0/apache-mxnet-src-1.4.0-incubating.tar.gz.asc) | [Download](https://archive.apache.org/dist/incubator/mxnet/1.4.0/apache-mxnet-src-1.4.0-incubating.tar.gz.sha512) |
+| 1.3.1 | [Download](https://archive.apache.org/dist/incubator/mxnet/1.3.1/apache-mxnet-src-1.3.1-incubating.tar.gz) | [Download](https://archive.apache.org/dist/incubator/mxnet/1.3.1/apache-mxnet-src-1.3.1-incubating.tar.gz.asc) | [Download](https://archive.apache.org/dist/incubator/mxnet/1.3.1/apache-mxnet-src-1.3.1-incubating.tar.gz.sha512) |
+| 1.3.0 | [Download](https://archive.apache.org/dist/incubator/mxnet/1.3.0/apache-mxnet-src-1.3.0-incubating.tar.gz) | [Download](https://archive.apache.org/dist/incubator/mxnet/1.3.0/apache-mxnet-src-1.3.0-incubating.tar.gz.asc) | [Download](https://archive.apache.org/dist/incubator/mxnet/1.3.0/apache-mxnet-src-1.3.0-incubating.tar.gz.sha512) |
| 1.2.1 | [Download](https://archive.apache.org/dist/incubator/mxnet/1.2.1/apache-mxnet-src-1.2.1-incubating.tar.gz) | [Download](https://archive.apache.org/dist/incubator/mxnet/1.2.1/apache-mxnet-src-1.2.1-incubating.tar.gz.asc) | [Download](https://archive.apache.org/dist/incubator/mxnet/1.2.1/apache-mxnet-src-1.2.1-incubating.tar.gz.sha512) |
| 1.2.0 | [Download](https://archive.apache.org/dist/incubator/mxnet/1.2.0/apache-mxnet-src-1.2.0-incubating.tar.gz) | [Download](https://archive.apache.org/dist/incubator/mxnet/1.2.0/apache-mxnet-src-1.2.0-incubating.tar.gz.asc) | [Download](https://archive.apache.org/dist/incubator/mxnet/1.2.0/apache-mxnet-src-1.2.0-incubating.tar.gz.sha512) |
| 1.1.0 | [Download](https://archive.apache.org/dist/incubator/mxnet/1.1.0/apache-mxnet-src-1.1.0-incubating.tar.gz) | [Download](https://archive.apache.org/dist/incubator/mxnet/1.1.0/apache-mxnet-src-1.1.0-incubating.tar.gz.asc) | [Download](https://archive.apache.org/dist/incubator/mxnet/1.1.0/apache-mxnet-src-1.1.0-incubating.tar.gz.sha512) |
diff --git a/docs/static_site/src/pages/get_started/index.html b/docs/static_site/src/pages/get_started/index.html
index 02e7cf1b8641..f83396e28e10 100644
--- a/docs/static_site/src/pages/get_started/index.html
+++ b/docs/static_site/src/pages/get_started/index.html
@@ -22,12 +22,18 @@
-{% include /get_started/get_started.html %}
-
-
Download from source
-
The signed source code for Apache MXNet (incubating) is available for download here
+
Build and install Apache MXNet (incubating) from source
+
+ To build and install MXNet from the official Apache Software Foundation
+ signed source code, please follow our Building From Source guide.
+
+
+{% include /get_started/get_started.html %}
diff --git a/docs/static_site/src/pages/get_started/validate_mxnet.md b/docs/static_site/src/pages/get_started/validate_mxnet.md
index d613c3bf0508..392682acc6e7 100644
--- a/docs/static_site/src/pages/get_started/validate_mxnet.md
+++ b/docs/static_site/src/pages/get_started/validate_mxnet.md
@@ -73,75 +73,6 @@ array([[ 3., 3., 3.],
```
-## Verify GPU Training
-
-From the MXNet root directory run: `python example/image-classification/train_mnist.py --network lenet --gpus 0` to test GPU training.
-
-
-## Virtualenv
-
-Activate the virtualenv environment created for *MXNet*.
-
-```bash
-$ source ~/mxnet/bin/activate
-```
-
-After activating the environment, you should see the prompt as below.
-
-```bash
-(mxnet)$
-```
-
-Start the python terminal.
-
-```bash
-$ python
-```
-
-Run the previous Python example.
-
-
-## Docker with CPU
-
-Launch a Docker container with `mxnet/python` image and run example *MXNet* python program on the terminal.
-
-```bash
-$ docker run -it mxnet/python bash # Use sudo if you skip Step 2 in the installation instruction
-
-# Start a python terminal
-root@4919c4f58cac:/# python
-```
-
-Run the previous Python example.
-
-
-## Docker with GPU
-
-Launch a NVIDIA Docker container with `mxnet/python:gpu` image and run example *MXNet* python program on the terminal.
-
-```bash
-$ nvidia-docker run -it mxnet/python:gpu bash # Use sudo if you skip Step 2 in the installation instruction
-
-# Start a python terminal
-root@4919c4f58cac:/# python
-```
-
-Run the previous Python example and run the previous GPU examples.
-
-
-## Cloud
-
-Login to the cloud instance you launched, with pre-installed *MXNet*, following the guide by corresponding cloud provider.
-
-Start the python terminal.
-
-```bash
-$ python
-```
-
-Run the previous Python example, and for GPU instances run the previous GPU example.
-
-
## Alternative Language Bindings
### C++