Pytorch Multi step LR scheduler & Optimizer. #2973

Closed
wants to merge 3 commits into from


qiuxin2012
Contributor

No description provided.

qiuxin2012 changed the title from "Pytorch Multi step LR scheduler." to "Pytorch Multi step LR scheduler & Optimizer." on Oct 21, 2020
dding3 pushed a commit that referenced this pull request Jul 23, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Jul 23, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Jul 23, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 7, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 9, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 9, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 10, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 10, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 11, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 11, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 11, 2021
dding3 added a commit that referenced this pull request Aug 12, 2021

* Modify the thread pool to adapt to mkldnn models (#2608)

`Engine.default` now supports a single thread for both `LocalOptimizer` and `DistriOptimizer`. To support a single-thread version of the `invokeAndWait2` method in `ThreadPool`, the thread pool is set to the current thread.

1. For dnn models, affinity is used to bind OMP threads, and for performance reasons the default thread must be the current main thread.
2. `MTLabeledBGRImgToBatch` uses a separate new thread pool called `io`, so it is not blocked when the default thread pool is single-threaded.
3. `FileWriter` does not use the default pool; otherwise the whole application would hang while creating the summary.
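The single-thread behavior described above can be sketched as a pool that runs tasks on the calling thread when its size is 1 and on a fixed executor otherwise. `SketchThreadPool` and `invokeAndWait` are hypothetical names for illustration, not BigDL's `ThreadPool` API; a second, independent instance of the same class could play the role of the `io` pool so that I/O work is never serialized behind a single-threaded default pool.

```scala
import java.util.concurrent.{Callable, ExecutorService, Executors}

// Hypothetical sketch: not BigDL's ThreadPool. Runs tasks on the calling
// thread when configured with one thread, on a fixed executor otherwise.
final class SketchThreadPool(size: Int) {
  require(size >= 1, "pool size must be positive")

  // Only allocate worker threads when more than one is requested.
  private val executor: Option[ExecutorService] =
    if (size == 1) None else Some(Executors.newFixedThreadPool(size))

  /** Runs all tasks and blocks until every result is available. */
  def invokeAndWait[T](tasks: Seq[() => T]): Seq[T] = executor match {
    case None =>
      // Single-thread mode: execute directly in the current (main) thread.
      tasks.map(task => task())
    case Some(pool) =>
      val futures = tasks.map(task => pool.submit(new Callable[T] { def call(): T = task() }))
      // get() blocks and rethrows task failures wrapped in ExecutionException.
      futures.map(_.get())
  }

  def shutdown(): Unit = executor.foreach(_.shutdown())
}
```

Keeping the single-thread path free of any executor means native libraries that bind threads by affinity see only the caller's thread.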

* feature: add shutdown for optimizer which will release the native resources (#2609)

Release native resources at the end of training.

At the end of training, `release` is called on every model cloned in the optimizer.

1) `LocalOptimizer` is straightforward because all cloned models are local.
2) `DistriOptimizer` is more involved. The release must happen before `models.unpersist`; otherwise
     the model would be serialized and transferred again. In addition, `ModelBroadcast` clones a new
     model when `value` is called, so those clones must be released as well.
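A minimal sketch of the release-at-end pattern, assuming a hypothetical `NativeResource` trait with a `release()` method (BigDL's own model types are not used here). Wrapping training in `try`/`finally` guarantees the clones are released even if training throws; in the distributed case the commit notes this must happen before `models.unpersist`.

```scala
// Hypothetical stand-in for a model that owns native (off-heap) memory.
trait NativeResource { def release(): Unit }

// Hypothetical model used only to observe that release() was called.
final class FakeModel(val name: String) extends NativeResource {
  var released = false
  def release(): Unit = released = true
}

object TrainingSketch {
  /** Runs `train` on the clones and always releases them afterwards. */
  def trainAndRelease[M <: NativeResource, R](clones: Seq[M])(train: Seq[M] => R): R =
    try train(clones)
    finally clones.foreach(_.release())
}
```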

* fix: dnn is currently not supported on Windows; support will be added in the future (#2613)

* [Bug Fix] Handle grey image correctly if model requires a 3 channel tensor input (#2616)

* handle grey image correctly if model requires a 3 channel tensor input

* move the test image files so as not to break old tests

* fix style error

* fix: set the encoding of input and output files to UTF-8 (#2615)

* [Feature] Allow user to customize how the model is broadcast in distributed training (#2618)

* allow user to override ModelBroadcast

* update configuration doc

* meet code review

* NLL unlabeled data fix (#2620)

* fix: the inference performance regression of mkldnn (#2622)

Weights should be copied in updateOutput only during training. For inference, the weights are loaded beforehand and do not change, so copying them on every forward is unnecessary.

* fix: style check errors (#2625)

* [new feature]Hit Ratio and NDCG (#2623)

* [new feature]Parallel Adam (#2626)

* feat: training ResNet50 w/ dnn backend on Spark. (#2624)

* feat: resnet50 training with distributed mode
* fix: unknown segmentation fault
* fix: clone of dnn tensor
* fix: delete unused codes
* fix: bn initialization
* fix: performance regression
* fix: convergence regression
* fix: delete the release in ModelBroadcast
* fix: pass all unit tests and remove the segmentation fault.

* feat: add dnn vgg model. (#2627)

* feat: add dnn vgg model.

* fix: rename the ResNet50Perf to Perf

* Fix issue 2592 (#2629)

* fix issue Predictor 2592

* adjust the algorithm

* fix whitespace style check

* fix code review issue

* fix loop efficiency

* feat: add example for lenet w/ dnn (#2635)

* fix join table throwing an exception during backward if the batch size is changed (#2638)

* fix join table backward

* change to resize as

* change Reshape to InferReShape in reshapeLoadTF (#2637)

* change Reshape to InferReShape in reshapeLoadTF

* fix docs

* fix failed code

* fix failed code

* fixes after code review

* fixes after code review

* fix

* fix infer

* add unit tests

* feature: vgg-16 with mkldnn backend (#2631)

* feat: vgg-16 with mkldnn backend
* fix: tests errors
* fix: case class has too many arguments
* fix: vgg_16 blas model supports
* fix: protobuf of serializer
* fix: sgd poly test case error
* fix: consistency of poly impl
* fix: rename the version2 of Xavier to varianceNormAverage

* Refine the synchronizer to support priority and also make it event driven (#2634)

* refinement

* refinement

* refinement

* refinement per comments

* refinement per review

* perf: need not narrow the gradients and zero gradients for dnn backend (#2632)

* perf: need not narrow the gradients and zero gradients for dnn backend
* fix: empty gradient zero for dnn backend
* fix: delete affine

* fix break unit test (#2642)

* New parallel optimizer (#2643)

* add new parallel optimizer

* change info back to debug for metrics log

* refinement per comments

* refinement per comments on single model optimization

* refinement for sharing common methods

* fix style

* refinement to reuse duplicate code

* Fix transfer learning (#2645)

* fix transfer learning
* add ParseSingleExample, DecodeBmp tf loader
* add corresponding unit tests

* remove potential performance downgrader (#2651) (#2652)

* Fix transfer learning (#2645)

* fix transfer learning
* add ParseSingleExample, DecodeBmp tf loader
* add corresponding unit tests

* remove potential performance downgrader

* abstractify tests with common spark lifecycle (#2654)

apply SparkContextLifeCycle to tests

default app name + extend SparkContextLifeCycle to other compatible tests

add custom before and after

* bump version to 0.8.0-SNAPSHOT (#2660)

* bump version to 0.8.0-SNAPSHOT

* add core link update

* [Enhancement] - Deco legacy transformers and train InceptionV1 to meet training target (#2661)

* refinement inception v1 training code

* fix ut due to the init change

* fix type

* fix param

* Add python API for new transformers and apply them in inception training example (#2663)

* refinement on python API

* fix ut

* fix: uuid() will return a new uuid every call (#2667)

* fix: uuid() will return a new uuid every call
* fix: add partitionId to value()
* fix: we need not add partition id to the value()
* fix: code clean

* [Bug Fix] Fix predictor issue while batch size == 1 for some topology (#2669)

* fix predictor issue

* refinement per batchsize only

* fix ut

* remove unused code

* fix batch size == 1

* fix ut

* AND/OR compound triggers support (#2675)

* AND/OR compound triggers support

* Unit-tests for compound triggers

* Unit-tests for compound triggers updates for OR

* Style fixes for Trigger.and/or

* Style fixes for Trigger.and/or

* Trigger.endWhen docs update

* Trigger.endWhen docs update
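A compound trigger can be modeled as a predicate over the training state, combined with boolean `and`/`or`. The sketch below uses a hypothetical `TrainState` case class and `SketchTrigger` trait instead of BigDL's `Trigger`, which reads its state from a `Table`; `maxEpoch` and `minLoss` here are illustrative helpers.

```scala
// Hypothetical training state; BigDL's real triggers read from a Table.
final case class TrainState(epoch: Int, iteration: Int, loss: Double)

// Hypothetical trigger: a predicate on the state that can be combined.
trait SketchTrigger {
  def apply(s: TrainState): Boolean
  def and(other: SketchTrigger): SketchTrigger = SketchTrigger(s => apply(s) && other(s))
  def or(other: SketchTrigger): SketchTrigger = SketchTrigger(s => apply(s) || other(s))
}

object SketchTrigger {
  def apply(f: TrainState => Boolean): SketchTrigger = new SketchTrigger {
    def apply(s: TrainState): Boolean = f(s)
  }
  // Fires once the epoch counter exceeds n.
  def maxEpoch(n: Int): SketchTrigger = SketchTrigger(_.epoch > n)
  // Fires once the loss drops below l.
  def minLoss(l: Double): SketchTrigger = SketchTrigger(_.loss < l)
}
```

For example, `maxEpoch(10).or(minLoss(0.01))` ends training at whichever condition is met first.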

* add dnn graph (#2666)

* add dnn graph

* move compile to forward, add graph test to perf

* add dnn graph option to example

* style check

* replace dnn with dnn graph in examples

* Update README.md (#2681)

* Fix ut failed due to duplicated spark context (#2687)

* Fix ut failed

* fix ut

* add an initPrimitives API without phase (#2686)

* delete phase in initPrimitives

* fix style check

* improve memoryReorder layer to handle conversion between nhwc and nchw (#2683)

* fix reorder to handle nhwc

* add init memory for ReorderMemory

* support same padding in dnn layer (#2684)

* support same padding in dnn layer

* meet review

* add BlasWrapper (#2690)

* add BlasWrapper

* refactor code

* meet review

* SerializerSpec excluded mkldnn.BlasWrapper

* change some comments

* add dnn output layer (#2691)

* add dnn output layer

* SerializerSpec excluded mkldnn Output

* change some comments

* Irelement (#2695)

* add ir element and conversion

* add more comments

* meet comments

* change map name

* support dlclassifiermodel binary classification (#2705)

* add IR graph and conversion from IR graph to blas graph or dnn graph (#2704)

* add ir graph

* fix model evaluate & conv without bias

* add dnnMode & support table inputs

* irelement & graph layer use same weights

* meet pr comments and code refactor

* update latest weight for validation (#2710)

* convert static graph to IR graph and build (#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit tests to avoid the dynamic resource allocation issue caused by Docker (#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (#2682)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are all bound to core 0
because of the affinity settings.
2. The native library keeps some unknown thread-local variables, so if
the parent thread exits and is recreated (as happens with threads from
Executors.newFixedThreadPool), the whole application crashes with a segmentation fault.
Here, the parent thread means the main thread (local mode) or the worker thread of
mapPartition (distributed mode).
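One way to keep native thread-local state on a single, long-lived thread is to route every native call through a dedicated single-thread executor. The sketch below assumes that design; `DnnThread` and the thread name `dnn-compute` are hypothetical, and the actual fix may differ in detail.

```scala
import java.util.concurrent.{Callable, Executors, ThreadFactory}

// Hypothetical sketch: every native computation runs on one long-lived thread,
// so thread-local native state is never lost when a caller thread dies.
object DnnThread {
  private val executor = Executors.newSingleThreadExecutor(new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "dnn-compute")
      t.setDaemon(true) // do not keep the JVM alive on shutdown
      t
    }
  })

  /** Evaluates `body` on the dedicated thread and waits for the result. */
  def run[T](body: => T): T =
    executor.submit(new Callable[T] { def call(): T = body }).get()
}
```

Because exceptions are captured in the returned `Future`, a failing task does not kill the dedicated thread.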

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: issues with updating the mkldnn version to v0.17. (#2712)

There are two issues:

1. The padding tensor is required. mkl-dnn uses a padding tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2), because
    the dimension is padded to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    memory. This patch allocates it at runtime instead.
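The padding in issue 1 is a round-up to the SIMD width. Assuming AVX2, a 256-bit register holds 8 FP32 values, so 1 channel is padded to 8, which matches the 4x1x28x28 to 4x8x28x28 example. The helper names below are hypothetical, and the sketch assumes the padded dimension is the channel dimension of an NCHW tensor.

```scala
// Hypothetical helpers illustrating SIMD-width padding of the channel dimension.
object PaddingSketch {
  /** Rounds a channel count up to the next multiple of the SIMD width. */
  def paddedChannels(channels: Int, simdWidth: Int): Int =
    ((channels + simdWidth - 1) / simdWidth) * simdWidth

  /** Number of float elements needed for an NCHW shape after channel padding. */
  def paddedElements(n: Int, c: Int, h: Int, w: Int, simdWidth: Int): Long =
    n.toLong * paddedChannels(c, simdWidth) * h * w
}
```

With `simdWidth = 8`, a 4x1x28x28 input needs 25088 floats instead of 3136, an 8x increase for single-channel data.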

* add shape computation for some layers and add skip primitives in DnnGraph (#2740)

* add shape computation for some layers and add skip primitives in DnnGraph

* meet pr comments

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (#2737)

* refactor predict for dnn model

* remove some unit tests (#2752)

* remove some conflict tests (#2753)

* Fix Add operation error when importing a TensorFlow graph with Double type (#2721)

* feature: add byte supports for DnnTensor (#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to `MemoryData`.
2. Because the scales must be transferred between FP32->int8 and int8->FP32,
    we add two new attributes called `mask` and `scales`.

* fix conversion accuracy (#2760)

* fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (#2762)

Enable the int8 data type in layers, especially for convolutions.
A given layer can accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (#2759)

This includes 3 steps.

1. Generate the scales of the model.
   An API like `generateScalesWithMask` generates the scales of the
   fp32 model, and the model it returns is also an fp32 model.
2. Quantize the model.
   The `quantize()` API is compatible with the `bigquant`
   backend and sets the quantize flag. During compile,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Run inference (forward).
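The commit does not give the formula behind `generateScalesWithMask`. As an assumption, the sketch below uses a common symmetric max-abs scheme for signed int8: the scale maps the largest absolute value onto 127, values are rounded and clamped, and dequantization divides by the same scale. It uses a single per-tensor scale (the simplest mask); `Int8Sketch` is hypothetical.

```scala
// Hypothetical symmetric int8 quantization with one per-tensor scale.
object Int8Sketch {
  /** Max absolute value, the statistic the scale is derived from here. */
  def maxAbs(xs: Array[Float]): Float = xs.foldLeft(0f)((m, x) => math.max(m, math.abs(x)))

  /** Maps [-maxAbs, maxAbs] onto [-127, 127]; returns the codes and the scale. */
  def quantize(xs: Array[Float]): (Array[Byte], Float) = {
    val m = maxAbs(xs)
    val scale = if (m == 0f) 1f else 127f / m // avoid dividing by zero for all-zero input
    val q = xs.map(x => math.round(x * scale).max(-127).min(127).toByte)
    (q, scale)
  }

  /** Approximate inverse of quantize. */
  def dequantize(q: Array[Byte], scale: Float): Array[Float] = q.map(_ / scale)
}
```

The round-trip error per element is at most half a quantization step, `0.5 / scale`.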

* change some docs about mkldnn (#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model, then
use the quantized model as usual.

* enable fusion by default (#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (#2783)

* fix: inplace of input/output and weight dimension error (#2779)

Some layers' input and output share the same memory, so we can't do forward in
`calcScales`: by that time the input has already been changed, and its scales may
be wrong. For example,

Sequential().add(Conv).add(ReLU)

runs in two steps: seq.forward(input) first, and then ReLU does another forward
in place, so the input becomes the output and the scales are
wrong.

For convolution, the weight dimension is always 5, even when the group number
is 1. But for dnn convolution, if there is no group, the weight dimension
should be 4.

* fix: the blas wrapper has no scales (#2778)

* fix softmax (#2777)

* fix: performance regression on resnet50 (#2774)

In this case, u8-to-s8 and s8-to-u8 conversions need no reorder.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* flip to 0.9.0 (#2792)

* test: should compare the right grad input (#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add the LARS optimizer (layer-wise scaled), along with utility functions to build a set of LARS optim methods for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo
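The ambiguity in the old block id can be reproduced with plain string interpolation. The format strings below are reconstructed from the commit message, not copied from `AllReduceParameter`, and the function names are hypothetical.

```scala
// Hypothetical reconstruction of the two block-id formats from the commit message.
object BlockIdSketch {
  // Old scheme: {id}{pidTo}gradientBytes{pidFrom}, ambiguous for multi-digit ids.
  def oldGradientBlockId(id: Int, pidTo: Int, pidFrom: Int): String =
    s"$id${pidTo}gradientBytes$pidFrom"

  // Fixed scheme: "_" separates id from pidTo.
  def newGradientBlockId(id: Int, pidTo: Int, pidFrom: Int): String =
    s"${id}_${pidTo}gradientBytes$pidFrom"
}
```

With the old format, `(id = 1, pidTo = 12)` and `(id = 11, pidTo = 2)` both produce `112gradientBytes0`; the separator makes them `1_12...` and `11_2...`.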

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes
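For reference, the published LARS rule scales each layer's learning rate by a local rate, trust * ||w|| / (||g|| + weightDecay * ||w||), before a momentum SGD step. The sketch below implements that formulation on plain arrays; how `LarsSGD` handles the trust limit and edge cases is not stated in these commits, so `LarsSketch` is hypothetical and its zero-norm fallback of 1.0 is an assumption.

```scala
// Hypothetical LARS step on one layer's flattened weights and gradients.
object LarsSketch {
  private def l2(xs: Array[Double]): Double = math.sqrt(xs.map(x => x * x).sum)

  /** Layer-wise local rate: trust * ||w|| / (||g|| + weightDecay * ||w||). */
  def localLr(w: Array[Double], g: Array[Double], trust: Double, weightDecay: Double): Double = {
    val wNorm = l2(w)
    val denom = l2(g) + weightDecay * wNorm
    // Assumed fallback: leave the global rate unscaled when a norm is zero.
    if (wNorm == 0.0 || denom == 0.0) 1.0 else trust * wNorm / denom
  }

  /** One momentum SGD step with the LARS-scaled rate; updates w and velocity in place. */
  def step(w: Array[Double], g: Array[Double], velocity: Array[Double],
           lr: Double, momentum: Double, trust: Double, weightDecay: Double): Unit = {
    val scaled = lr * localLr(w, g, trust, weightDecay) // computed before w changes
    for (i <- w.indices) {
      velocity(i) = momentum * velocity(i) + scaled * (g(i) + weightDecay * w(i))
      w(i) -= velocity(i)
    }
  }
}
```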

* bugfix - set mask for container (#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (#2808)

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (#2821)

* Optimize backward graph generation and CAddTable (#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (#2815)

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (#2822)

* Use one AllReduceParameter for multi-optim method training (#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed; random weights and bias not supported yet

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore; use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* bug fix for cmul (#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (#2824)

* fix: fusion for multi-group of convolution (#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (#2843)

* fix acc bug & init dnn thread (#2841)

* support tnc and ntc conversion (#2844)

* support ntc in dnn layer (#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (#2853)

* fix: wrong affinity settings (#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (#2858)

* Add beam search in transformer (#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (#2854)

* feat: add axis to softmax (#2859)

* flip version to 0.10.0 (#2869)

* [Bug Fix] - Fix module version comparison (#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (#2872)

* feat: RoiAlign Forward (#2874)

* Add set input output format API in Python (#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (#2870)

* fix memory leak for ir graph training (#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (#2885)

* add shape layer

* add Gather layer (#2897)

* add gather layer

* [New feature] Add maskhead (#2892)

* support for maskhead

* fix unit tests (#2905)

* modify predict/predictClass function (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support combining samples of different sizes into one mini batch (#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (#2930)

* Onnx support: add pos parameter to softmax (#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc
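The commit does not say how the precision-recall AUC is integrated. A minimal sketch, assuming the trapezoidal rule over points taken at each distinct score threshold and a curve starting at (recall 0, precision 1); other conventions (for example step-wise average precision) give different values. `PrAucSketch` is hypothetical.

```scala
// Hypothetical PR-AUC via the trapezoidal rule over distinct score thresholds.
object PrAucSketch {
  def prAuc(scores: Array[Double], labels: Array[Boolean]): Double = {
    require(scores.length == labels.length, "scores and labels must align")
    val positives = labels.count(identity)
    require(positives > 0, "need at least one positive label")
    val sorted = scores.zip(labels).sortBy(-_._1) // highest score first
    var tp = 0; var fp = 0
    var prevRecall = 0.0; var prevPrecision = 1.0; var area = 0.0
    var i = 0
    while (i < sorted.length) {
      val threshold = sorted(i)._1
      // Consume every sample tied at this score before emitting a curve point.
      while (i < sorted.length && sorted(i)._1 == threshold) {
        if (sorted(i)._2) tp += 1 else fp += 1
        i += 1
      }
      val recall = tp.toDouble / positives
      val precision = tp.toDouble / (tp + fp)
      area += (recall - prevRecall) * (precision + prevPrecision) / 2
      prevRecall = recall; prevPrecision = precision
    }
    area
  }
}
```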

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() missing the dimension parameter (#2940)

* revert back api (#2943)

* fix: softmax and bn+scale fusion (#2937)

* feat: multi models support with MKL-DNN backend (#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging
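With inclusive box coordinates, a box from x1 to x2 spans x2 - x1 + 1 pixels, which changes both the areas and the intersection used in IoU. A minimal sketch under that convention; `IoUSketch` and its `BBox` are hypothetical and do not reproduce BigDL's `BBox` API.

```scala
// Hypothetical IoU for boxes with inclusive pixel coordinates.
object IoUSketch {
  final case class BBox(x1: Float, y1: Float, x2: Float, y2: Float) {
    // Inclusive: a box from 0 to 9 covers 10 pixels per side.
    def area: Float = math.max(0f, x2 - x1 + 1) * math.max(0f, y2 - y1 + 1)
  }

  def iou(a: BBox, b: BBox): Float = {
    val iw = math.min(a.x2, b.x2) - math.max(a.x1, b.x1) + 1
    val ih = math.min(a.y2, b.y2) - math.max(a.y1, b.y1) + 1
    if (iw <= 0 || ih <= 0) 0f // no overlap
    else {
      val inter = iw * ih
      inter / (a.area + b.area - inter)
    }
  }
}
```

For example, two 10x10 boxes offset by 5 pixels horizontally overlap in 50 pixels, giving 50 / 150 = 1/3.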

* COCO Seq file reader: grey to bgr (#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* add maskrcnn inference example (#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (#2947)

* fix: takeSample only works for dnn backend and get one batch

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (#2959)

* fix: the squeeze should not be included in IRElement (#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (#2973)

* fix: nms stability when using treeset. (#2972)

* flip version to 0.11 (#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch-relevant unit tests to integration tests. (#2971)

* fix: enable integration accuracy tests (#2976)

* fix: softmax dnn backend wrong order of primitive (#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* feat: add distri optimizer v2 (#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* Remove final for AbstractModule (#3001)

* DistriOptimizerV2 argument (#3003)

* call DistriOptimizerV2

* fix inception (#3010)

* fix top1 and treenn (#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* test examples by distrioptimizerv2 (#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* deprecate nn.keras (#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (#3021)

* fix loss

* fix ut

* fix style check (#3022)

* flip version to 0.12 (#3029)

* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* [WIP] spark 3.0 (#3054)

* spark 3.0

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor (#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (#3094)

* back port master (#3096)

* set seed to avoid random error in PredictionServiceUT (#3097)

* add serializeUid (#3099)

* update doc (#3104)

* remove DLFrames (#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* make default DistriOptimizer as V2 (#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (#3133)

* DistriOptimizerV2 logger (#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (#3137)

* upgrade spark version (#3138)

* upgrade log4j (#3141)

* flip0.14 (#3142)

* flip0.14

* update

* fix common compile issue

* change dllib package name

* fix serialization failure

* fix FileSpec failure

Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: Ian Wong <yiheng.wang@intel.com>
Co-authored-by: Griffin Kardos <kardosgriffin@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: megaSpoon <bowen.she@intel.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Tao Pathompong Ruangyam <tao@starcolon.com>
Co-authored-by: abdmob <abdulla.abd.m@gmail.com>
Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Le-Zheng <30695225+Le-Zheng@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 17, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 24, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 24, 2021
Le-Zheng added a commit that referenced this pull request Sep 1, 2021

* convert static graph to IR graph and build (#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (#2682)

* add spark 2.4 support (#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (#2724)

If we used the parent thread directly, there would be two bugs:
1. The child threads forked from the parent thread would be bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if
the parent thread exits and is recreated (e.g. a thread from
Executors.newFixedThreadPool), the whole app will segfault.
The parent thread means the main thread (local mode) or the worker thread of
mapPartition (distributed mode).
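The idea behind the fix can be sketched as a Python analogue (not BigDL's Scala ThreadPool): all native work is funneled to one long-lived executor thread, so short-lived caller threads never own the native thread-local state. `native_compute` and `run_on_dnn_thread` are hypothetical stand-ins for an MKL-DNN call and its dispatcher.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One long-lived worker thread owns all native (MKL-DNN-like) computation.
dnn_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dnn")


def native_compute(x):
    """Stand-in for a native call; also reports which thread ran it."""
    return x * 2, threading.get_ident()


def run_on_dnn_thread(x):
    """Callers (e.g. short-lived worker threads) hand work to the dedicated thread."""
    return dnn_pool.submit(native_compute, x).result()


# Simulate several short-lived caller threads that come and go.
results = []
callers = [
    threading.Thread(target=lambda i=i: results.append(run_on_dnn_thread(i)))
    for i in range(4)
]
for t in callers:
    t.start()
for t in callers:
    t.join()

thread_ids = {tid for _, tid in results}
print(len(thread_ids))  # 1: every native call ran on the same thread
dnn_pool.shutdown()
```

Because the executor thread outlives every caller, recreating caller threads never recreates the thread that holds native state.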

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn uses a padded tensor, which takes
    more memory: e.g. 4x1x28x28 becomes 4x8x28x28 (avx2), because the channel
    dimension is padded to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.
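As a rough illustration of item 1 above, a minimal sketch that rounds the channel dimension of an NCHW shape up to a multiple of the SIMD block size. It assumes a block size of 8 for AVX2, matching the 4x1x28x28 to 4x8x28x28 example; `padded_shape` is a hypothetical helper, not part of BigDL or MKL-DNN.

```python
def padded_shape(shape, simd_width=8):
    """Return an NCHW shape with the channel dim rounded up to a multiple of simd_width.

    Hypothetical helper: simd_width=8 assumes AVX2-style 8-channel blocking.
    """
    n, c, h, w = shape
    padded_c = -(-c // simd_width) * simd_width  # ceiling division, then scale back up
    return (n, padded_c, h, w)


def element_count(shape):
    """Number of elements a dense tensor of this shape holds."""
    count = 1
    for dim in shape:
        count *= dim
    return count


original = (4, 1, 28, 28)
padded = padded_shape(original)
print(padded)  # (4, 8, 28, 28)
print(element_count(padded) // element_count(original))  # 8
```

For a single-channel input the padded tensor holds 8x as many elements, which is where the extra memory goes.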

* add computshape for some layers and add skip primitives in DnnGraph (#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (#2737)

* refactor predict for dnn model

* remove some unit tests (#2752)

* remove some conflict tests (#2753)

* Update documentation (#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to the `MemoryData`.
2. Because the scales must be transferred between FP32->int8 and int8->FP32,
    we add two new attributes called `mask` and `scales`.

* fix conversion accuracy (#2760)

* fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (#2762)

Enable the int8 data type in layers, especially for convolutions.
So a specific layer can accept an int8 input. If you want FP32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (#2759)

This includes three steps:

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of the
   FP32 model; the model it returns is also an FP32 model.
2. Quantize the model.
   The `quantize()` API is compatible with the `bigquant`
   backend and sets the quantize flag. When compiling,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Run inference (forward).

* update readme for v1 training (#2763)

* update release doc for preparation (#2764)

* change some docs about mkldnn (#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model; then you
can use the quantized model as usual.

* enable fusion by default (#2766)

* fix: the influence of default value of fusion (#2768)

* fix: mkldnn models use too much memory (#2783)

* fix: inplace of input/output and weight dimension error (#2779)

Some layers' input and output share the same memory, so we can't run forward in
`calcScales`: by that time the input has already been changed, and its scales may
be wrong. For example,

Sequential().add(Conv).add(ReLU)

runs seq.forward(input) first, and when it reaches the ReLU, another forward runs
in place, so the input becomes the output and the scales are wrong.

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
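The in-place problem in the Conv -> ReLU case above can be reproduced in plain Python. This sketch assumes a scale is the maximum absolute value of a tensor (a common int8 calibration choice, used here only for illustration); `relu_inplace` and `max_abs_scale` are hypothetical helpers, not BigDL's `calcScales`.

```python
def relu_inplace(values):
    """Apply ReLU by overwriting the input list, so input and output share memory."""
    for i, v in enumerate(values):
        if v < 0:
            values[i] = 0.0
    return values


def max_abs_scale(values):
    """Illustrative scale: the largest absolute value in the tensor."""
    return max(abs(v) for v in values)


conv_output = [-3.0, 1.0, 2.0]             # stands in for the ReLU's input
scale_before = max_abs_scale(conv_output)  # scale of the real input
relu_out = relu_inplace(conv_output)       # the same list object is modified
scale_after = max_abs_scale(conv_output)   # the "input" now holds the output

print(scale_before, scale_after)  # 3.0 2.0
print(relu_out is conv_output)    # True
```

Reading the input's scale after the in-place forward yields 2.0 instead of 3.0, which is why the scales must not be computed from a forward pass that has already overwritten the input.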

* fix: the blas wrapper has no scales (#2778)

* fix softmax (#2777)

* fix: performance regression on resnet50 (#2774)

The u8-to-s8 or s8-to-u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (#2671)

* flip to 0.9.0 (#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer (layer-wise scaled), along with utility functions to build a set of LARS optim methods for a container.

Bug fix: the gradient block ID of AllReduceParameter was originally composed as {id}{pidTo}gradientBytes{pidFrom}, but concatenating {id} and {pidTo} is ambiguous: e.g. "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
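The ambiguity is easy to reproduce. A minimal sketch that builds block keys by plain concatenation and with a "_" separator, following the {id}{pidTo}gradientBytes{pidFrom} pattern described above; `block_id` is a hypothetical helper, and the exact key layout inside AllReduceParameter may differ.

```python
def block_id(param_id, pid_to, pid_from, sep=""):
    """Build a gradient block key in the form {id}{sep}{pidTo}gradientBytes{pidFrom}."""
    return f"{param_id}{sep}{pid_to}gradientBytes{pid_from}"


# Without a separator, two different (id, pidTo) pairs collide.
a = block_id(1, 12, 0)
b = block_id(11, 2, 0)
print(a, b, a == b)  # 112gradientBytes0 112gradientBytes0 True

# With "_" between id and pidTo, the keys stay distinct.
a_sep = block_id(1, 12, 0, sep="_")
b_sep = block_id(11, 2, 0, sep="_")
print(a_sep, b_sep, a_sep == b_sep)  # 1_12gradientBytes0 11_2gradientBytes0 False
```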

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (#2821)

* Optimize backward graph generation and CAddTable (#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (#2822)

* Use one AllReduceParameter for multi-optim method training (#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore; use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (#2824)

* fix: fusion for multi-group of convolution (#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable

* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (#2843)

* fix acc bug & init dnn thread (#2841)

* support tnc and ntc conversion (#2844)

* support ntc in dnn layer (#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (#2853)

* fix: wrong affinity settings (#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (#2858)

* Add beam search in transformer (#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (#2854)

* feat: add axis to softmax (#2859)

* add release doc for 0.9 (#2862)

* fix: update core ref to master (#2865)

* flip version to 0.10.0 (#2869)

* [Bug Fix] - Fix module version comparison (#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (#2872)

* tutorial fix (#2879)

* feat: RoiAlign Forward (#2874)

* Add set input output format API in Python (#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (#2870)

* fix memory leak for ir graph training (#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (#2885)

* add shape layer

* add Gather layer (#2897)

* add gather layer

* [New feature] Add maskhead (#2892)

* support for maskhead

* fix unit tests (#2905)

* modify predict/predictClass function (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta API

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support batching samples of different sizes into one mini-batch (#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (#2930)

* Onnx support: add pos parameter to softmax (#2933)

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() missing dimension parameter (#2940)

* revert back api (#2943)

* fix: softmax and bn+scale fusion (#2937)

* feat: multi models support with MKL-DNN backend (#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (#2959)

* fix: the squeeze should not be included in IRElement (#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (#2973)

* fix: nms stability when using treeset. (#2972)

* flip version to 0.11 (#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch-relevant unit tests to integration tests. (#2971)

* fix: enable integration accuracy tests (#2976)

* fix: softmax dnn backend wrong order of primitive (#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (#3002)

* Remove final for AbstractModule (#3001)

* DistriOptimizerV2 argument (#3003)

* call DistriOptimizerV2

* fix inception (#3010)

* fix top1 and treenn (#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (#3020)

* test examples by distrioptimizerv2 (#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (#3021)

* fix loss

* fix ut

* fix style check (#3022)

* specify pyspark version (#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (#3029)

* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (#3056)

* Update doc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (#3054)

* spark 3.0

* add spark3.0 deployment (#3061)

* add spark3.0 deployment

* add a warning that Optimizer() is deprecated (#3062)

* add deprecation warning

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (#3079)

* support not compressing parameters

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (#3094)

* back port master (#3096)

* set seed to avoid random error in PredictionServiceUT (#3097)

* Jdk11 support (#3098)

* update for jdk 11 support and doc

* add serializeUid (#3099)

* update doc (#3104)

* add doc for running in ide (#3106)

* fix callBigDLFunc returning an Int when the actual return value from Java is a byte array. (#3111)

* add list of df support (#3113)

* Update readme (#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (#3124)

* remove DLFrames

* update

* rm dlframe example from test script

* Add unit test for division by zero (#3128)

* Add unit test for division by zero

* add unit test and zero check for LocalData

* add unit test and zero check for LocalData

* change

* Add unit test for division by zero

* fix test

* add python3 to Dockerfile (#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (#3133)

* DistriOptimizerV2 logger (#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (#3137)

* upgrade spark version (#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (#3141)

* flip0.14 (#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
dding3 added a commit to dding3/analytics-zoo that referenced this pull request Sep 2, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread holds some unknown thread-local variables, so if
the parent thread exits and is recreated (as with threads from
Executors.newFixedThreadPool), the whole app crashes with a segmentation fault.
Here the parent thread means the main thread (local mode) or the worker thread of
mapPartition (distributed mode).
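A minimal sketch of the idea behind this fix, written in Python rather than the Scala of the actual change: every native call is submitted to one long-lived dedicated thread and the caller blocks until it finishes. `native_compute` and `invoke_and_wait` are hypothetical names used only for illustration, and the sketch models the single-owner-thread pattern, not core affinity.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a native call that depends on thread-local
# state; it simply reports which thread ran it.
def native_compute(x):
    return threading.get_ident(), x * 2

# A single long-lived worker thread owns every native call, so any
# thread-local native state lives exactly as long as this one thread.
dnn_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dnn")

def invoke_and_wait(fn, *args):
    # Block the caller until the dedicated thread finishes the task;
    # an exception raised inside the task is re-raised here by result().
    return dnn_pool.submit(fn, *args).result()

results = [invoke_and_wait(native_compute, i) for i in range(4)]
worker_ids = {tid for tid, _ in results}
print(len(worker_ids), [value for _, value in results])
```

Because the worker thread is never recreated, its thread-local native state cannot be orphaned by a caller thread exiting.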

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn uses a padded tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (AVX2). It is
    padded to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.
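The memory cost of issue 1 can be checked with a small calculation. This Python sketch assumes, as the 4x1x28x28 -> 4x8x28x28 example suggests, that the channel dimension of an NCHW shape is rounded up to a multiple of 8 float32 lanes (a 256-bit AVX2 register holds 8 32-bit floats). `padded_shape` is a hypothetical helper, not an mkl-dnn API.

```python
# Assumption: AVX2 holds 8 float32 lanes and the channel dim (index 1) is padded.
AVX2_FLOAT_LANES = 8

def padded_shape(shape, lanes=AVX2_FLOAT_LANES, channel_dim=1):
    """Round the channel dimension of an NCHW shape up to a multiple of `lanes`."""
    padded = list(shape)
    c = padded[channel_dim]
    padded[channel_dim] = ((c + lanes - 1) // lanes) * lanes
    return tuple(padded)

def num_elements(shape):
    n = 1
    for d in shape:
        n *= d
    return n

original = (4, 1, 28, 28)
padded = padded_shape(original)
print(padded, num_elements(padded) // num_elements(original))
```

For a single-channel input the padded buffer is 8 times larger, which is why allocating it eagerly is expensive.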

* add compute shape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add compute shape for some layers and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to `MemoryData`.
2. Because the scales must be transferred between FP32->int8 and int8->FP32,
    we add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel-analytics#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
A given layer can therefore accept an int8 input. If you want fp32
output, you should add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes 3 steps:

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of
   the fp32 model; the model it returns is also an fp32 model.
2. Quantize the model.
   The `quantize()` API is compatible with the `bigquant`
   backend and sets the quantize flag. When compiling,
   mkldnn generates the quantized weight, output, and input at
   runtime.
3. Run the inference (forward).
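Steps 1 and 2 rest on symmetric int8 quantization driven by per-tensor or per-channel scales. The Python sketch below shows one common max-abs formulation. It does not call `generateScalesWithMask` or `quantize()`, and the mask handling (0 = one scale for the whole tensor, 1 = one scale per index of dim 0) is an assumption modeled on the mkl-dnn mask convention.

```python
def max_abs_scales(rows, mask):
    # rows: a 2-D weight given as a list of rows (dim 0 = output channel).
    # mask == 0: one scale for the whole tensor.
    # mask == 1: one scale per index of dim 0 (per output channel).
    if mask == 0:
        return [max(abs(v) for row in rows for v in row)]
    if mask == 1:
        return [max(abs(v) for v in row) for row in rows]
    raise ValueError("this sketch only handles masks 0 and 1")

def quantize_s8(rows, scales, mask):
    # Symmetric quantization: map [-scale, scale] onto [-127, 127].
    # Assumes no scale is zero (i.e. no all-zero channel).
    # Python's round() rounds exact halves to the nearest even integer.
    out = []
    for i, row in enumerate(rows):
        s = scales[0] if mask == 0 else scales[i]
        out.append([max(-127, min(127, round(v / s * 127))) for v in row])
    return out

w = [[0.5, -1.0], [0.25, 0.125]]
scales = max_abs_scales(w, mask=1)
q = quantize_s8(w, scales, mask=1)
print(scales, q)
```

Per-channel scales (mask 1) keep the small second row at full int8 resolution, while a single tensor-wide scale would squeeze it into a narrower range.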

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model; then you
can use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: mkldnn models use too much memory (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output share the same memory, so we can't run forward in
`calcScales`: by that time the input has already been changed and its scales may
be wrong. For example,

Sequential().add(Conv).add(ReLU)

runs two steps: seq.forward(input) first, and then, when it reaches the ReLU, it
runs another forward, so the input has become the output and the scales will be
wrong.

For convolution weights, the dimension is always 5, even when the group number
is 1. But for a dnn convolution without groups, the weight's dimension
should be 4.
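The in-place problem described above can be reproduced in a few lines. In this Python sketch, `relu_inplace` is a hypothetical stand-in for a layer whose output shares the input's memory; a max-abs scale read after the forward pass reflects the output rather than the original input, which is why forward cannot simply be re-run in `calcScales`.

```python
def relu_inplace(buf):
    # Writes the result back into the same list, like a layer whose
    # output shares memory with its input.
    for i, v in enumerate(buf):
        buf[i] = v if v > 0 else 0.0
    return buf

def max_abs(buf):
    return max(abs(v) for v in buf)

x = [-3.0, 1.0, 2.0]
scale_before = max_abs(x)   # range of the real input
y = relu_inplace(x)
scale_after = max_abs(x)    # x now holds the ReLU output
print(scale_before, scale_after, y is x)
```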

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

the u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: layer-wise scaled learning rates, plus utility functions to build a set of LARS optim methods for a container.

Bug fix: the gradient block id of AllReduceParameter was originally composed as {id}{pidTo}gradientBytes{pidFrom}, but concatenating {id}{pidTo} is ambiguous: e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes
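The AllReduceParameter block-id bug fixed in this PR comes down to concatenating two integers without a delimiter. A Python sketch, assuming the new id has the shape {id}_{pidTo}gradientBytes{pidFrom} (the exact string built by the Scala code may differ); `block_id_old` and `block_id_new` are hypothetical helpers.

```python
def block_id_old(param_id, pid_to, pid_from):
    # Original scheme: {id}{pidTo}gradientBytes{pidFrom}
    return f"{param_id}{pid_to}gradientBytes{pid_from}"

def block_id_new(param_id, pid_to, pid_from):
    # Assumed fixed scheme: "_" separates id from pidTo
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"

# Two different (id, pidTo) pairs collide under the old scheme...
assert block_id_old(1, 12, 0) == block_id_old(11, 2, 0) == "112gradientBytes0"
# ...but stay distinct once the separator is added.
print(block_id_new(1, 12, 0), block_id_new(11, 2, 0))
```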

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed; random weights and bias are not supported

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add a requirement that input size and hidden size match if the LSTM has more than one layer

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore / use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT
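For reference, the layer-wise rate from the LARS paper (You et al.) scales the global learning rate by trust * ||w|| / (||g|| + weight_decay * ||w||), with both norms taken over the whole layer. The Python sketch below implements that formula without momentum; BigDL's LarsSGD inherits SGD and also limits the trust value, so its exact update may differ. `lars_local_lr` and `lars_step` are hypothetical names.

```python
import math

def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))

def lars_local_lr(weight, grad, trust=0.001, weight_decay=0.0, eps=1e-9):
    # Norms over the whole layer ("whole layer gradient norm").
    w_norm = l2_norm(weight)
    g_norm = l2_norm(grad)
    # LARS ratio: trust * ||w|| / (||g|| + wd * ||w||); eps avoids 0/0.
    return trust * w_norm / (g_norm + weight_decay * w_norm + eps)

def lars_step(weight, grad, global_lr, trust=0.001, weight_decay=0.0):
    # Plain SGD step with the layer-wise scaled rate (no momentum).
    local_lr = lars_local_lr(weight, grad, trust, weight_decay)
    return [w - global_lr * local_lr * (g + weight_decay * w)
            for w, g in zip(weight, grad)]

w = [3.0, 4.0]   # ||w|| = 5
g = [0.6, 0.8]   # ||g|| = 1
print(lars_local_lr(w, g, trust=0.1), lars_step(w, g, global_lr=1.0, trust=0.1))
```

Because the ratio depends only on each layer's own norms, layers with small gradients relative to their weights get proportionally larger steps.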

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to improve the performance of DetectionOutputSSD (intel-analytics#2853)
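The technique behind this change is keeping only the k best candidates in an ordered structure instead of sorting every candidate. The sketch below swaps the Scala TreeSet for Python's `heapq` (a bounded min-heap, O(n log k)); it illustrates the top-k idea and is not the DetectionOutputSSD code. Pairing each score with its index keeps entries with equal scores distinct.

```python
import heapq

def top_k(scores, k):
    # Returns indices of the k largest scores, highest first.
    if k <= 0:
        return []
    heap = []  # min-heap of (score, index); heap[0] is the weakest kept entry
    for i, s in enumerate(scores):
        if len(heap) < k:
            heapq.heappush(heap, (s, i))
        elif s > heap[0][0]:
            # Evict the current smallest and insert the new candidate.
            heapq.heapreplace(heap, (s, i))
    return [i for s, i in sorted(heap, reverse=True)]

print(top_k([0.1, 0.9, 0.3, 0.9, 0.5], 3))
```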

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: improve the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* masks in RoiLabel now hold Floats, not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks during training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support batching samples with different sizes into one mini-batch (intel-analytics#2929)

* add to batch with resize

* meet comments
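One common way to put differently sized images into a single mini-batch is to pad each one to the largest height and width in the batch and keep the original sizes alongside. The Python sketch below assumes non-empty single-channel images given as nested lists and pads at the bottom and right; the commit's "add to batch with resize" may also resize, so treat this as an illustration rather than the MTImageFeatureToBatch implementation. `pad_to_batch` is a hypothetical helper.

```python
def pad_to_batch(images, fill=0.0):
    # images: non-empty 2-D images (lists of rows) with varying heights/widths.
    max_h = max(len(img) for img in images)
    max_w = max(len(row) for img in images for row in img)
    batch = []
    for img in images:
        # Keep each image in the top-left corner; the remainder is `fill`.
        padded = [list(row) + [fill] * (max_w - len(row)) for row in img]
        padded += [[fill] * max_w for _ in range(max_h - len(img))]
        batch.append(padded)
    # Record the original (height, width) so later stages can ignore padding.
    sizes = [(len(img), len(img[0])) for img in images]
    return batch, sizes

a = [[1, 1, 1], [1, 1, 1]]          # 2 x 3
b = [[2, 2], [2, 2], [2, 2], [2, 2]]  # 4 x 2
batch, sizes = pad_to_batch([a, b])
print(sizes, batch[1][0])
```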

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix the missing dimension parameter in TimeDistributedCriterion() (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging
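Treating boxes as inclusive changes the area formula by one pixel in each direction. A Python sketch of IoU under that convention, assuming (x1, y1, x2, y2) boxes in integer pixel coordinates; `iou_inclusive` is a hypothetical helper, not the MAPObjectDetection API.

```python
def iou_inclusive(a, b):
    # Boxes are (x1, y1, x2, y2) with inclusive coordinates, so a box
    # spanning x1=0..x2=9 is 10 pixels wide: width = x2 - x1 + 1.
    def area(box):
        return max(0, box[2] - box[0] + 1) * max(0, box[3] - box[1] + 1)

    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1 + 1) * max(0, iy2 - iy1 + 1)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes overlapping in a 5x5 region: 25 / (100 + 100 - 25).
print(iou_inclusive((0, 0, 9, 9), (5, 5, 14, 14)))
```

Under an exclusive convention a single-pixel box (0, 0, 0, 0) would have zero area; with inclusive coordinates it covers one pixel and matches itself with IoU 1.0.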

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB defaults to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB defaults to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch-relevant unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Update doc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning that Optimizer() is deprecated (intel-analytics#3062)

* add deprecation warning

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in IDE (intel-analytics#3106)

* fix callBigDLFunc returning an Int when the actual return value from Java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* rm dlframe example from test script

* Add Utest about dividing by zero (#3128)

* Add Utest about dividing by zero

* add Utest and zero check of LocalData

* change

* Add Utest about dividing by zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* fix make dist

* migrate path

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
dding3 added a commit to dding3/analytics-zoo that referenced this pull request Sep 7, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue caused by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app segfaults.
Here, the parent thread means the main thread (local mode) or the worker
thread of mapPartition (distributed mode).
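As a rough Python analogue of the idea (not BigDL's Scala `ThreadPool`), pinning all native-library calls to one long-lived worker thread means they always run on the same thread, no matter which short-lived caller thread submits them. The helper `run_native` and the `"native"` thread-name prefix are hypothetical names chosen for this sketch.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One dedicated, long-lived worker thread: every task submitted here runs on
# the same thread, so thread-local state held by a native library survives
# even when the submitting (parent) threads come and go.
_native_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="native")


def run_native(fn, *args):
    """Hypothetical helper: run fn on the dedicated thread and wait for the result."""
    return _native_pool.submit(fn, *args).result()


def current_thread_name():
    return threading.current_thread().name


def caller():
    # Each short-lived caller thread delegates its work to the shared worker.
    return run_native(current_thread_name)


results = []
for _ in range(3):
    t = threading.Thread(target=lambda: results.append(caller()))
    t.start()
    t.join()

print(results)  # the same worker-thread name three times
```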

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: issues from updating mkldnn to v0.17. (intel-analytics#2712)

There are two issues:

1. The padding tensor requirement. mkl-dnn uses a padding tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2): the
    tensor is padded to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous impl
    allocated the DnnTensor when the model was created, which cost too much
    memory. This patch allocates it at runtime instead.
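To make the padding arithmetic concrete, here is a minimal sketch assuming the channel dimension of an NCHW shape is rounded up to a multiple of the SIMD width in floats (8 for 256-bit AVX2 registers holding 32-bit floats). The function names are hypothetical, and mkl-dnn's real blocking rules depend on the memory format and instruction set.

```python
def padded_shape(shape, simd_width=8):
    """Round the channel dim (index 1 of an NCHW shape) up to a multiple of simd_width."""
    n, c, h, w = shape
    padded_c = -(-c // simd_width) * simd_width  # ceiling division, then scale back up
    return (n, padded_c, h, w)


def num_elements(shape):
    total = 1
    for d in shape:
        total *= d
    return total


orig = (4, 1, 28, 28)
padded = padded_shape(orig)
print(padded)                                     # (4, 8, 28, 28)
print(num_elements(padded) / num_elements(orig))  # 8.0
```

A single-channel input therefore costs eight times the memory once padded, which is why allocating the mirrored DnnTensor lazily matters.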

* add shape computation for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add shape computation for some layers and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when importing a TensorFlow graph with Double type (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, a new attribute called dataType is added
    to `MemoryData`.
2. Because the scales must be transferred between FP32->int8 and int8->FP32,
    two new attributes called `mask` and `scales` are added.

* fix conversion accuracy (intel-analytics#2760)

* fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially convolutions, so a
specific layer can accept an int8 input. If you want fp32 output, add
a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes 3 steps.

1. Generate the scales of the model.
   An API such as `generateScalesWithMask` generates the scales of an
   fp32 model, and the returned model is also an fp32 model.
2. Quantize the model.
   The `quantize()` API is compatible with the `bigquant`
   backend and sets the quantize flag. When compiling, mkldnn
   generates the quantized weight, output, and input at runtime.
3. Do the inference (forward).
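For intuition about what "scales" are, here is a generic sketch of symmetric per-tensor int8 quantization, in which the scale maps the largest absolute fp32 value to 127. This scheme is an assumption for illustration; it is not necessarily exactly what `generateScalesWithMask` or mkldnn computes (a per-channel mask, for example, keeps one scale per output channel instead of one per tensor).

```python
def compute_scale(values):
    """Map the max absolute value to 127 (symmetric int8 range)."""
    max_abs = max(abs(v) for v in values)
    return 127.0 / max_abs if max_abs > 0 else 1.0


def quantize(values, scale):
    # Round to nearest and clamp into the signed 8-bit range.
    return [max(-128, min(127, round(v * scale))) for v in values]


def dequantize(qvalues, scale):
    return [q / scale for q in qvalues]


x = [-1.0, 0.25, 1.0]
s = compute_scale(x)
q = quantize(x, s)
print(s, q)              # 127.0 [-127, 32, 127]
print(dequantize(q, s))  # approximately [-1.0, 0.252, 1.0]
```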

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first
use GenInt8Scales to generate the scales and save the new model, then
use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output share the same memory, so we can't do forward in
`calcScales`: by that time the input has been changed and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

first does seq.forward(input), and when it reaches the ReLU it does another
forward, so the input is overwritten by the output and the scales will be
wrong.

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
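The in-place problem can be reproduced in a few lines: a hypothetical in-place ReLU overwrites the buffer that also serves as its input, so a scale computed from "the input" after the forward pass actually reflects the output. The functions below are illustrative only and are not BigDL APIs.

```python
def relu_inplace(buf):
    """Hypothetical in-place ReLU: the output shares memory with the input."""
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf


def max_abs(buf):
    return max(abs(v) for v in buf)


activation = [-3.0, 2.0]            # output of the preceding Conv, input of the ReLU
scale_before = max_abs(activation)  # correct input statistic: 3.0

out = relu_inplace(activation)
scale_after = max_abs(activation)   # same buffer, now holding the ReLU output: 2.0

print(out is activation, scale_before, scale_after)  # True 3.0 2.0
```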

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

The u8-to-s8 and s8-to-u8 conversions need no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer (layer-wise scaled), with utility functions to build a set of LARS optim methods for a container.

Bug fix: The gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}, but concatenating {id}{pidTo} is ambiguous: e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
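The collision described above is easy to reproduce. This sketch only mirrors the id format quoted in the commit message; the function names are hypothetical, and the Scala code in AllReduceParameter may build the string differently.

```python
def block_id_old(param_id, pid_to, pid_from):
    # Original scheme: {id}{pidTo}gradientBytes{pidFrom}
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def block_id_new(param_id, pid_to, pid_from):
    # A "_" between id and pidTo removes the ambiguity.
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


print(block_id_old(1, 12, 0) == block_id_old(11, 2, 0))  # True: both are "112gradientBytes0"
print(block_id_new(1, 12, 0) == block_id_new(11, 2, 0))  # False: "1_12..." vs "11_2..."
```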

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore / use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT
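The "whole layer gradient norm" and "trust" terms above refer to the layer-wise learning-rate multiplier of LARS. Below is a minimal sketch of the standard LARS formulation, trust * ||w|| / (||g|| + weight_decay * ||w||), computed over all parameters of one layer. The function name, defaults, and the fallback to 1.0 for degenerate layers are assumptions, and BigDL's LarsSGD may differ in details such as how the trust value is limited.

```python
import math


def l2_norm(xs):
    return math.sqrt(sum(x * x for x in xs))


def lars_local_lr(weights, grads, trust=0.001, weight_decay=0.0):
    """Layer-wise LR multiplier: trust * ||w|| / (||g|| + weight_decay * ||w||)."""
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    denom = g_norm + weight_decay * w_norm
    if w_norm == 0.0 or denom == 0.0:
        return 1.0  # assumption: fall back to the plain global LR for degenerate layers
    return trust * w_norm / denom


layer_w = [3.0, 4.0]  # ||w|| = 5
layer_g = [0.6, 0.8]  # ||g|| = 1
print(lars_local_lr(layer_w, layer_g))  # 0.005 (up to floating-point rounding)
```

The effective step for the layer is then the global learning rate times this multiplier, so layers whose gradients are large relative to their weights take proportionally smaller steps.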

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison (intel-analytics#2871)

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify predict/predictClass function (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* masks in RoiLabel now hold Floats, not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support batching samples of different sizes into one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() missing dimension parameter issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging
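"BBox now inclusive" refers to treating both corner coordinates as part of the box, so a box's width is x2 - x1 + 1 rather than x2 - x1. The sketch below computes IoU under that convention; it assumes (x1, y1, x2, y2) pixel coordinates, and the function names are hypothetical rather than the actual MAP utilities.

```python
def box_area(box):
    """Area of an inclusive box (x1, y1, x2, y2): both edge pixels count, hence the +1."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1 + 1) * max(0, y2 - y1 + 1)


def iou(a, b):
    # Intersection rectangle, also inclusive of its edges; empty if the boxes don't overlap.
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


print(box_area((0, 0, 9, 9)))             # 100
print(iou((0, 0, 9, 9), (5, 5, 14, 14)))  # 25 / 175, about 0.1429
```

With exclusive coordinates the same two boxes would give an intersection of 4 * 4 = 16 over a union of 81 + 81 - 16 = 146, so the convention matters when matching detections against ground truth.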

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Update doc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel-analytics#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int while the actual return value from Java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing by zero (#3128)

* Add Utest about dividing by zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing by zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 7, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. The child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native library keeps some unknown thread-local variables. So if
the parent thread exits and is recreated (as threads from
Executors.newFixedThreadPool can be), the whole app segfaults.
Here the parent thread means the main thread (Local Mode) or the worker
thread of mapPartition (Distributed Mode).
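A rough Python analogy of the fix: route every native call through one dedicated, long-lived worker thread, so thread-local state created on the first call is still valid on later calls. This is not BigDL's Scala `ThreadPool`; `compute_pool`, `native_like_call` and the `threading.local` state are hypothetical stand-ins for the MKL-DNN thread-local variables described in the commit message.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# A single long-lived worker: every submitted task runs on the same thread.
compute_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dnn-compute")

# Hypothetical stand-in for the native library's thread-local variables.
_local = threading.local()


def native_like_call(x):
    # Lazily create per-thread state, as a native library might.
    if not hasattr(_local, "calls"):
        _local.calls = 0
    _local.calls += 1
    return threading.get_ident(), _local.calls, x * 2


results = [compute_pool.submit(native_like_call, i).result() for i in range(3)]
compute_pool.shutdown()
```

Because the pool never replaces its only worker, the per-thread counter keeps growing across calls instead of being silently reset on a recreated thread.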

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. The padding tensor is required. mkl-dnn uses a padding tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous impl
    allocated the DnnTensor when the model was created, which cost too much
    space. This patch allocates it at runtime instead.
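A small sketch of the padding arithmetic in issue 1, assuming the padding applies to the channel dimension and that the SIMD width is 8 fp32 lanes (256-bit AVX2), which matches the 4x1x28x28 to 4x8x28x28 example. The helper names are hypothetical, not mkl-dnn APIs.

```python
def padded_channels(channels: int, simd_width: int) -> int:
    """Round a channel count up to the next multiple of the SIMD width."""
    return -(-channels // simd_width) * simd_width  # ceiling division


def padded_shape(shape, simd_width):
    # NCHW layout assumed: only C is padded.
    n, c, h, w = shape
    return (n, padded_channels(c, simd_width), h, w)


def element_count(shape):
    count = 1
    for d in shape:
        count *= d
    return count


logical = (4, 1, 28, 28)
physical = padded_shape(logical, 8)  # (4, 8, 28, 28)
```

For a single-channel input the padded buffer holds 8x as many elements as the logical tensor, which is why allocating such buffers eagerly (issue 2) wastes memory.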

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to the `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    we add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel-analytics#2760)

* fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
So a specific layer can accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

It includes 3 steps:

1. Generate the scales of the model.
   An api like `generateScalesWithMask` is needed to generate the scales of
   the fp32 model; the model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` api will be compatible with the `bigquant`
   backend and sets the quantize flag. When compiling,
   the quantized weight, output and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).
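For readers unfamiliar with how scales are used, here is a generic sketch of symmetric per-tensor int8 quantization: a max-abs scale maps values onto [-127, 127], and dequantizing divides the scale back out. It illustrates the general technique only; `max_abs_scale`, `quantize` and `dequantize` are hypothetical helpers, not BigDL's `generateScalesWithMask` or `quantize()`, and the per-tensor granularity is an assumption.

```python
def max_abs_scale(values):
    """Scale that maps the largest magnitude onto 127 (symmetric int8)."""
    max_abs = max(abs(v) for v in values)
    return 127.0 / max_abs if max_abs > 0 else 1.0


def quantize(values, scale):
    # Round to nearest, then clamp into [-127, 127].
    return [max(-127, min(127, int(round(v * scale)))) for v in values]


def dequantize(q_values, scale):
    return [q / scale for q in q_values]


x = [1.0, -0.4, 0.2]
scale = max_abs_scale(x)    # 127.0
q = quantize(x, scale)      # [127, -51, 25]
x_hat = dequantize(q, scale)
```

Each reconstructed value is within half a quantization step (0.5 / scale) of the original.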

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: use
GenInt8Scales to generate the scales first and save the new model, then
use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output share the same memory, so we can't do forward in
`calcScales`: by then the input has already been changed and its scales may
be wrong. For example,

Sequential().add(Conv).add(ReLU)

runs in two steps: seq.forward(input) first, and then, when it reaches the ReLU,
another forward, so the input becomes the output and the scales will be
wrong.

For convolution, the weight's dimension is always 5, even when the group number
is 1. But for dnn convolution without groups, the weight's dimension
should be 4.
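A toy Python illustration of the in-place problem, assuming the scale is derived from the max absolute value of a tensor (a common choice, not necessarily exactly what `calcScales` computes). `relu_inplace` is a hypothetical stand-in for a layer whose output shares memory with its input: a statistic read from the "input" after forward actually reflects the ReLU output.

```python
def max_abs(values):
    """Max-abs statistic, assumed here as the basis for an int8 scale."""
    return max(abs(v) for v in values)


def relu_inplace(values):
    # Overwrites the input buffer, so the returned output aliases the input.
    for i, v in enumerate(values):
        if v < 0.0:
            values[i] = 0.0
    return values


buf = [-3.0, 1.0, 2.0]
input_stat_before = max_abs(buf)  # taken before the layer runs
out = relu_inplace(buf)
input_stat_after = max_abs(buf)   # the "input" buffer now holds the output
```

The two statistics differ (3.0 vs 2.0), so a scale computed from the input after running forward would be wrong.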

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

u8-to-s8 and s8-to-u8 conversions need no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: Layer-wise scaled. Also with utility functions to build a set of LARS optim for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo
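A minimal sketch of the key collision described above, in hypothetical Python rather than the Scala `AllReduceParameter` code; it assumes the separator is placed exactly as `{id}_{pidTo}`, per the commit message.

```python
def old_block_id(block_id: int, pid_to: int, pid_from: int) -> str:
    # {id}{pidTo}gradientBytes{pidFrom}: no separator between id and pidTo.
    return f"{block_id}{pid_to}gradientBytes{pid_from}"


def new_block_id(block_id: int, pid_to: int, pid_from: int) -> str:
    # "_" separates id from pidTo, so the key can be split unambiguously.
    return f"{block_id}_{pid_to}gradientBytes{pid_from}"


# (id=1, pidTo=12) and (id=11, pidTo=2) collide under the old scheme...
assert old_block_id(1, 12, 0) == old_block_id(11, 2, 0) == "112gradientBytes0"
# ...but not under the new one.
assert new_block_id(1, 12, 0) != new_block_id(11, 2, 0)
```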

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if the LSTM has more than one layer

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add gemm layer

* add gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify predict/predictClass function (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size to one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() missing dimension parameter issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch-related unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitives (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Update doc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add a warning that Optimizer() is deprecated (intel-analytics#3062)

* add deprecation warning

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int when the actual return value from Java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* rm dlframe example from test script

* Add unit test for division by zero (#3128)

* Add unit test for division by zero

* add unit test and zero check for LocalData

* change

* Add unit test for division by zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* fix make dist

* migrate path

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 7, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid the dynamic resource allocation issue caused by Docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

This is needed because using the parent thread directly causes two bugs:
1. The child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if
the parent thread exits and is recreated (such as a thread from
Executors.newFixedThreadPool), the whole app crashes with a segmentation fault.
The parent thread means the main thread (local mode) or the worker thread of
mapPartition (distributed mode).
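A minimal Python sketch of the idea, not the actual BigDL (Scala) code: every native call is routed through one long-lived worker thread, so native thread-local state never outlives its thread even when the calling threads come and go. `native_compute` and `run_on_dnn_thread` are hypothetical names, and the sketch does not model core affinity.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a native (mkldnn-style) computation; it also
# reports which thread it ran on.
def native_compute(x):
    return x * 2, threading.get_ident()

# A single long-lived worker thread owns all native calls, so any native
# thread-local state stays with a thread that does not exit between calls.
_dnn_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dnn")

def run_on_dnn_thread(fn, *args):
    """Run fn on the dedicated thread and block until it returns."""
    return _dnn_pool.submit(fn, *args).result()

def caller(results, i):
    # Callers can be short-lived threads (e.g. recycled pool threads);
    # the native work still executes on the one dnn thread.
    results[i] = run_on_dnn_thread(native_compute, i)

results = [None] * 4
callers = [threading.Thread(target=caller, args=(results, i)) for i in range(4)]
for t in callers:
    t.start()
for t in callers:
    t.join()

values = [r[0] for r in results]
thread_ids = {r[1] for r in results}
print(values, len(thread_ids))  # [0, 2, 4, 6] 1
```

A one-worker executor is used because it never starts a second thread, which is the property the fix relies on.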

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: issues from updating mkldnn to v0.17. (intel-analytics#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn uses a padded tensor, which takes
    more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (AVX2), because it pads
    to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which used too much
    memory, so this patch allocates it at runtime.
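A small Python sketch of the padding arithmetic, assuming (as the 4x1x28x28 example suggests) that the channel dimension of an NCHW shape is padded and that AVX2 holds 8 fp32 values per register. `padded_shape` is a hypothetical helper, not an mkl-dnn API.

```python
def padded_shape(shape, simd_width=8):
    """Round the channel dim (index 1 of an NCHW shape) up to a multiple of simd_width."""
    n, c, h, w = shape
    c_padded = -(-c // simd_width) * simd_width  # ceiling division, then scale back up
    return (n, c_padded, h, w)

def num_elements(shape):
    total = 1
    for d in shape:
        total *= d
    return total

orig = (4, 1, 28, 28)
padded = padded_shape(orig)  # AVX2: 8 fp32 lanes per 256-bit register
print(padded)                                     # (4, 8, 28, 28)
print(num_elements(padded) / num_elements(orig))  # 8.0
```

For a single-channel input the padded buffer is 8x larger, which is why allocating it eagerly for every layer was costly.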

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to `MemoryData`.
2. Because the scales must be transferred between FP32->int8 and int8->FP32,
    we add two new attributes called `mask` and `scales`.
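To illustrate what the `scales` carry, here is a Python sketch of a symmetric max-abs int8 scheme. The scheme and the helper names are assumptions for illustration only; `per_channel=False` corresponds to one common scale (roughly what a zero `mask` means in mkl-dnn) and `per_channel=True` to one scale per output channel.

```python
def compute_scales(rows, per_channel=False):
    """Max-abs scales: one per row (channel) or one for the whole tensor."""
    if per_channel:
        return [max(abs(v) for v in row) for row in rows]
    return [max(abs(v) for row in rows for v in row)]

def _scale_for(scales, i):
    # A single scale is shared by all channels; otherwise index by channel.
    return scales[i] if len(scales) > 1 else scales[0]

def quantize(rows, scales):
    """FP32 -> int8: map [-scale, scale] onto [-127, 127]."""
    out = []
    for i, row in enumerate(rows):
        s = _scale_for(scales, i)
        out.append([0 if s == 0 else max(-127, min(127, round(v * 127.0 / s))) for v in row])
    return out

def dequantize(qrows, scales):
    """int8 -> FP32, the (lossy) inverse mapping."""
    return [[q * _scale_for(scales, i) / 127.0 for q in row] for i, row in enumerate(qrows)]

rows = [[0.25, -1.0], [3.0, 4.0]]                     # two output channels
per_tensor = compute_scales(rows)                     # [4.0]
per_channel = compute_scales(rows, per_channel=True)  # [1.0, 4.0]
print(quantize(rows, per_tensor))   # [[8, -32], [95, 127]]
print(quantize(rows, per_channel))  # [[32, -127], [95, 127]]
```

Per-channel scales keep more precision for channels with a small range, at the cost of storing one scale per channel.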

* fix conversion accuracy (intel-analytics#2760)

* fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions, so that a
specific layer can accept an int8 input. If you want fp32 output, add a
reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes three steps.

1. Generate the scales of the model.
   An API like `generateScalesWithMask` generates the scales of the
   fp32 model; the model it returns is also an fp32 model.
2. Quantize the model.
   The `quantize()` API is compatible with the `bigquant`
   backend and sets the quantize flag. During compilation,
   mkldnn generates the quantized weight, output, and input at
   runtime.
3. Run the inference (forward).

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model, and then
use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output share the same memory, so we can't run forward
inside `calcScales`: by then the input has already been changed and its scales
may be wrong. For example, with

Sequential().add(Conv).add(ReLU)

two forwards happen: seq.forward(input) runs first, and when it reaches the
ReLU, another forward is done, so the input becomes the output and the scales
are wrong.

For convolution, the weight always has 5 dimensions, even when the group number
is 1. For dnn convolution without groups, however, the weight should have 4
dimensions.
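A toy Python sketch of the in-place problem; `conv` and `relu_inplace` are hypothetical stand-ins, not BigDL layers. The scale of ReLU's input has to be captured before ReLU overwrites the shared buffer; measuring it afterwards yields the output's range instead.

```python
def conv(x):
    # Hypothetical stand-in for Conv: returns a fresh buffer.
    return [v - 1.0 for v in x]

def relu_inplace(x):
    # Writes its output into its input buffer, like an in-place ReLU.
    for i, v in enumerate(x):
        x[i] = max(0.0, v)
    return x

def max_abs(x):
    return max(abs(v) for v in x)

inp = [0.5, -2.0, 3.0]
conv_out = conv(inp)                  # [-0.5, -3.0, 2.0]: this is ReLU's input
relu_input_scale = max_abs(conv_out)  # 3.0, measured before ReLU runs

relu_inplace(conv_out)                # the Sequential forward continues into ReLU
stale_scale = max_abs(conv_out)       # 2.0: the buffer now holds ReLU's output
print(relu_input_scale, stale_scale)  # 3.0 2.0
```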

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

u8-to-s8 and s8-to-u8 conversions need no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for Spark 2.3, build 0.7, and dependency exclusion (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add the LARS optimizer (layer-wise scaled), along with utility functions to build a set of LARS optim methods for a container.

Bug fix: The gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}, but concatenating {id}{pidTo} causes ambiguity, e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
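A Python sketch of the collision, using the id layout quoted above; `old_block_id` and `new_block_id` are hypothetical helpers, not AllReduceParameter's actual code.

```python
def old_block_id(id_, pid_to, pid_from):
    # Pre-fix layout: {id}{pidTo}gradientBytes{pidFrom}
    return f"{id_}{pid_to}gradientBytes{pid_from}"

def new_block_id(id_, pid_to, pid_from):
    # Post-fix layout: "_" separates id from pidTo
    return f"{id_}_{pid_to}gradientBytes{pid_from}"

print(old_block_id(1, 12, 0) == old_block_id(11, 2, 0))  # True: both "112gradientBytes0"
print(new_block_id(1, 12, 0) == new_block_id(11, 2, 0))  # False: "1_12..." vs "11_2..."
```

Any separator that cannot appear in a decimal number removes the ambiguity, since the boundary between id and pidTo is then explicit.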

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore / use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify predict/predictClass function (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples of different sizes in one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() missing dimension parameter issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add an option to flush denormal values on the BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch-related unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)

* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning that Optimizer() is deprecated (intel-analytics#3062)

* add deprecation warning

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int while the actual return value from Java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* rm dlframe example from test script

* Add unit test for division by zero (#3128)

* Add unit test for division by zero

* add unit test and zero check of LocalData

* change

* Add unit test for division by zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
dding3 added a commit to dding3/analytics-zoo that referenced this pull request Sep 8, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue caused by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread holds some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app crashes with a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
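The fix pins native work to one dedicated thread. A rough Python analogue, assuming a one-worker `concurrent.futures.ThreadPoolExecutor` stands in for the dedicated MKL-DNN pool (the helpers `run_on_dnn_thread` and `current_thread_id` are hypothetical, not BigDL APIs), shows that tasks submitted from any caller all run on the same long-lived thread:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a dedicated MKL-DNN thread pool:
# exactly one long-lived worker thread that owns all native computation.
_dnn_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dnn")


def run_on_dnn_thread(fn, *args):
    """Submit fn(*args) to the dedicated thread and block until it returns."""
    return _dnn_pool.submit(fn, *args).result()


def current_thread_id():
    """Identity of the thread that executes this call."""
    return threading.get_ident()
```

Because the single worker is reused rather than recreated, thread-local state attached to it is not lost between calls, which is the failure mode described in point 2.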

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. The padding tensor is required. mkl-dnn uses a padding tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    memory. This patch allocates it at runtime instead.
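To make point 1 concrete, here is a small sketch of the padded-size arithmetic. It assumes, from the 4x1x28x28 example, that only the channel dimension of an NCHW shape is padded and that the SIMD width is 8 for AVX2; `padded_nchw_shape` is a hypothetical helper, not part of BigDL or mkl-dnn.

```python
def round_up(n, multiple):
    """Smallest multiple of `multiple` that is >= n."""
    return ((n + multiple - 1) // multiple) * multiple


def padded_nchw_shape(shape, simd_width=8):
    """Pad the channel dim (index 1) of an NCHW shape to a multiple of simd_width.

    Assumption: only channels are padded, as in 4x1x28x28 -> 4x8x28x28 (avx2).
    """
    n, c, h, w = shape
    return (n, round_up(c, simd_width), h, w)


def num_elements(shape):
    """Number of elements the buffer must hold for the given shape."""
    total = 1
    for d in shape:
        total *= d
    return total
```

For a single-channel input this is an 8x larger buffer, which is why the padded memory has to be accounted for.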

* add shape computation for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add shape computation for some layers and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when importing a Tensorflow graph with type Double (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, a new attribute called `dataType` is added
    to `MemoryData`.
2. Because the scales must be transferred between FP32->int8 and int8->FP32,
    two new attributes called `mask` and `scales` are added.
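A hypothetical Python mirror of this descriptor change follows; the class and field names are illustrative assumptions, not BigDL's Scala `MemoryData`. Making it a frozen dataclass means `data_type`, `mask` and `scales` all take part in equality and hashing, so an fp32 and an int8 descriptor of the same shape never compare equal.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class MemoryDataSketch:
    """Illustrative memory descriptor carrying the int8-related attributes."""
    shape: Tuple[int, ...]
    layout: str                      # e.g. "nchw"
    data_type: str = "f32"           # "f32", "s8" or "u8"
    mask: int = 0                    # 0 means one scale for the whole tensor
    scales: Tuple[float, ...] = ()   # empty for plain fp32 memory


def as_int8(md, scales, mask=0, data_type="s8"):
    """Describe the same shape/layout as int8 memory with its scales attached."""
    return replace(md, data_type=data_type, mask=mask, scales=tuple(scales))
```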

* fix conversion accuracy (intel-analytics#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially convolutions.
A layer can now accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

It includes 3 steps.

1. Generate the scales of the model.
   An API such as `generateScalesWithMask` generates the scales of the
   fp32 model; the model it returns is also an fp32 model.
2. Quantize the model.
   The `quantize()` API is compatible with the `bigquant`
   backend and sets the quantize flag. During compilation,
   the quantized weight, output and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).
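The first two steps can be sketched in plain Python on a small 2-D weight. This assumes a symmetric s8 scheme in which the stored scale is the maximum absolute value and the multiplier is `127 / max_abs`; a mask of 0 yields one scale for the whole tensor and a mask of 1 yields one scale per row (dimension 0). All function names are hypothetical, and BigDL's `generateScalesWithMask` and `quantize()` may compute scales differently.

```python
def max_abs_scales(matrix, mask=0):
    """Step 1 sketch: derive scales from observed fp32 values of a 2-D matrix.

    mask == 0 -> one max-abs value for the whole matrix.
    mask == 1 -> one max-abs value per row (dimension 0).
    """
    if mask == 0:
        return [max(abs(v) for row in matrix for v in row)]
    if mask == 1:
        return [max(abs(v) for v in row) for row in matrix]
    raise ValueError("only mask 0 or 1 is supported in this sketch")


def quantize_s8(values, max_abs):
    """Step 2 sketch: symmetric fp32 -> s8 using multiplier 127 / max_abs."""
    scale = 127.0 / max_abs if max_abs > 0 else 1.0
    return [max(-128, min(127, round(v * scale))) for v in values], scale


def dequantize(q_values, scale):
    """Map s8 values back to fp32, as a reorder to fp32 output would."""
    return [q / scale for q in q_values]
```

Step 3 would then run the quantized layers; the round-trip error of each value is bounded by half a quantization step, `0.5 / scale`.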

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model, then
use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output use the same memory, so we can't do forward in
`calcScales`: by then the input has already been changed and its scales may be
wrong. For example,

Sequential().add(Conv).add(ReLU)

runs in two steps: seq.forward(input) first, and when it reaches the ReLU it
does another forward, so the input becomes the output and the scales are
wrong.

For convolution, the weight always has 5 dimensions, even when the group number
is 1. But for dnn convolution without groups, the weight should have 4
dimensions.
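The ordering problem can be reproduced without any MKL-DNN code. In this sketch a hypothetical in-place ReLU plays the role of a layer whose input and output share memory; reading the input's max-absolute value after the forward pass returns the post-ReLU range instead of the true input range.

```python
def relu_inplace(buf):
    """ReLU that overwrites its input list, like a layer sharing input/output memory."""
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf


def max_abs(buf):
    """The statistic an input scale would be derived from."""
    return max(abs(v) for v in buf)


def scale_read_after_forward(x):
    # Wrong order: the in-place forward has already overwritten x.
    relu_inplace(x)
    return max_abs(x)


def scale_read_before_forward(x):
    # Right order: record the input statistic first, then run the layer.
    scale = max_abs(x)
    relu_inplace(x)
    return scale
```

With the input `[-3.0, 1.0, 2.0]`, the first function reports 2.0 and the second reports the correct 3.0.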

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

the u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: layer-wise scaled, plus utility functions to build a set of LARS optim methods for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo
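The ambiguity is easy to see with plain string formatting. The two helpers below are hypothetical and only reproduce the key formats quoted above: `{id}{pidTo}gradientBytes{pidFrom}` before the fix and `{id}_{pidTo}gradientBytes{pidFrom}` after it.

```python
def old_block_id(param_id, pid_to, pid_from):
    # Original scheme: {id}{pidTo}gradientBytes{pidFrom}
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def new_block_id(param_id, pid_to, pid_from):
    # Fixed scheme: "_" separates id from pidTo, so the split is unambiguous.
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"
```

`old_block_id(1, 12, 3)` and `old_block_id(11, 2, 3)` both produce `112gradientBytes3`, while the new scheme yields `1_12gradientBytes3` and `11_2gradientBytes3`.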

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes
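For reference, the layer-wise scaling can be sketched with the trust ratio from the original LARS paper, `trust * ||w|| / (||g|| + weight_decay * ||w||)`, applied per layer on top of plain SGD. This illustrates the general technique, not a transcription of BigDL's `LarsSGD`: the parameter names, the momentum-free update, and the fallback ratio of 1.0 when a norm is zero are assumptions made for the example.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def lars_local_lr(weights, grads, trust=0.001, weight_decay=0.0):
    """Layer-wise trust ratio: trust * ||w|| / (||g|| + weight_decay * ||w||).

    Falls back to 1.0 when either norm is zero (an assumed convention).
    """
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    if w_norm == 0.0 or g_norm == 0.0:
        return 1.0
    return trust * w_norm / (g_norm + weight_decay * w_norm)


def lars_sgd_step(weights, grads, lr, trust=0.001, weight_decay=0.0):
    """One momentum-free update of a single layer's flat parameter list."""
    local = lars_local_lr(weights, grads, trust, weight_decay)
    return [w - lr * local * (g + weight_decay * w) for w, g in zip(weights, grads)]
```

Each layer gets its own ratio, so layers whose gradients are large relative to their weights take proportionally smaller steps.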

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim-method training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec: fixed input format conversion bug between bigdl and mkldnn; random weights and bias not supported yet

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if the LSTM has more than one layer

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore; use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify predict/predictClass function (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples of different sizes in one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() missing dimension parameter issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)

* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel-analytics#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int while the actual return value from Java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* rm dlframe example from test script

* Add Utest about dividing by zero (#3128)

* Add Utest about dividing by zero

* add Utest and zero check of LocalData

* change

* Add Utest about dividing by zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* fix make dist

* migrate path

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 8, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issues in docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

If we use the parent thread directly, there are two bugs:
1. The child threads forked from the parent thread will be bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if the
parent thread exits and is recreated (e.g. a thread from
Executors.newFixedThreadPool), the whole app will segfault.
The parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
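The single-thread-pool idea can be illustrated with a minimal Python sketch; it is not BigDL's Scala code. Every native call is submitted to a pool with exactly one long-lived worker, so all calls run on the same thread regardless of which short-lived caller thread submitted them. `dnn_pool`, `native_compute` and `run_on_dnn_thread` are hypothetical names, and `native_compute` only stands in for an mkldnn call that depends on thread-local state.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One dedicated, long-lived worker thread for all native calls
# (hypothetical stand-in for the single mkldnn thread pool).
dnn_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dnn")

def native_compute(x):
    # Stand-in for a native call relying on thread-local state;
    # also reports which thread it actually ran on.
    return x * 2, threading.get_ident()

def run_on_dnn_thread(x):
    # Callers never run native code themselves: they submit and wait.
    return dnn_pool.submit(native_compute, x).result()

def caller(results, x):
    results.append(run_on_dnn_thread(x))

results = []
# Short-lived callers, like mapPartition workers that may exit and be recreated.
callers = [threading.Thread(target=caller, args=(results, i)) for i in range(4)]
for t in callers:
    t.start()
for t in callers:
    t.join()
dnn_pool.shutdown()

thread_ids = {tid for _, tid in results}
print(sorted(v for v, _ in results), len(thread_ids))  # [0, 2, 4, 6] 1
```

Because the worker outlives every caller, native thread-local state is never torn down and rebuilt when a caller thread exits.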

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. The padding tensor is required. mkl-dnn uses a padded tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2), because
    it pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous impl
    allocated the DnnTensor when the model was created, which cost too much
    memory. So this patch allocates it at runtime.
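The padding arithmetic in issue 1 can be checked with a short Python sketch. It assumes, consistent with the 4x1x28x28 to 4x8x28x28 example, that the padded dimension is the channel dimension (index 1 in NCHW) and that the block width is 8 fp32 lanes for AVX2 (256 bits / 32 bits). `padded_nchw_shape` is a hypothetical helper, not part of the mkl-dnn API.

```python
def padded_nchw_shape(shape, simd_width=8):
    """Round the channel dim (index 1 of NCHW) up to a multiple of simd_width."""
    n, c, h, w = shape
    padded_c = -(-c // simd_width) * simd_width  # ceiling division, scaled back up
    return (n, padded_c, h, w)

def num_elements(shape):
    total = 1
    for d in shape:
        total *= d
    return total

logical = (4, 1, 28, 28)
physical = padded_nchw_shape(logical)  # AVX2: 8 fp32 lanes per register
print(physical, num_elements(physical) // num_elements(logical))  # (4, 8, 28, 28) 8
```

For a single-channel input the padded buffer is 8x larger, which is why allocating these tensors eagerly at model creation is costly.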

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration docs

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration docs

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we should add a new attribute called dataType
    to the `MemoryData`.
2. Because we need to transfer the scales between FP32->int8 and int8->FP32,
    we should add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel-analytics#2760)

* fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions,
so a specific layer can accept an int8 input. If you want fp32
output, you should add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

It includes 3 steps:

1. Generate the scales of the model.
   We need an API like `generateScalesWithMask` to generate the scales of the
   fp32 model. The returned model is also an fp32 model.
2. Quantize the model.
   The `quantize()` API will be compatible with the `bigquant`
   backend, which sets the quantize flag. When compiling,
   the quantized weight, output and input will be generated by mkldnn at
   runtime.
3. Do the inference (forward).
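To make steps 1 and 2 concrete, here is a generic symmetric int8 quantization sketch in Python. It is an assumption, not the mkldnn or BigDL implementation: it treats a tensor's "scale" as its maximum absolute value, uses one scale per tensor (no per-channel `mask`), and covers signed 8-bit only. All function names are hypothetical.

```python
def max_abs_scale(values):
    # Step 1 (sketch): record the tensor's max absolute value as its scale.
    return max(abs(v) for v in values)

def quantize_s8(values, scale):
    # Step 2 (sketch): map [-scale, scale] linearly onto [-127, 127], clamped.
    factor = 127.0 / scale
    return [max(-127, min(127, round(v * factor))) for v in values]

def dequantize_s8(qs, scale):
    # Inverse mapping back to fp32, as a reorder int8 -> FP32 would do.
    return [q * scale / 127.0 for q in qs]

x = [-4.0, 1.0, 3.0]
s = max_abs_scale(x)   # 4.0
q = quantize_s8(x, s)  # [-127, 32, 95]
print(q)
```

The dequantized values differ from the originals by at most one quantization step (`scale / 127`), which is the accuracy cost traded for int8 compute.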

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model; then you
can use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output use the same memory, so we can't do forward in
`calcScales`: by that time the input has been changed and its scales may be
wrong. For example,

Sequential().add(Conv).add(ReLU)

does two steps: seq.forward(input) first, and when it goes into the ReLU, it
does another forward, so the input becomes the output and the scales are
wrong.

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
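The in-place problem can be reproduced with a minimal Python sketch. It is not BigDL code: `relu_inplace` stands in for a layer whose output shares memory with its input, and `max_abs` is an assumed stand-in for how an input scale is measured. Once the forward has run, the buffer that used to hold the input holds the output, so a scale recomputed from it is wrong.

```python
def max_abs(values):
    return max(abs(v) for v in values)

def relu_inplace(buf):
    # Output shares memory with input: negative entries are overwritten.
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf

x = [-3.0, 1.0, 2.0]
input_scale_before = max_abs(x)  # 3.0: the scale the input actually had
out = relu_inplace(x)            # out is the same list object as x
input_scale_after = max_abs(x)   # 2.0: recomputed after forward, now wrong
print(out is x, input_scale_before, input_scale_after)  # True 3.0 2.0
```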

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

The u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration docs

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer (layer-wise scaled), along with utility functions to build a set of LARS optim methods for a container.

Bug fix: the gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}, but the concatenation {id}{pidTo} is ambiguous, e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
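The block-id collision and its fix can be reproduced in a few lines of Python. This sketch follows only the layout stated in the message above; the real code is Scala inside AllReduceParameter, and the function names here are hypothetical.

```python
def old_block_id(param_id: int, pid_to: int, pid_from: int) -> str:
    # Original layout: {id}{pidTo}gradientBytes{pidFrom}
    return f"{param_id}{pid_to}gradientBytes{pid_from}"

def new_block_id(param_id: int, pid_to: int, pid_from: int) -> str:
    # Fixed layout: "_" separates id from pidTo
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"

def parse_new_block_id(block_id: str):
    # The separator makes the id/pidTo split recoverable.
    head, pid_from = block_id.split("gradientBytes")
    param_id, pid_to = head.split("_")
    return int(param_id), int(pid_to), int(pid_from)

# Two different (id, pidTo) pairs collide under the old layout...
print(old_block_id(1, 12, 0) == old_block_id(11, 2, 0))  # True
# ...but stay distinct and round-trip under the new one.
print(new_block_id(1, 12, 0), new_block_id(11, 2, 0))  # 1_12gradientBytes0 11_2gradientBytes0
print(parse_new_block_id(new_block_id(11, 2, 0)))  # (11, 2, 0)
```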

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable

* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify predict/predictClass function (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* masks in RoiLabel now hold Floats, not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support batching samples of different sizes into one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc
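
A precision-recall AUC can be pictured with a small pure-Python sketch. It assumes 0/1 labels, ranks predictions by descending score without grouping ties, and integrates precision over recall with the trapezoidal rule starting from (recall 0, precision 1). The name `pr_auc` is hypothetical, and BigDL's own implementation may use a different interpolation.

```python
def pr_auc(scores, labels):
    """Area under the precision-recall curve via the trapezoidal rule.

    scores: predicted scores, higher means more likely positive.
    labels: 0/1 ground-truth labels, same length as scores.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("PR AUC is undefined without positive labels")
    # Rank samples by descending score; each prefix of the ranking is one threshold.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = 0
    prev_recall, prev_precision = 0.0, 1.0
    area = 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        tp += label
        recall = tp / total_pos
        precision = tp / k
        # Trapezoid between consecutive (recall, precision) points.
        area += (recall - prev_recall) * (precision + prev_precision) / 2
        prev_recall, prev_precision = recall, precision
    return area
```

A perfect ranking gives 1.0; a ranking that puts both negatives first gives 7/24.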

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() missing dimension parameter issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging
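
With inclusive boxes, both corner pixels belong to the box, so a box's width is x2 - x1 + 1 rather than x2 - x1. A minimal IoU sketch under that convention follows; the (x1, y1, x2, y2) tuple layout and the function names are assumptions for illustration, not BigDL's API.

```python
def box_area(box):
    """Area of an inclusive box (x1, y1, x2, y2): both corners lie inside it."""
    x1, y1, x2, y2 = box
    # Clamp at 0 so an empty intersection has zero area.
    return max(0, x2 - x1 + 1) * max(0, y2 - y1 + 1)


def iou(a, b):
    """Intersection over union of two inclusive boxes."""
    inter = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    inter_area = box_area(inter)
    union = box_area(a) + box_area(b) - inter_area
    return inter_area / union if union > 0 else 0.0
```

For example, (0, 0, 9, 9) covers 10 x 10 = 100 pixels, and its overlap with (5, 5, 14, 14) is 25 pixels out of a 175-pixel union.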

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)

* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning that Optimizer() is deprecated (intel-analytics#3062)

* add deprecation warning

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int when the actual return value from Java is a byte array (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* rm dlframe example from test script

* Add unit test for division by zero (#3128)

* Add unit test for division by zero

* add unit test and zero check of LocalData

* change

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* fix make dist

* migrate path

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 10, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread holds some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app crashes with a segmentation fault.
The parent thread means the main thread (local mode) or the worker thread of
mapPartition (distributed mode).
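
The idea of routing all native work through one long-lived thread can be sketched in Python with a single-worker `ThreadPoolExecutor`. This is a hypothetical analogy, not BigDL's Scala `ThreadPool`: it only shows that every task runs on the same thread regardless of which caller submits it, and it does not model core affinity.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One long-lived worker thread owns all "native" work, so any thread-local
# state it builds up survives even when the calling threads come and go.
_native_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="native")


def run_native(fn, *args):
    """Run fn on the dedicated worker thread and block until it finishes."""
    return _native_pool.submit(fn, *args).result()


def current_thread_id():
    """Identifier of the thread this function runs on."""
    return threading.get_ident()
```

Calls from several short-lived caller threads all report the same worker thread id, which differs from the caller's own id.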

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. The padding tensor is required. mkl-dnn uses a padding tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2), because it
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    memory, so this patch allocates it at runtime.
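
The padding in issue 1 is a round-up to a multiple of the SIMD width: an AVX2 register is 256 bits, which holds 8 fp32 values, so a dimension of size 1 grows to 8. A small sketch, assuming (as in the 4x1x28x28 example) that dimension 1 is the padded one; `padded_shape` and `num_elements` are hypothetical helpers.

```python
def padded_shape(shape, simd_width=8, dim=1):
    """Round shape[dim] up to a multiple of simd_width."""
    padded = list(shape)
    padded[dim] = -(-padded[dim] // simd_width) * simd_width  # ceiling division
    return tuple(padded)


def num_elements(shape):
    """Number of elements in a tensor of the given shape."""
    n = 1
    for s in shape:
        n *= s
    return n
```

For the 4x1x28x28 case the padded tensor holds 8x as many elements, which is why the extra memory matters.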

* add shape computation for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add shape computation for some layers and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to the `MemoryData`.
2. Because the scales must be transferred between FP32->int8 and int8->FP32,
    we add two new attributes called `mask` and `scales`.
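
A minimal sketch of how a `mask` can select which dimensions get their own scale. It assumes the MKL-DNN convention that bit d of the mask requests a separate scale for each index along dimension d (mask 0 means one scale for the whole tensor), and it uses the max absolute value of each slice as the scale; the function name is hypothetical.

```python
from itertools import product


def max_abs_scales(data, shape, mask):
    """Max absolute value per slice selected by `mask`.

    data:  flat row-major list of floats with the given shape.
    mask:  bit d set -> a separate scale for every index along dimension d;
           mask == 0 -> one scale for the whole tensor.
    Returns a dict mapping the masked index tuple to its max |x|.
    """
    dims = [d for d in range(len(shape)) if (mask >> d) & 1]
    scales = {}
    # product() walks indices in row-major order, matching the flat layout.
    for flat, idx in enumerate(product(*(range(s) for s in shape))):
        key = tuple(idx[d] for d in dims)
        scales[key] = max(scales.get(key, 0.0), abs(data[flat]))
    return scales
```

For a 2x3 tensor, mask 1 yields one scale per row and mask 2 one scale per column.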

* fix conversion accuracy (intel-analytics#2760)

* fix accuracy for saved model

* exclude mkldnn models from conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions,
so a layer can accept int8 input. If you want fp32 output, add a
reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

It includes three steps:

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of an
   fp32 model, and the returned model is also an fp32 model.
2. Quantize the model.
   The `quantize()` API is compatible with the `bigquant`
   backend and sets the quantize flag. During compilation,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Run the inference (forward).
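
The three steps can be pictured on a single dot product with symmetric per-tensor int8 quantization. This sketch assumes a scale of 127 / max|x| and uses plain Python integers in place of int8/int32 kernels; it illustrates the idea only and is not the `generateScalesWithMask` or `quantize()` implementation.

```python
def calibrate(values):
    """Step 1: record the max |x| of a tensor, used as its scale."""
    return max(abs(v) for v in values)


def quantize(values, max_abs):
    """Step 2: symmetric int8 quantization, q = round(x * 127 / max_abs)."""
    factor = 127.0 / max_abs
    return [max(-128, min(127, round(v * factor))) for v in values]


def int8_dot(x, w, x_max, w_max):
    """Step 3: integer dot product, then rescale the result back to fp32."""
    qx, qw = quantize(x, x_max), quantize(w, w_max)
    acc = sum(a * b for a, b in zip(qx, qw))  # integer accumulation
    return acc * (x_max / 127.0) * (w_max / 127.0)
```

The int8 result lands close to, but not exactly on, the fp32 dot product; the gap is the quantization error.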

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model, then use the
quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output share the same memory, so we can't do forward in
`calcScales`: by that time the input has been changed, and its scales may be
wrong. For example,

Sequential().add(Conv).add(ReLU)

first does seq.forward(input), and then, when it reaches the ReLU, does another
forward whose input has already been overwritten by the output. So the scales
will be wrong.

For convolution, the weight always has 5 dimensions, even when the group number
is 1. But for a dnn convolution without groups, the weight should have 4
dimensions.
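
The in-place hazard can be reproduced with a ReLU that overwrites its input buffer: measuring the input's max absolute value after the forward pass sees the output instead. A minimal sketch with hypothetical helper names:

```python
def relu_inplace(buf):
    """ReLU that overwrites its input buffer, like an in-place layer."""
    for i, v in enumerate(buf):
        buf[i] = v if v > 0 else 0.0
    return buf


def max_abs(buf):
    return max(abs(v) for v in buf)


def input_scale_wrong(buf):
    relu_inplace(buf)       # the output aliases the input
    return max_abs(buf)     # measured after the input was overwritten


def input_scale_right(buf):
    scale = max_abs(buf)    # measure before the in-place forward
    relu_inplace(buf)
    return scale
```

With input [-5.0, 2.0, 1.0], the wrong order reports 2.0 instead of the true 5.0.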

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

The u8-to-s8 or s8-to-u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: layer-wise scaled learning rates, with utility functions to build a set of LARS optim methods for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo
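
The ambiguity is easy to see by building the block name with and without a separator. `block_id` is a hypothetical helper that follows the {id}{pidTo}gradientBytes{pidFrom} pattern described above; the real AllReduceParameter string may differ in detail.

```python
def block_id(param_id, pid_to, pid_from, sep=""):
    """Build a gradient block name: {id}{sep}{pidTo}gradientBytes{pidFrom}."""
    return f"{param_id}{sep}{pid_to}gradientBytes{pid_from}"
```

Without a separator, ids 1/12 and 11/2 both produce "112gradientBytes0"; with "_" they become "1_12..." and "11_2...".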

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* code style check

* remove print

* fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim-method training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed; random weights and bias not supported yet

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore / use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT
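
For reference, the layer-wise multiplier from the LARS paper is trust * ||w|| / (||g|| + weight_decay * ||w||), computed over the whole layer's weights and gradients. The sketch below assumes plain SGD without momentum, and its fallback to a multiplier of 1.0 when a norm is zero is an assumption; it is not the LarsSGD code.

```python
import math


def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))


def lars_local_lr(weights, grads, trust=0.001, weight_decay=0.0):
    """Layer-wise LR multiplier: trust * ||w|| / (||g|| + weight_decay * ||w||)."""
    w_norm, g_norm = l2_norm(weights), l2_norm(grads)
    denom = g_norm + weight_decay * w_norm
    if w_norm == 0.0 or denom == 0.0:
        return 1.0  # assumed fallback: use the global learning rate unchanged
    return trust * w_norm / denom


def lars_sgd_step(weights, grads, lr, trust=0.001, weight_decay=0.0):
    """One plain SGD step (no momentum) scaled by the layer's LARS multiplier."""
    local = lars_local_lr(weights, grads, trust, weight_decay)
    return [w - lr * local * (g + weight_decay * w) for w, g in zip(weights, grads)]
```

With ||w|| = 5, ||g|| = 1 and trust 0.01, the layer's multiplier is 0.05, so each weight moves by 0.05 * lr * g.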

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable

* doc: add more docs

* fix: invokeAndWait2 should rethrow the exceptions raised in the tasks (intel-analytics#2843)
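
In Python terms, rethrowing task exceptions falls out of waiting on each future's `result()`, which re-raises whatever the task raised. A minimal sketch; `invoke_and_wait` is a hypothetical stand-in, not BigDL's `invokeAndWait2`.

```python
from concurrent.futures import ThreadPoolExecutor


def invoke_and_wait(pool, tasks):
    """Run all tasks on the pool and return their results in order.

    Future.result() re-raises any exception raised inside a task, so a
    failing task surfaces to the caller instead of being silently dropped.
    """
    futures = [pool.submit(task) for task in tasks]
    return [f.result() for f in futures]
```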

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to improve the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison (intel-analytics#2871)

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify predict/predictClass function (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now has Floats, not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different sizes in one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() missing dimension parameter issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)

* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel-analytics#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int when the actual return value from Java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* fix make dist

* migrate path

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 14, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

If we use the parent thread directly, there are two bugs:
1. The child threads forked from the parent thread will be bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if the
parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app will hit a segmentation fault.
Here, the parent thread means the main thread (local mode) or the worker
thread of mapPartition (distributed mode).
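To make the idea concrete, here is a minimal Python sketch of the pattern (not BigDL's Scala `ThreadPool`): every native call is submitted to one long-lived, single-worker pool, so it always runs on the same thread no matter which short-lived caller submits it. `native_compute` is a hypothetical stand-in for an MKL-DNN call.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One long-lived worker owns all "native" work, so any thread-local state
# a native library keeps always lives on the same thread.
_dnn_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dnn")


def native_compute(x):
    # Hypothetical stand-in for a native MKL-DNN call; also reports its thread.
    return x * 2, threading.get_ident()


def run_on_dnn_thread(fn, *args):
    # Block until the task finishes; .result() re-raises any task exception.
    return _dnn_pool.submit(fn, *args).result()


def demo(num_callers):
    """Call the native op from several short-lived caller threads."""
    results = []

    def caller(x):
        results.append(run_on_dnn_thread(native_compute, x))

    threads = [threading.Thread(target=caller, args=(i,)) for i in range(num_callers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Distinct thread ids that actually executed the native work.
    return {tid for _, tid in results}


print(len(demo(4)))  # 1: every call ran on the same dedicated thread
```

Because the caller threads never run native code themselves, they can exit and be recreated freely without losing native thread-local state.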

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. The padding tensor required. mkl-dnn uses a padded tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2). It pads
    to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous impl
    allocated the DnnTensor when the model was created, which cost too
    much space, so this patch allocates it at runtime.
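The padding arithmetic in issue 1 can be checked with a small sketch. It assumes, as the 4x1x28x28 example suggests, that only the channel dimension (index 1 of NCHW) is rounded up, and that AVX2 holds 8 fp32 lanes per 256-bit register; `padded_shape` is a hypothetical helper, not a BigDL or mkl-dnn API.

```python
def padded_shape(shape, simd_width):
    """Round the channel dimension (index 1, NCHW) up to a multiple of simd_width."""
    n, c, h, w = shape
    padded_c = -(-c // simd_width) * simd_width  # ceiling division
    return (n, padded_c, h, w)


def element_count(shape):
    count = 1
    for d in shape:
        count *= d
    return count


shape = (4, 1, 28, 28)
avx2 = padded_shape(shape, 8)  # 8 fp32 lanes in a 256-bit AVX2 register
print(avx2)                                        # (4, 8, 28, 28)
print(element_count(avx2) / element_count(shape))  # 8.0
```

For a single-channel input the padded buffer is 8x larger, which is why this padding matters for memory use.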

* add compute shape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add compute shape for some layers and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to `MemoryData`.
2. Because the scales must be transferred between FP32->int8 and int8->FP32,
    we add two new attributes called `mask` and `scales`.
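As an illustration of why the scales travel with the memory description, here is a minimal sketch of an FP32->int8->FP32 round trip. It assumes symmetric signed quantization with a single per-tensor scale of 127 / max|x|; the helper names are hypothetical and this is not BigDL's reorder implementation.

```python
def quantize(values, scale):
    # FP32 -> int8: scale, round to nearest, then saturate to the s8 range.
    return [max(-128, min(127, round(v * scale))) for v in values]


def dequantize(q_values, scale):
    # int8 -> FP32: divide by the same scale to recover approximate values.
    return [q / scale for q in q_values]


data = [0.5, -1.5, 0.25, 2.0]
scale = 127 / max(abs(v) for v in data)  # 127 / 2.0 = 63.5
q = quantize(data, scale)
print(q)                     # [32, -95, 16, 127]
print(dequantize(q, scale))  # close to data, within half a quantization step
```

Without the same scale on both sides of the reorder, the int8 values could not be mapped back to the right FP32 range.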

* fix conversion accuracy (intel-analytics#2760)

* fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions, so a
given layer can accept int8 input. If you want fp32 output, you should add
a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes three steps:

1. Generate the scales of the model.
   An API like `generateScalesWithMask` generates the scales of the
   fp32 model; the returned model is also an fp32 model.
2. Quantize the model.
   The `quantize()` API is compatible with the `bigquant` backend
   and sets the quantize flag. When compiling, mkldnn generates the
   quantized weight, output, and input at runtime.
3. Run the inference (forward).
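The `mask` in step 1 selects how many scales are collected. Here is a simplified sketch, assuming MKL-DNN's convention that bit i of the mask requests a separate scale along dimension i (mask 0 means one scale for the whole tensor). `max_abs_scales` is hypothetical, handles only a 2-D weight, and collects the max-abs statistics from which quantization scales are derived; it is not BigDL's `generateScalesWithMask`.

```python
def max_abs_scales(weight, mask):
    """Return max-abs statistics for a 2-D weight given as a list of rows.

    mask == 0: one value for the whole tensor.
    mask == 1 (bit 0 set): one value per slice along dimension 0 (output channel).
    """
    if mask == 0:
        return [max(abs(v) for row in weight for v in row)]
    if mask == 1:
        return [max(abs(v) for v in row) for row in weight]
    raise ValueError("only masks 0 and 1 are handled in this sketch")


weight = [[0.1, -0.4], [2.0, 0.5], [-0.8, 0.3]]
print(max_abs_scales(weight, 0))  # [2.0]
print(max_abs_scales(weight, 1))  # [0.4, 2.0, 0.8]
```

Per-channel statistics let small-magnitude channels use more of the int8 range than a single tensor-wide value would allow.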

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model, then use the
quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output share the same memory, so we can't run
forward in `calcScales`: by that time the input has been changed and its
scales may be wrong. For example,

Sequential().add(Conv).add(ReLU)

runs two steps: seq.forward(input) first, and then, when it reaches the
ReLU, another forward, so the input becomes the output and the scales are
wrong.

For a convolution's weight, the dimension is always 5, even when the group
number is 1. But for dnn convolution, if there is no group, the weight's
dimension should be 4.
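The in-place problem can be reproduced in a few lines. This sketch uses hypothetical helpers rather than BigDL code: a statistic read from a shared buffer after an in-place ReLU sees the output, while capturing it before forward gives the true input range.

```python
def relu_inplace(buf):
    # Overwrites its input buffer, like a layer whose input and output share memory.
    for i, v in enumerate(buf):
        buf[i] = max(v, 0.0)
    return buf


def max_abs(buf):
    return max(abs(v) for v in buf)


x = [-3.0, 1.0, 2.0]
# Wrong: read the input statistic after forward has run.
relu_inplace(x)
print(max_abs(x))  # 2.0, because the input was overwritten by the output

# Right: capture the input statistic before forward mutates the shared buffer.
y = [-3.0, 1.0, 2.0]
input_scale = max_abs(y)
relu_inplace(y)
print(input_scale)  # 3.0
```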

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

The u8-to-s8 or s8-to-u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add the LARS optimizer (layer-wise scaled), along with utility functions to build a set of LARS optim methods for a container.

Bug fix: the gradient block id of AllReduceParameter was originally composed as {id}{pidTo}gradientBytes{pidFrom}, but concatenating {id}{pidTo} is ambiguous: e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
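The id collision is easy to reproduce. Here is a minimal sketch, assuming the block id is built by plain string concatenation in the order described above; the helper names are hypothetical, not AllReduceParameter's code.

```python
def old_block_id(param_id, pid_to, pid_from):
    # Original scheme: {id}{pidTo}gradientBytes{pidFrom}
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def new_block_id(param_id, pid_to, pid_from):
    # Fixed scheme: "_" separates id from pidTo
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


print(old_block_id(1, 12, 0) == old_block_id(11, 2, 0))  # True: both are "112gradientBytes0"
print(new_block_id(1, 12, 0) == new_block_id(11, 2, 0))  # False: "1_12..." vs "11_2..."
```

Because neither id nor pidTo contains "_", the separator makes the split point unambiguous.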

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* code style check

* remove print

* fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT
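The "whole layer gradient norm" above feeds the LARS layer-wise learning rate. A minimal Python sketch of the formula from the LARS paper, trust * ||w|| / (||g|| + weight_decay * ||w||), assuming flat lists of floats; the function name, defaults and the zero-norm fallback are illustrative assumptions, not BigDL's LarsSGD API:

```python
import math

def l2_norm(xs):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in xs))

def lars_local_lr(weights, grads, trust=0.001, weight_decay=0.0):
    """Layer-wise learning-rate multiplier, computed over the whole layer:
    trust * ||w|| / (||g|| + weight_decay * ||w||)."""
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    if w_norm == 0.0 or g_norm == 0.0:
        # Assumed fallback: leave the global learning rate unscaled.
        return 1.0
    return trust * w_norm / (g_norm + weight_decay * w_norm)

# ||w|| = 5 and ||g|| = 1, so the multiplier is 0.001 * 5 / 1 = 0.005.
print(lars_local_lr([3.0, 4.0], [0.6, 0.8]))
```

Because the ratio uses whole-layer norms, a layer with small gradients relative to its weights takes proportionally larger steps.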

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)
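TNC and NTC are the time-major and batch-major layouts of a sequence batch (T = time steps, N = batch, C = features). A minimal sketch, assuming plain nested Python lists instead of BigDL tensors: the conversion just swaps the first two axes.

```python
def ntc_to_tnc(x):
    """Swap the batch (N) and time (T) axes of a nested list shaped [N][T][C]."""
    n = len(x)
    t = len(x[0]) if n else 0
    return [[list(x[b][s]) for b in range(n)] for s in range(t)]

# Swapping the same two axes again undoes it, so one function covers both directions.
tnc_to_ntc = ntc_to_tnc

batch = [[[1, 2], [3, 4], [5, 6]],     # sample 0: 3 time steps, 2 features
         [[7, 8], [9, 10], [11, 12]]]  # sample 1
tnc = ntc_to_tnc(batch)
print(len(tnc), len(tnc[0]), len(tnc[0][0]))  # 3 2 2
```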

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api
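Beam search keeps the `beam_size` best partial sequences at each decoding step instead of only the single best token. A minimal Python sketch, assuming a hypothetical `step_log_probs(prefix)` callback that returns next-token log-probabilities. This is not BigDL's SequenceBeamSearch, which also handles padding values, caching and length normalization.

```python
import math

def beam_search(step_log_probs, beam_size, max_len, eos):
    """Return the final beams as (token_tuple, total_log_prob), best first."""
    beams = [((), 0.0)]
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == eos:
                # Finished sequences are carried over unchanged.
                candidates.append((prefix, score))
                continue
            for tok, lp in step_log_probs(prefix).items():
                candidates.append((prefix + (tok,), score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
        if all(p and p[-1] == eos for p, _ in beams):
            break
    return beams

def toy_model(prefix):
    """Hypothetical next-token distribution where greedy decoding is suboptimal."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.5, "y": 0.5},
        ("b",): {"z": 0.9, "w": 0.1},
    }
    probs = table.get(prefix, {"</s>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}

print(beam_search(toy_model, beam_size=2, max_len=5, eos="</s>")[0][0])
```

With `beam_size=1` (greedy) the search commits to "a" (0.6) and ends with probability 0.3. With `beam_size=2` it keeps "b" alive and finds b, z with probability 0.36.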

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to improve the performance of DetectionOutputSSD (intel-analytics#2853)
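The gain comes from keeping only k candidates in an ordered structure instead of sorting every detection. A sketch of the same idea in Python, using a bounded heap from `heapq` as a stand-in for the Scala TreeSet the fix actually uses:

```python
import heapq

def top_k_detections(scores, k):
    """Return (score, index) pairs for the k highest scores, best first.

    heapq.nlargest maintains a heap of at most k items, so the cost is
    O(n log k) rather than the O(n log n) of sorting all n scores.
    """
    return heapq.nlargest(k, ((s, i) for i, s in enumerate(scores)))

print(top_k_detections([0.1, 0.9, 0.4, 0.7], 2))  # [(0.9, 1), (0.7, 3)]
```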

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: improve the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* masks in RoiLabel now hold Floats instead of Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc
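One common way to compute the area under the precision-recall curve is a step-wise sum: walk the predictions from the highest score down and add precision times the recall gained at each positive. A minimal Python sketch under that assumption (ties in score are not grouped). BigDL's implementation may integrate the curve differently.

```python
def pr_auc(scores, labels):
    """Step-wise area under the precision-recall curve.

    scores: predicted scores; labels: 1 for positive, 0 for negative.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    area = 0.0
    for i in order:
        if labels[i]:
            tp += 1
            # Recall rises by 1 / total_pos at this threshold.
            area += (tp / (tp + fp)) / total_pos
        else:
            fp += 1
    return area

print(pr_auc([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))  # 1 * 0.5 + (2/3) * 0.5 = 0.8333...
```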

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix missing dimension parameter in TimeDistributedCriterion() (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging
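"BBox now inclusive" means both corner coordinates lie inside the box, so a box from x1 to x2 is x2 - x1 + 1 pixels wide. A minimal Python sketch of IoU under that convention, assuming (x1, y1, x2, y2) tuples; the function name is illustrative only.

```python
def iou_inclusive(a, b):
    """Intersection over union of two boxes with inclusive integer corners."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # The +1 counts both edge pixels; max(0, ...) handles disjoint boxes.
    inter = max(0, ix2 - ix1 + 1) * max(0, iy2 - iy1 + 1)
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)

# Two 10x10 boxes overlapping in a 5x5 corner: 25 / (100 + 100 - 25) = 1/7.
print(iou_inclusive((0, 0, 9, 9), (5, 5, 14, 14)))
```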

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)

* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel-analytics#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int when the actual return value from Java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* fix make dist

* migrate path

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 14, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit tests to avoid a dynamic resource allocation issue under Docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

If we use the parent thread directly, there are two bugs:
1. The child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if
the parent thread exits and is recreated (as happens with threads from
Executors.newFixedThreadPool), the whole app crashes with a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
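The fix pins all native computation to one long-lived thread, so thread-local state in the native library survives between calls. A Python analogue of that pattern, assuming a single-worker `ThreadPoolExecutor` in place of BigDL's Scala ThreadPool; `run_on_dnn_thread` is a hypothetical helper.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# max_workers=1 creates exactly one worker thread, reused for every task.
dnn_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dnn-compute")

def run_on_dnn_thread(fn, *args):
    """Submit fn to the dedicated thread and block until it returns."""
    return dnn_executor.submit(fn, *args).result()

def which_thread(_):
    return threading.get_ident()

# Every call reports the same thread id, which differs from the caller's.
idents = {run_on_dnn_thread(which_thread, i) for i in range(5)}
dnn_executor.shutdown(wait=True)
print(len(idents))  # 1
```

Routing work through the executor, rather than running it on whichever caller thread happens to be active, is what keeps the native state on one thread for its whole lifetime.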

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. A padding tensor is required. MKL-DNN uses a padded tensor, which
    uses more memory; for example, 4x1x28x28 becomes 4x8x28x28 (AVX2),
    because it pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    memory, so this patch allocates it at runtime.
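The 4x1x28x28 to 4x8x28x28 example follows from rounding the channel dimension up to the SIMD width. AVX2 registers are 256 bits wide, so they hold 8 float32 lanes; AVX-512 holds 16. A minimal sketch, assuming an NCHW shape with channel-blocked padding; `padded_shape` is a hypothetical helper, not an MKL-DNN call.

```python
def padded_shape(shape, simd_width=8):
    """Round the channel dimension (index 1 of an NCHW shape) up to a
    multiple of simd_width, as a channel-blocked layout does."""
    n, c, *rest = shape
    padded_c = -(-c // simd_width) * simd_width  # ceiling division
    return (n, padded_c, *rest)

def num_elements(shape):
    count = 1
    for d in shape:
        count *= d
    return count

orig = (4, 1, 28, 28)
padded = padded_shape(orig)
print(padded, num_elements(padded) // num_elements(orig))  # (4, 8, 28, 28) 8
```

A single-channel input therefore uses 8 times the memory once padded, which is why allocating these buffers lazily at runtime matters.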

* add shape computation for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add shape computation for some layers and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error for the Double type when importing a TensorFlow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to the `MemoryData`.
2. Because the scales must be transferred between FP32->int8 and int8->FP32,
    we add two new attributes called `mask` and `scales`.
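The `mask` says how many scales a tensor carries. A minimal sketch, assuming MKL-DNN's convention that bit i of the mask means a separate scale along dimension i (mask 0 means one scale for the whole tensor), and that scales are derived from maximum absolute values. Only 2-D input is handled, and `max_abs_per_mask` is a hypothetical helper.

```python
def max_abs_per_mask(matrix, mask):
    """Max absolute values of a 2-D list, grouped according to `mask`.

    mask == 0: one value for the whole tensor.
    mask == 1 (bit 0 set): one value per row, e.g. per output channel.
    mask == 2 (bit 1 set): one value per column.
    """
    if mask == 0:
        return [max(abs(v) for row in matrix for v in row)]
    if mask == 1:
        return [max(abs(v) for v in row) for row in matrix]
    if mask == 2:
        return [max(abs(row[j]) for row in matrix) for j in range(len(matrix[0]))]
    raise ValueError("this sketch only handles masks 0, 1 and 2 for 2-D input")

w = [[0.5, -2.0], [1.5, 0.25]]
print(max_abs_per_mask(w, 0))  # [2.0]
print(max_abs_per_mask(w, 1))  # [2.0, 1.5]
print(max_abs_per_mask(w, 2))  # [1.5, 2.0]
```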

* fix conversion accuracy (intel-analytics#2760)

* fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially convolutions, so a given
layer can accept int8 input. If you want FP32 output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

It includes 3 steps.

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of the
   FP32 model; the model it returns is also an FP32 model.
2. Quantize the model.
   The `quantize()` API is compatible with the `bigquant`
   backend and sets the quantize flag. During compilation,
   MKL-DNN generates the quantized weight, output and input at
   runtime.
3. Run inference (forward).
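Steps 1 and 2 come down to choosing a scale per tensor and mapping FP32 values onto the int8 range. A minimal sketch of one common symmetric scheme: scale = 127 / max|x|, then round and clamp. This is an assumption for illustration. BigDL and MKL-DNN may differ in detail, for example by using u8 for non-negative activations.

```python
def int8_scale(values):
    """Scale that maps the largest magnitude in `values` to 127 (symmetric s8)."""
    max_abs = max(abs(v) for v in values)
    return 127.0 / max_abs if max_abs > 0 else 1.0

def quantize(values, scale):
    """FP32 -> int8: multiply by the scale, round, and clamp to [-128, 127]."""
    return [max(-128, min(127, round(v * scale))) for v in values]

def dequantize(q, scale):
    """int8 -> FP32 approximation."""
    return [x / scale for x in q]

x = [0.5, -1.0, 0.25]
s = int8_scale(x)    # 127.0
q = quantize(x, s)   # [64, -127, 32]
print(q, dequantize(q, s))
```

Each dequantized value is within half a quantization step (0.5 / scale) of the original.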

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use MKL-DNN int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model, then
use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: mkldnn models use too much memory (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output share the same memory, so we can't run forward in
`calcScales`: by then the input has already been changed and its scales may
be wrong. For example,

Sequential().add(Conv).add(ReLU)

does two steps: seq.forward(input) first, and when it reaches the ReLU, it
does another forward, so the input becomes the output and the scales are
wrong.

The convolution weight always has 5 dimensions, even when the group number
is 1. But for DNN convolution without groups, the weight should have 4
dimensions.
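The in-place problem is easy to reproduce: once a layer overwrites its input buffer, any statistic read from that buffer afterwards describes the output. A minimal Python illustration, assuming lists in place of tensors and a hypothetical `relu_inplace`:

```python
def relu_inplace(buf):
    """ReLU that overwrites its input, like a layer whose input and output share memory."""
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf

def max_abs(buf):
    return max(abs(v) for v in buf)

activation = [-3.0, 1.0, 2.0]
scale_before = max_abs(activation)  # 3.0: the real input range
relu_inplace(activation)
scale_after = max_abs(activation)   # 2.0: the "input" now holds the output
print(scale_before, scale_after)
```

The input's range must therefore be recorded before the forward pass that reuses its memory.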

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

The u8-to-s8 or s8-to-u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add the LARS optimizer (layer-wise scaled), along with utility functions to build a set of LARS optim methods for a container.

Bug fix: The gradient block id of AllReduceParameter was originally composed as {id}{pidTo}gradientBytes{pidFrom}, but concatenating {id}{pidTo} is ambiguous: e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
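The collision is easy to demonstrate. A minimal sketch, assuming a hypothetical `block_id` helper that follows the {id}{pidTo}gradientBytes{pidFrom} pattern described above; the real key construction lives in AllReduceParameter.

```python
def block_id(param_id, pid_to, pid_from, sep=""):
    """Build a gradient block name; sep="" reproduces the old, ambiguous scheme."""
    return f"{param_id}{sep}{pid_to}gradientBytes{pid_from}"

# Old scheme: (id=1, pidTo=12) and (id=11, pidTo=2) both produce "112gradientBytes0".
print(block_id(1, 12, 0) == block_id(11, 2, 0))            # True
# With a "_" between id and pidTo the two keys stay distinct.
print(block_id(1, 12, 0, "_") == block_id(11, 2, 0, "_"))  # False
```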

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* code style check

* remove print

* fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* set weights NativeData format to ldigo; the single test passes

* fixed the problem with format any

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* fixed the LSTMSpec input format conversion bug between BigDL and MKL-DNN; random weights and bias are not supported

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is no longer used; use isTraining() instead

* operationWant enhanced / weight init / release() parameters()

* remove input format check and change some variable names

* input format check / throw exception and print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable

* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP] Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to improve the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: improve the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison (intel-analytics#2871)

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify predict/predictClass function (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta API

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* masks in RoiLabel now hold Floats instead of Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* change isCrowd to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support batching samples with different sizes into one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() missing dimension parameter issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging
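
The "BBox now inclusive" change matters for IoU and therefore for MAP matching. A minimal sketch, assuming "inclusive" means both corner pixels belong to the box so every width and height gets a +1 (the Pascal VOC style convention); the functions are hypothetical and are not BigDL's BBox API.

```python
def area(box, inclusive):
    """Area of (x1, y1, x2, y2). Inclusive boxes count both edge pixels, hence +1."""
    x1, y1, x2, y2 = box
    offset = 1.0 if inclusive else 0.0
    return max(0.0, x2 - x1 + offset) * max(0.0, y2 - y1 + offset)


def iou(a, b, inclusive=True):
    """Intersection over union of two axis-aligned boxes."""
    inter = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    inter_area = area(inter, inclusive)
    union = area(a, inclusive) + area(b, inclusive) - inter_area
    return inter_area / union if union > 0 else 0.0


a = (0, 0, 9, 9)    # 10x10 pixels when inclusive
b = (5, 5, 14, 14)
print(iou(a, b, inclusive=True))   # 25 / 175, about 0.1429
print(iou(a, b, inclusive=False))  # 16 / 146, about 0.1096
```

The same pair of boxes scores noticeably differently under the two conventions, so ground truth and detections need to agree on one before matching.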

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB defaults to false

* add ROISIZE

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB defaults to false

* add ROISIZE

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* TopK: add dim and increase parameters

* RoiAlign add max pooling mode

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch-related unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning that Optimizer() is deprecated (intel-analytics#3062)

* add deprecation warning

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor (intel-analytics#3079)

* support not compressing parameters

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int when the actual return value from Java is a byte array. (intel-analytics#3111)

* add support for a list of DataFrames (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* rm dlframe example from test script

* Add Utest about dividing by zero (#3128)

* Add Utest about dividing by zero

* add Utest and zero check of LocalData

* change

* Add Utest about dividing by zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* fix make dist

* migrate path

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 14, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue in Docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Preserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* preserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

If we use the parent thread directly, there are two bugs:
1. The child threads forked from the parent thread will be bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if
the parent thread exits and is recreated (such as a thread from
Executors.newFixedThreadPool), the whole app crashes with a segmentation fault.
The parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: issues after updating mkldnn to v0.17 (intel-analytics#2712)

There are two issues:

1. The padding tensor is required. MKL-DNN uses a padded tensor, which takes
    more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (AVX2), because it pads to a
    multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.
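
The memory cost of issue 1 is easy to estimate. A sketch assuming the padding applies to the channel dimension and rounds it up to a block of 8 fp32 values (the 256-bit AVX2 width; AVX-512 would presumably use 16); the helper names are hypothetical.

```python
def padded_channels(channels, simd_width=8):
    """Round channels up to the next multiple of simd_width."""
    return -(-channels // simd_width) * simd_width


def padded_shape(shape, simd_width=8):
    """Pad the channel dimension of an NCHW shape."""
    n, c, h, w = shape
    return (n, padded_channels(c, simd_width), h, w)


def element_count(shape):
    count = 1
    for d in shape:
        count *= d
    return count


original = (4, 1, 28, 28)
padded = padded_shape(original)
# prints: (4, 8, 28, 28) 8  (the padded tensor holds 8x as many elements)
print(padded, element_count(padded) // element_count(original))
```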

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Improve BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    we add two new attributes called `mask` and `scales`.
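
Why `scales` must travel with the data: a reorder from FP32 to Int8 multiplies by a scale, and the reverse reorder divides by the same scale. A minimal sketch assuming symmetric, per-tensor s8 quantization with the scale derived from the maximum absolute value; it is not the BigDL `MemoryData` or MKL-DNN reorder code.

```python
def compute_scale(values, int_max=127):
    """Scale that maps the largest |value| onto int_max (symmetric)."""
    max_abs = max(abs(v) for v in values)
    return int_max / max_abs if max_abs > 0 else 1.0


def quantize(values, scale, lo=-128, hi=127):
    """FP32 -> Int8: multiply by the scale, round, and saturate."""
    return [max(lo, min(hi, round(v * scale))) for v in values]


def dequantize(qvalues, scale):
    """Int8 -> FP32: divide by the same scale."""
    return [q / scale for q in qvalues]


x = [0.5, -1.0, 0.25]
s = compute_scale(x)       # 127.0
q = quantize(x, s)         # [64, -127, 32]
x_back = dequantize(q, s)  # close to x, within half a quantization step
```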

* fix conversion accuracy (intel-analytics#2760)

* fix accuracy for saved model

* exclude mkldnn model during conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
A given layer can then accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes 3 steps:

1. Generate the scales of the model.
   An API like `generateScalesWithMask` generates the scales of the
   fp32 model; the model returned is also an fp32 model.
2. Quantize the model.
   The `quantize()` API is compatible with the `bigquant`
   backend and sets the quantize flag. When compiling, MKL-DNN
   generates the quantized weight, output and input at runtime.
3. Do the inference (forward).
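
Step 1 depends on the mask. A sketch of what a mask could select when collecting max-abs statistics for a 2-D weight, assuming the MKL-DNN convention in which the mask is a bitmask over dimensions (0 means one value for the whole tensor, bit d set means one value per index along dimension d). `max_abs_scales` is a hypothetical helper, not `generateScalesWithMask`.

```python
def max_abs_scales(matrix, mask):
    """Max-abs statistics of a 2-D weight matrix [out][in], grouped by mask.

    0 -> one value for the whole tensor
    1 (bit 0) -> one value per row (output channel)
    2 (bit 1) -> one value per column (input channel)
    """
    if mask == 0:
        return [max(abs(v) for row in matrix for v in row)]
    if mask == 1:
        return [max(abs(v) for v in row) for row in matrix]
    if mask == 2:
        return [max(abs(row[j]) for row in matrix) for j in range(len(matrix[0]))]
    raise ValueError("only masks 0, 1 and 2 are handled in this sketch")


w = [[0.1, -0.4],
     [2.0, 0.5]]
print(max_abs_scales(w, 0))  # [2.0]
print(max_abs_scales(w, 1))  # [0.4, 2.0]
```

Per-channel statistics keep small rows from being crushed: with the single value 2.0, the 0.4 entry would map to round(0.4 * 127 / 2.0) = 25 instead of 127.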

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model, then you
can use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: mkldnn models use too much memory (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output share the same memory, so we can't run forward
inside `calcScales`: by that time the input has been changed and its scales
may be wrong. For example,

Sequential().add(Conv).add(ReLU)

does two steps: seq.forward(input) first, and then, when it reaches the ReLU,
another forward, so the input becomes the output and the scales will be
wrong.

For convolution weights, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there is no group, the weight's dimension
should be 4.
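
A plain-Python illustration of the in-place problem: once an in-place ReLU has run, the buffer that held its input now holds its output, so a max-abs statistic taken afterwards is wrong. The names are illustrative only and do not correspond to BigDL layers.

```python
def relu_inplace(buf):
    """ReLU that overwrites its input, like a layer whose output shares input memory."""
    for i, v in enumerate(buf):
        buf[i] = v if v > 0 else 0.0
    return buf


def max_abs(values):
    return max(abs(v) for v in values)


x = [-3.0, 1.0, 2.0]
scale_before = max_abs(x)   # 3.0: the statistic we want for the ReLU input
y = relu_inplace(x)         # y is the same list object as x
scale_after = max_abs(x)    # 2.0: x was overwritten, so the "input" statistic is wrong
```

Taking the statistic before the in-place forward, or copying the input first (e.g. `list(x)`), avoids the problem.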

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

the u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: layer-wise scaled learning rates, plus utility functions to build a set of LARS optim methods for a container.

Bug fix: The gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}. But concatenating {id}{pidTo} is ambiguous, e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
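
The key collision described above can be reproduced with a small sketch. The helper names `old_block_id` and `new_block_id` are hypothetical and only mimic the key layout quoted in this commit message; they are not BigDL's `AllReduceParameter` code.

```python
# Hypothetical helpers mimicking the key layout from the commit message.
def old_block_id(param_id: int, pid_to: int, pid_from: int) -> str:
    # {id}{pidTo}gradientBytes{pidFrom}: id and pidTo are concatenated directly
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def new_block_id(param_id: int, pid_to: int, pid_from: int) -> str:
    # "_" separates id from pidTo, so the prefix can be split unambiguously
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


# Different (id, pidTo) pairs collide under the old layout...
print(old_block_id(1, 12, 0), old_block_id(11, 2, 0))  # 112gradientBytes0 112gradientBytes0
# ...but stay distinct with the separator.
print(new_block_id(1, 12, 0), new_block_id(11, 2, 0))  # 1_12gradientBytes0 11_2gradientBytes0
```

A collision here means two different gradient blocks would be stored and fetched under the same key, which is why the separator matters.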

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim-method training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed; random weights and bias not yet supported

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add a requirement that input size and hidden size match if the LSTM has more than one layer

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore; use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to improve the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: improve the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify predict/predictClass function (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* masks in RoiLabel now hold Floats, not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples of different sizes in one mini-batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() missing the dimension parameter (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB defaults to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB defaults to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch-relevant unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Update doc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning that Optimizer() is deprecated (intel-analytics#3062)

* add deprecation warning

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int when the actual return value from Java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add unit test for division by zero (#3128)

* Add unit test for division by zero

* add unit test and zero check for LocalData

* add unit test and zero check for LocalData

* change

* Add unit test for division by zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 14, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue in docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. The child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app crashes with a segmentation fault.
The parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
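
A minimal Python sketch of the single-thread-pool idea, assuming the only requirement is that every native call runs on one long-lived worker thread. `native_compute` and `run_on_native_thread` are hypothetical stand-ins, not BigDL's thread pool implementation; the callers are short-lived threads that exit after one call, like the recreated parent threads described above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One long-lived worker thread that owns all native work (assumption: this
# models the "single thread pool" idea, not BigDL's actual pool).
native_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="native")


def native_compute(x):
    # Hypothetical stand-in for a native call that depends on thread-local state;
    # it reports which thread it ran on.
    return threading.get_ident(), x * 2


def run_on_native_thread(x):
    # Callers hand the work to the pool instead of running it themselves.
    return native_pool.submit(native_compute, x).result()


results = []


def short_lived_caller(x):
    # Plays the role of a parent thread that exits after one task.
    results.append(run_on_native_thread(x))


for i in range(3):
    caller = threading.Thread(target=short_lived_caller, args=(i,))
    caller.start()
    caller.join()

native_thread_ids = {tid for tid, _ in results}
print(len(native_thread_ids))  # 1: every task ran on the same worker thread
print([value for _, value in results])  # [0, 2, 4]
```

Because the worker thread outlives its callers, any thread-local state it holds survives even when the calling threads exit and are recreated.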

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. The padding tensor requirement. mkl-dnn uses a padded tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2); it
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.
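
The memory cost in issue 1 can be estimated with a short sketch, assuming the channel dimension is rounded up to a multiple of 8 (AVX2: 256-bit registers holding eight 32-bit floats). `padded_shape` and `num_elements` are hypothetical helpers, not mkl-dnn's actual layout logic.

```python
def padded_shape(shape, simd_width=8, channel_dim=1):
    # Round the channel dimension up to the next multiple of simd_width.
    # Assumption: simd_width=8 models AVX2 with 32-bit floats.
    padded = list(shape)
    c = padded[channel_dim]
    padded[channel_dim] = ((c + simd_width - 1) // simd_width) * simd_width
    return tuple(padded)


def num_elements(shape):
    # Product of all dimensions
    n = 1
    for d in shape:
        n *= d
    return n


logical = (4, 1, 28, 28)
physical = padded_shape(logical)
print(physical)  # (4, 8, 28, 28)
print(num_elements(physical) // num_elements(logical))  # 8: eight times the memory
```

A single-channel input is the worst case; a tensor whose channel count is already a multiple of the SIMD width needs no padding.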

* add shape computation for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add shape computation for some layers and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when importing a Tensorflow graph with Double type (#2721)

* feature: add byte support for DnnTensor (intel-analytics#2751)

* feat: add byte support for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 support (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to `MemoryData`.
2. Because the scales must be transferred between FP32->int8 and int8->FP32,
    we add two new attributes called `mask` and `scales`.
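
To illustrate why the scales must travel with the data, here is a minimal per-tensor FP32 -> s8 -> FP32 round trip. Using `127 / max_abs` as the scale and round-to-nearest with saturation are assumptions for this sketch, not a description of how BigDL or mkl-dnn performs the reorder.

```python
def quantize(values, scale):
    # FP32 -> s8: multiply by the scale, round to nearest (Python's round,
    # ties to even), then saturate to the signed 8-bit range [-128, 127]
    return [max(-128, min(127, round(v * scale))) for v in values]


def dequantize(quantized, scale):
    # s8 -> FP32: divide by the same scale that was used for quantization
    return [q / scale for q in quantized]


data = [0.5, -1.0, 0.25, 2.0]
scale = 127 / max(abs(v) for v in data)  # 63.5: maps the largest magnitude (2.0) to 127
q = quantize(data, scale)
print(q)  # [32, -64, 16, 127]
restored = dequantize(q, scale)
# Without the scale, the int8 values alone cannot be mapped back to FP32.
```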

* fix conversion accuracy (intel-analytics#2760)

* fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer-wise support for int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially convolutions, so a given
layer can accept an int8 input. If you want fp32 output, you should add
a reorder.

* feature: mkldnn int8 layer-wise support (intel-analytics#2759)

It includes 3 steps.

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of
   the fp32 model; the model returned is also an fp32 model.
2. Quantize the model.
   The `quantize()` API is compatible with the `bigquant` backend and
   sets the quantize flag. When compiling, the quantized weight, output,
   and input are generated by mkldnn at runtime.
3. Do the inference (forward).
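
Step 1 can be pictured as collecting max-absolute-value statistics according to a mask. This sketch assumes the mkl-dnn mask convention (mask 0: one value for the whole tensor; bit `i` set: a separate value along dimension `i`) and handles only a 2-D tensor; `max_abs_scales` is a hypothetical helper, not BigDL's `generateScalesWithMask`.

```python
def max_abs_scales(tensor, mask):
    """Max-abs statistics of a 2-D tensor (list of rows) for a given mask.

    Assumed mkl-dnn convention: mask 0 -> one value for the whole tensor;
    mask 1 (bit 0 set) -> one value per index of dimension 0 (per row here).
    """
    if mask == 0:
        return [max(abs(v) for row in tensor for v in row)]
    if mask == 1:
        return [max(abs(v) for v in row) for row in tensor]
    raise ValueError("this sketch only handles mask 0 or 1")


# Hypothetical weights with 3 output channels (rows) and 2 inputs each
weights = [[0.1, -0.4], [2.0, -0.5], [0.05, 0.02]]
print(max_abs_scales(weights, 0))  # [2.0]
print(max_abs_scales(weights, 1))  # [0.4, 2.0, 0.05]
```

With a per-channel mask, the small third channel keeps its own range (0.05) instead of being quantized against 2.0, which would leave it only about 3 int8 levels.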

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: use
GenInt8Scales to generate the scales first and save the new model; then you
can use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: mkldnn models use too much memory (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output use the same memory, so we can't do forward in
`calcScales`: by then the input has been changed and its scales may be wrong.
For example,

Sequential().add(Conv).add(ReLU)

does two steps: seq.forward(input) first, and when it goes into the ReLU, it
does another forward, so the input is overwritten by the output, and the scales
will be wrong.

For convolution's weight, the dimension always is 5, although the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
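
The in-place problem can be reproduced with a toy example. This is a hedged sketch in plain Python, not BigDL code: `conv_forward` and `relu_inplace` are hypothetical stand-ins for the Conv and ReLU layers, and the "scale" of a buffer is assumed to be its maximum absolute value.

```python
# Toy model of a Conv followed by an in-place ReLU (hypothetical stand-ins, not BigDL layers).


def conv_forward(x):
    """Stand-in for Conv: writes its result into a new buffer."""
    return [2.0 * v - 3.0 for v in x]


def relu_inplace(buf):
    """Stand-in for an in-place ReLU: its output shares memory with its input."""
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf


def max_abs(buf):
    return max(abs(v) for v in buf)


x = [0.0, 2.5]
conv_out = conv_forward(x)         # [-3.0, 2.0], which is also ReLU's input
scale_before = max_abs(conv_out)   # 3.0, the correct scale for ReLU's input

relu_out = relu_inplace(conv_out)  # overwrites conv_out
scale_after = max_abs(conv_out)    # 2.0, ReLU's "input" now holds its output
print(scale_before, scale_after, relu_out is conv_out)
```

Once the in-place layer has run, reading its input buffer again yields the output's range, so any scale computed from it is wrong.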

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

No reorder is needed for u8-to-s8 or s8-to-u8 conversion in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modify environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section to Python style in Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add the LARS optimizer (layer-wise scaled), along with utility functions to build a set of LARS optim methods for a container.

Bug fix: the gradient block id of AllReduceParameter was originally composed as {id}{pidTo}gradientBytes{pidFrom}, but concatenating {id}{pidTo} is ambiguous: e.g., "112" can be {1}{12} or {11}{2}. A "_" is now added to separate id from pidTo.
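
The ambiguity and the fix can be shown with a few lines of Python. The helper names below are hypothetical, and the exact key layout after the fix (`{id}_{pidTo}gradientBytes{pidFrom}`) is an assumption based on the description above, not copied from AllReduceParameter.

```python
# Hypothetical reconstruction of the gradient block id formats described above.


def old_block_id(param_id: int, pid_to: int, pid_from: int) -> str:
    # Original scheme: {id}{pidTo}gradientBytes{pidFrom}
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def new_block_id(param_id: int, pid_to: int, pid_from: int) -> str:
    # Fixed scheme: "_" separates id from pidTo
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs collide under the old scheme ...
print(old_block_id(1, 12, 0), old_block_id(11, 2, 0))  # 112gradientBytes0 112gradientBytes0
# ... but stay distinct once the separator is added.
print(new_block_id(1, 12, 0), new_block_id(11, 2, 0))  # 1_12gradientBytes0 11_2gradientBytes0
```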

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* code style check

* remove print

* fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim-method training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec: fixed input format conversion bug between bigdl and mkldnn; random weights and bias are not supported yet

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add a requirement that input size and hidden size match when the LSTM has more than one layer

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is no longer used; use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable

* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP] Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to improve the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix LSTM loading error (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: improve the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison (intel-analytics#2871)

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify predict/predictClass function (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* masks in RoiLabel now hold Floats, not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support batching samples of different sizes into one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix missing dimension parameter in TimeDistributedCriterion() (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB defaults to false

* add ROISIZE

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB defaults to false

* add ROISIZE

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move Torch-related unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)

* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Update doc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add a warning that Optimizer() is deprecated (intel-analytics#3062)

* add deprecation warning

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor (intel-analytics#3079)

* support not compressing parameters

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int when the actual return value from Java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* rm dlframe example from test script

* Add Utest for dividing by zero (#3128)

* Add Utest for dividing by zero

* add Utest and zero check of LocalData

* change

* Add Utest for dividing by zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* fix make dist

* migrate path

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 14, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issues caused by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if
the parent thread exits and is recreated (e.g. a thread from
Executors.newFixedThreadPool), the whole app crashes with a segmentation fault.
The parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: issues from updating mkldnn to v0.17 (intel-analytics#2712)

There are two issues:

1. The required padding tensor. mkl-dnn uses a padded tensor, which takes
    more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2): the channel
    dimension is padded to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which used too much
    memory, so this patch allocates it at runtime.
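
The memory cost of the padding in issue 1 is simple arithmetic. The sketch below assumes, as in the 4x1x28x28 example, that the channel dimension is rounded up to the SIMD width in 32-bit floats (8 for AVX2; 16 for AVX-512 is included as an assumption). The helper names are hypothetical and the sketch does not query mkl-dnn.

```python
from math import prod

# 32-bit floats per SIMD register (assumed block sizes for the padded channel dim).
SIMD_FLOATS = {"avx2": 8, "avx512": 16}


def padded_shape(shape, isa="avx2"):
    """Round the channel dimension of an NCHW shape up to a multiple of the SIMD width."""
    n, c, h, w = shape
    block = SIMD_FLOATS[isa]
    c_padded = -(-c // block) * block  # ceiling division, then scale back up
    return (n, c_padded, h, w)


def memory_ratio(shape, isa="avx2"):
    """How many times more elements the padded tensor holds than the logical one."""
    return prod(padded_shape(shape, isa)) / prod(shape)


print(padded_shape((4, 1, 28, 28)))  # (4, 8, 28, 28)
print(memory_ratio((4, 1, 28, 28)))  # 8.0
```

Layers with few channels, such as the first layer on single-channel images, pay the largest relative overhead.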

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modify environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modify environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when importing a Tensorflow graph with Double type (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature] Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we should add a new attribute called dataType
    to the `MemoryData`.
2. Because we need to transfer the scales between FP32->int8 and int8->FP32,
    we should add two new attributes called `mask` and `scales`.
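A minimal Python sketch of what the `mask` and `scales` attributes are for during an FP32<->int8 reorder. `QuantMemoryData`, the reorder helpers, and the simplified mask convention (0 = one scale for the whole tensor, 1 = one scale per row, i.e. per output channel) are hypothetical illustrations, not BigDL's `MemoryData` API or MKL-DNN's exact mask semantics.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class QuantMemoryData:
    # Hypothetical stand-in for a memory descriptor that carries int8 metadata.
    shape: List[int]
    data_type: str = "f32"   # e.g. "f32", "s8", "u8"
    mask: int = 0            # 0: one scale for everything; 1: one scale per row
    scales: List[float] = field(default_factory=lambda: [1.0])

def _row_scale(md: QuantMemoryData, row: int) -> float:
    return md.scales[0] if md.mask == 0 else md.scales[row]

def reorder_f32_to_s8(rows: List[List[float]], md: QuantMemoryData) -> List[List[int]]:
    """FP32 -> int8: multiply by the scale, round, and saturate to [-128, 127]."""
    out = []
    for r, row in enumerate(rows):
        s = _row_scale(md, r)
        out.append([max(-128, min(127, round(v * s))) for v in row])
    return out

def reorder_s8_to_f32(rows: List[List[int]], md: QuantMemoryData) -> List[List[float]]:
    """int8 -> FP32: divide by the same scale to recover approximate values."""
    return [[v / _row_scale(md, r) for v in row] for r, row in enumerate(rows)]
```

Without the scales (and the mask telling which scale applies where), the int8->FP32 direction could not recover the original magnitudes.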

* fix conversion accuracy (intel-analytics#2760)

* fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer-wise support for int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want FP32
output, add a reorder.

* feature: mkldnn int8 layer-wise support (intel-analytics#2759)

It includes 3 steps:

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of
   the FP32 model; the model it returns is still an FP32 model.
2. Quantize the model.
   The `quantize()` API stays compatible with the `bigquant`
   backend and sets the quantize flag. When the model is compiled,
   the quantized weight, output and input are generated by MKL-DNN at
   runtime.
3. Run inference (forward).
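The three steps can be sketched in plain Python for a single linear layer. The helper names (`max_abs_per_mask`, `to_scales`, `int8_linear`) are hypothetical, and max-abs calibration with `scale = 127 / max|x|` is an assumed, common choice rather than a description of what `generateScalesWithMask` does internally.

```python
from typing import List

def max_abs_per_mask(rows: List[List[float]], mask: int) -> List[float]:
    """Step 1 helper: max |x| over the whole tensor (mask=0) or per row (mask=1)."""
    if mask == 0:
        return [max(abs(v) for row in rows for v in row)]
    return [max(abs(v) for v in row) for row in rows]

def to_scales(max_abs: List[float], qmax: int = 127) -> List[float]:
    # Map each observed range onto [-qmax, qmax]; all-zero data keeps scale 1.0.
    return [qmax / m if m > 0 else 1.0 for m in max_abs]

def quantize(vec: List[float], scale: float) -> List[int]:
    return [max(-128, min(127, round(v * scale))) for v in vec]

def int8_linear(weight: List[List[float]], w_scales: List[float],
                x: List[float], x_scale: float) -> List[float]:
    """Steps 2 and 3: quantize weight and input, accumulate in integers, rescale to FP32."""
    qx = quantize(x, x_scale)
    out = []
    for row, ws in zip(weight, w_scales):
        qw = quantize(row, ws)
        acc = sum(a * b for a, b in zip(qw, qx))  # an int32 accumulator in a real kernel
        out.append(acc / (ws * x_scale))
    return out

# Step 1: per-output-channel weight scales and one input scale from calibration data.
weight = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
calibration = [[1.0, 2.0, -3.0], [0.5, -1.5, 2.5]]
w_scales = to_scales(max_abs_per_mask(weight, mask=1))
x_scale = to_scales(max_abs_per_mask(calibration, mask=0))[0]
# Steps 2-3: the int8 forward approximates the FP32 result.
approx = int8_linear(weight, w_scales, [1.0, 2.0, -3.0], x_scale)
```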

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use MKL-DNN int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model, then use the
quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: in-place input/output and weight dimension errors (intel-analytics#2779)

Some layers' input and output share the same memory, so we can't run the
forward in `calcScales`: by that time the input has already been overwritten,
and its scales may be wrong. For example,

Sequential().add(Conv).add(ReLU)

first runs seq.forward(input), and when it reaches the ReLU it runs another
forward in place, so the input becomes the output and the scales are wrong.

A convolution's weight always has 5 dimensions, even when the group number
is 1. But for the DNN convolution, if there's no group, the weight should
have 4 dimensions.
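A small Python illustration of both problems. `relu_inplace` stands in for a layer whose output shares memory with its input, and `dnn_weight_shape` assumes a 5-D weight layout of (groups, out/groups, in/groups, kH, kW); both are hypothetical helpers, not BigDL code.

```python
from typing import List

def relu_inplace(buf: List[float]) -> List[float]:
    # Writes the result back into the input buffer, like an in-place layer.
    for i, v in enumerate(buf):
        buf[i] = v if v > 0 else 0.0
    return buf

def max_abs(buf: List[float]) -> float:
    return max(abs(v) for v in buf)

# Wrong order: run the forward first, then read the ReLU's "input" for its scale.
x = [-4.0, 1.0, 2.0]
relu_inplace(x)
wrong = max_abs(x)   # the negative value has already been overwritten

# Right order: record the input statistics before the in-place layer runs.
y = [-4.0, 1.0, 2.0]
right = max_abs(y)
relu_inplace(y)

def dnn_weight_shape(shape5d: List[int]) -> List[int]:
    """Drop the leading group dimension when there is only one group."""
    return list(shape5d[1:]) if shape5d[0] == 1 else list(shape5d)
```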

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

u8-to-s8 and s8-to-u8 conversions need no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modify the environment setup documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add the LARS optimizer, which scales the learning rate layer by layer, plus utility functions to build a set of LARS optim methods for a container.

Bug fix: the gradient block id of AllReduceParameter was originally composed as {id}{pidTo}gradientBytes{pidFrom}, but concatenating {id}{pidTo} is ambiguous: e.g., "112" can be {1}{12} or {11}{2}. Now a "_" separates id from pidTo.
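A Python sketch of the ambiguity and of the fix. The function names are hypothetical and the exact string built by AllReduceParameter may differ; only the `{id}{pidTo}gradientBytes{pidFrom}` scheme described above is reproduced.

```python
def old_block_id(param_id: int, pid_to: int, pid_from: int) -> str:
    # Original scheme: {id}{pidTo}gradientBytes{pidFrom}
    return f"{param_id}{pid_to}gradientBytes{pid_from}"

def new_block_id(param_id: int, pid_to: int, pid_from: int) -> str:
    # Fixed scheme: "_" separates id from pidTo, so the two can no longer blur.
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"
```

With the old scheme, `(id=1, pidTo=12)` and `(id=11, pidTo=2)` map to the same block id, so two different gradient blocks would collide in the block manager.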

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec: fixed the input format conversion bug between BigDL and MKL-DNN; random weights and bias are not supported yet

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is no longer used; use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT
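For reference, a minimal single-layer LARS update in Python, following the textbook formulation (local LR = trust * ||w|| / (||g|| + weight_decay * ||w||)) with momentum, where the norms are taken over the whole layer. The function name, default hyper-parameters, and the zero-norm fallback are assumptions; BigDL's `LarsSGD` and its limit on the trust value may differ.

```python
import math
from typing import List, Tuple

def l2(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))

def lars_step(w: List[float], g: List[float], v: List[float],
              lr: float = 0.1, momentum: float = 0.9,
              weight_decay: float = 1e-4, trust: float = 0.001
              ) -> Tuple[List[float], List[float]]:
    """One LARS update for one layer; returns (new weights, new velocity)."""
    w_norm, g_norm = l2(w), l2(g)
    if w_norm > 0 and g_norm > 0:
        local_lr = trust * w_norm / (g_norm + weight_decay * w_norm)
    else:
        local_lr = 1.0  # hypothetical fallback when a layer norm is zero
    new_v = [momentum * vi + lr * local_lr * (gi + weight_decay * wi)
             for vi, gi, wi in zip(v, g, w)]
    new_w = [wi - vi for wi, vi in zip(w, new_v)]
    return new_w, new_v
```

Because the local LR divides by the layer's gradient norm, rescaling a layer's gradient (with weight decay off) does not change its step size, which is the point of the layer-wise scaling.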

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP] Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to improve the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: improve the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify predict/predictClass function (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* masks in RoiLabel now hold Floats instead of Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support batching samples of different sizes into one mini-batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix the missing dimension parameter in TimeDistributedCriterion() (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel-analytics#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int when the actual return value from Java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* rm dlframe example from test script

* Add unit tests for division by zero (#3128)

* Add unit tests for division by zero

* add unit tests and a zero check for LocalData

* change

* Add unit tests for division by zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* fix make dist

* migrate path

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 14, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue caused by Docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. The child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread keeps some unknown thread-local variables, so if
the parent thread exits and is recreated (as with threads from
Executors.newFixedThreadPool), the whole app crashes with a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread
of mapPartition (Distributed Mode).
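The single-thread-pool idea can be sketched in Python with `concurrent.futures`: one long-lived worker owns all native work, so callers running on short-lived threads never create native thread-local state on a thread that later dies. This is an analogy only (Python threads do not model CPU affinity), and `run_on_native_thread` is a hypothetical helper, not BigDL's `ThreadPool`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# A single long-lived worker owns all native computation, so any native
# thread-local state is created once, on a thread that does not exit early.
_native_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dnn-worker")

def run_on_native_thread(fn, *args):
    """Run fn(*args) on the dedicated worker and block until it returns."""
    return _native_pool.submit(fn, *args).result()

# Callers may be short-lived (e.g. per-partition) threads; the work always
# lands on the same worker thread.
seen = set()

def caller():
    seen.add(run_on_native_thread(threading.get_ident))

threads = [threading.Thread(target=caller) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```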

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: issues from updating mkldnn to v0.17 (intel-analytics#2712)

There are two issues:

1. The padding tensor is required. mkl-dnn uses a padding tensor, which
    needs more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2), because it
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space. This patch allocates it at runtime instead.

* add shape computation for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add shape computation for some layers and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature] Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to the `MemoryData`.
2. Because we must transfer the scales between FP32->int8 and int8->FP32,
    we add two new attributes called `mask` and `scales`.
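To make `mask` and `scales` concrete, here is a small Python sketch of FP32->s8 quantization for a 2-D (N, C) tensor. It assumes MKL-DNN's mask convention: if bit i is set, each index along dimension i gets its own scale. It also assumes the common s8 rule that maps max |x| to 127. The function names are hypothetical and are not BigDL's API. Converting back (int8->FP32) divides by the same scale.

```python
from typing import List

Matrix = List[List[float]]


def max_abs_scales(data: Matrix, mask: int) -> List[float]:
    """Max |x| per scale group for a 2-D (N, C) tensor.

    mask == 0      -> one value for the whole tensor
    mask == 1 << 1 -> one value per index along dim 1 (per channel)
    """
    if mask == 0:
        return [max(abs(v) for row in data for v in row)]
    if mask == 1 << 1:
        return [max(abs(row[c]) for row in data) for c in range(len(data[0]))]
    raise ValueError("this sketch only handles mask 0 and mask 2")


def quantize_s8(data: Matrix, mask: int) -> Matrix:
    """FP32 -> s8: scale each group so its max |x| maps to 127, then round and clamp."""
    maxes = max_abs_scales(data, mask)
    out = []
    for row in data:
        q_row = []
        for c, v in enumerate(row):
            m = maxes[0] if mask == 0 else maxes[c]
            scale = 127.0 / m if m > 0 else 1.0
            q_row.append(max(-128, min(127, round(v * scale))))
        out.append(q_row)
    return out
```

With per-channel scales (`mask = 2`), a channel with small values keeps more precision. In this example the first column maps 1.0 to 127 instead of 32.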

* fix conversion accuracy (intel-analytics#2760)

* fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions,
so a specific layer can accept an int8 input. If you want FP32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

It includes 3 steps.

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of
   the FP32 model; the model it returns is also an FP32 model.
2. Quantize the model.
   The `quantize()` API is compatible with the `bigquant`
   backend and sets the quantize flag. When compiling,
   the quantized weight, output and input are generated by mkldnn at
   runtime.
3. Run the inference (forward).

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model, then
use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: mkldnn models use too much memory (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output share the same memory, so we can't do forward in
`calcScales`: by then the input has already been changed, and its scales may be
wrong. For example,

Sequential().add(Conv).add(ReLU)

does two steps: seq.forward(input) first, and then, when it reaches the ReLU,
another forward, so the input will be the output and the scales will be
wrong.

For convolution, the weight's dimension is always 5, even when the group number
is 1. But for a dnn convolution without groups, the weight's dimension
should be 4.
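The in-place problem can be reproduced with plain Python lists standing in for tensors; nothing here is BigDL API. Once ReLU overwrites its input buffer, any scale computed from that buffer afterwards reflects ReLU's output rather than its input.

```python
from typing import List


def relu_inplace(buf: List[float]) -> List[float]:
    # Like an in-place ReLU: the output overwrites the input buffer.
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf


def max_abs(xs: List[float]) -> float:
    return max(abs(v) for v in xs)


conv_out = [-3.0, 2.0, -1.0]          # pretend this is Conv's output, i.e. ReLU's input
true_input_scale = max_abs(conv_out)  # 3.0, measured before ReLU runs

relu_out = relu_inplace(conv_out)     # shares storage with conv_out
late_input_scale = max_abs(conv_out)  # 2.0: the "input" was already overwritten
```

Recording the statistic before the layer runs, or on a copy of the input, avoids the wrong scale.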

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

the u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: Layer-wise scaled. Also with utility functions to build a set of LARS optim for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo
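The ambiguity is easy to reproduce. This sketch assumes the key layout stated in the commit message. `old_block_id` and `new_block_id` are hypothetical helper names, not functions from AllReduceParameter.

```python
def old_block_id(param_id: int, pid_to: int, pid_from: int) -> str:
    # {id}{pidTo}gradientBytes{pidFrom}: id and pidTo are simply concatenated.
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def new_block_id(param_id: int, pid_to: int, pid_from: int) -> str:
    # "_" separates id from pidTo, so the two numbers can no longer run together.
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"
```

Both `old_block_id(1, 12, 0)` and `old_block_id(11, 2, 0)` produce `"112gradientBytes0"`. With the separator, the two keys become `"1_12gradientBytes0"` and `"11_2gradientBytes0"`.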

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* code style check

* remove print

* fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore; use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT
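For reference, the original LARS formulation computes a layer-wise learning-rate multiplier from whole-layer norms: trust * ||w|| / (||g|| + weight_decay * ||w||). The sketch below follows that formula. The zero-norm fallback and the parameter defaults are assumptions of this sketch. BigDL's `LarsSGD` may differ in details, including how it limits `trust`, which is not modelled here.

```python
import math
from typing import Sequence


def l2_norm(xs: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in xs))


def lars_local_lr(weights: Sequence[float], grads: Sequence[float],
                  trust: float = 0.001, weight_decay: float = 0.0) -> float:
    """Layer-wise multiplier: trust * ||w|| / (||g|| + weight_decay * ||w||).

    Norms are taken over the whole layer, not per element.
    """
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    denom = g_norm + weight_decay * w_norm
    if w_norm == 0.0 or denom == 0.0:
        # Assumed fallback: leave the global learning rate unchanged.
        return 1.0
    return trust * w_norm / denom
```

The layer's update then uses `learning_rate * lars_local_lr(...)` in place of the global learning rate, so layers with small gradients relative to their weights still take reasonably sized steps.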

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)
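TNC is the (time, batch, feature) layout and NTC is the batch-first (batch, time, feature) layout, so converting between them swaps the first two axes. A minimal sketch with nested Python lists follows; the helper name is hypothetical.

```python
from typing import List, TypeVar

T = TypeVar("T")


def swap_first_two_axes(x: List[List[T]]) -> List[List[T]]:
    """Convert TNC -> NTC (or NTC -> TNC): element [t][n] moves to [n][t].

    The innermost feature vectors (C) are reused as-is, not copied.
    """
    return [list(col) for col in zip(*x)]
```

The same function converts in both directions, because swapping the two axes twice restores the original layout.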

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)
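Python has no TreeSet, so this sketch substitutes a bounded heap (`heapq.nlargest`) for the same top-k selection. A sorted set ordered only by score has one pitfall: elements with equal scores compare as equal, so one of them can be silently dropped. Breaking ties on the index, as below, keeps every element and makes the result deterministic. This is an illustration only, not the DetectionOutputSSD code.

```python
import heapq
from typing import List, Sequence


def topk_indices(scores: Sequence[float], k: int) -> List[int]:
    # nlargest keeps a heap of at most k items: O(n log k) instead of a full sort.
    # Key (score, -index): equal scores are ordered by the smaller index first,
    # and no element is ever treated as a duplicate of another.
    return heapq.nlargest(k, range(len(scores)), key=lambda i: (scores[i], -i))
```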

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison (intel-analytics#2871)

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify predict/predictClass function (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this
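The ownership idea can be sketched as a registry that frees everything it owns in one call. Only the names `MemoryOwner` and `Releasable` come from the commits above. The methods and structure below are hypothetical Python, not BigDL's Scala API.

```python
from typing import List


class Releasable:
    """Something that holds native memory and can free it explicitly."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.released = False

    def release(self) -> None:
        # A real resource would free its native buffer here.
        self.released = True


class MemoryOwner:
    """Tracks the Releasables created on its behalf and frees them together."""

    def __init__(self) -> None:
        self._resources: List[Releasable] = []

    def register(self, resource: Releasable) -> Releasable:
        self._resources.append(resource)
        return resource

    def release_resources(self) -> None:
        # Free in reverse creation order, then forget the handles.
        for resource in reversed(self._resources):
            resource.release()
        self._resources.clear()

    def __len__(self) -> int:
        return len(self._resources)
```

A layer acting as the owner would register each native buffer it allocates and call `release_resources()` once, instead of tracking every buffer by hand.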

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* masks in RoiLabel now hold Floats, not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different sizes in one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc
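One common way to compute precision-recall AUC is to sweep the threshold from the highest score down and apply the trapezoidal rule to the resulting (recall, precision) points. This sketch assumes distinct scores, at least one positive label, and a curve that starts at (0, 1). BigDL's implementation may bin thresholds or interpolate differently.

```python
from typing import Sequence


def pr_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the precision-recall curve by the trapezoidal rule."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(labels)
    tp = fp = 0
    prev_recall, prev_precision = 0.0, 1.0  # curve starts at (recall=0, precision=1)
    area = 0.0
    for i in order:  # lower the threshold one sample at a time
        if labels[i]:
            tp += 1
        else:
            fp += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        area += (recall - prev_recall) * (precision + prev_precision) / 2
        prev_recall, prev_precision = recall, precision
    return area
```

A perfect ranking gives 1.0. Ranking both negatives above both positives gives 7/24.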

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() missing the dimension parameter (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging
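With inclusive coordinates, a box from x1 to x2 covers x2 - x1 + 1 pixels, so a +1 appears in every width and height. Below is a sketch of IoU under that convention. The function names are hypothetical, and the +1 rule is this sketch's reading of "BBox now inclusive".

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), both corners inclusive


def area_inclusive(b: Box) -> float:
    # Inclusive corners: a box from pixel 0 to pixel 9 is 10 pixels wide.
    return max(0.0, b[2] - b[0] + 1) * max(0.0, b[3] - b[1] + 1)


def iou_inclusive(a: Box, b: Box) -> float:
    # The intersection is itself an inclusive box (empty if the corners cross).
    inter = area_inclusive((max(a[0], b[0]), max(a[1], b[1]),
                            min(a[2], b[2]), min(a[3], b[3])))
    union = area_inclusive(a) + area_inclusive(b) - inter
    return inter / union if union > 0 else 0.0
```

For example, (0, 0, 9, 9) and (5, 5, 14, 14) overlap in a 5x5 region, so IoU = 25 / 175.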

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut
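The commit does not spell out the fix. One common cause of this bug is averaging per-batch mean losses when the last minibatch is smaller. The sketch below contrasts the naive average with a sample-weighted one; it is an assumption about the issue, not a description of BigDL's code.

```python
from typing import Sequence, Tuple


def naive_mean_loss(batches: Sequence[Tuple[float, int]]) -> float:
    # Averages per-batch mean losses: a short batch counts as much as a full one.
    return sum(loss for loss, _ in batches) / len(batches)


def weighted_mean_loss(batches: Sequence[Tuple[float, int]]) -> float:
    # Weights each batch's mean loss by its number of samples.
    total = sum(loss * n for loss, n in batches)
    count = sum(n for _, n in batches)
    return total / count
```

With batches of (mean loss, size) = (1.0, 4) and (4.0, 1), the naive average is 2.5, while the per-sample mean is 1.6. The two agree only when all batches have the same size.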

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add a warning that Optimizer() is deprecated (intel-analytics#3062)

* add deprecation warning

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int while the actual return value from Java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* rm dlframe example from test script

* Add unit tests for division by zero (#3128)

* Add unit tests for division by zero

* add unit tests and zero check of LocalData

* change

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* fix make dist

* migrate path

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 17, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Preserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* preserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app crashes with a segmentation fault.
Here, the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
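
As an analogy only (BigDL itself is Scala, and every name below is hypothetical), the fix can be sketched in Python: one long-lived worker thread runs all native work, so thread-local state created by the first task survives even though the calling threads come and go.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical: a single long-lived worker thread owns all "native" work.
_native_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dnn")
_native_state = threading.local()


def native_task(x):
    # Lazily create per-thread state, standing in for native thread-local
    # variables that must not be lost between calls.
    if not hasattr(_native_state, "calls"):
        _native_state.calls = 0
    _native_state.calls += 1
    return threading.get_ident(), _native_state.calls, x * 2


def run_on_native_thread(x):
    # Callers (main thread or task threads) never run the work themselves;
    # they submit it to the dedicated thread and wait for the result.
    return _native_pool.submit(native_task, x).result()


def caller(results, x):
    results.append(run_on_native_thread(x))


results = []
for i in range(4):
    t = threading.Thread(target=caller, args=(results, i))
    t.start()
    t.join()  # each short-lived caller thread exits after one call
```

Every result reports the same worker thread id and a call counter that keeps growing, i.e. the thread-local state was never torn down.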

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: issues from updating mkldnn to v0.17 (intel-analytics#2712)

There are two issues:

1. The padding tensor. mkl-dnn uses a padded tensor, which takes more
    memory: for example, 4x1x28x28 becomes 4x8x28x28 (AVX2), because the
    channel dimension is padded to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous
    implementation allocated the DnnTensor when the model was created, which
    cost too much memory, so this patch allocates it at runtime.
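
The padding arithmetic in issue 1 can be checked with a small sketch, assuming the channel dimension is the one padded (as in the 4x1x28x28 example) and an AVX2 SIMD width of 8 fp32 values (256 bits); the helper names are hypothetical.

```python
def padded_channels(channels: int, simd_width: int) -> int:
    # Round the channel count up to the next multiple of the SIMD width.
    return (channels + simd_width - 1) // simd_width * simd_width


def num_elements(shape) -> int:
    n = 1
    for d in shape:
        n *= d
    return n


logical = (4, 1, 28, 28)   # N, C, H, W as the user sees it
avx2_floats = 8            # a 256-bit register holds 8 fp32 values
physical = (logical[0], padded_channels(logical[1], avx2_floats)) + logical[2:]

print(physical)                                       # (4, 8, 28, 28)
print(num_elements(logical), num_elements(physical))  # 3136 25088
```

Here the padded tensor holds 8x as many elements, which is why allocating it eagerly for every layer is costly.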

* add shape computation for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add shape computation for some layers and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when importing a TensorFlow graph with type Double (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature] Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Improve BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to the `MemoryData`.
2. Because we need to transfer the scales between FP32->int8 and int8->FP32,
    we add two new attributes called `mask` and `scales`.
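
A minimal Python sketch of what the new attributes carry, assuming the mkl-dnn convention that bit i of `mask` selects one scale per slice along dimension i (mask 0 means a single scale), and that a scale here is the multiplier from fp32 to int8. `MemoryDataSketch` and the reorder helpers are hypothetical and are not BigDL's `MemoryData` API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MemoryDataSketch:
    # Hypothetical stand-in for MemoryData with the three new attributes.
    shape: Tuple[int, ...]
    data_type: str = "f32"   # e.g. "f32", "s8" or "u8"
    mask: int = 0            # bit i set => one scale per slice along dim i
    scales: List[float] = field(default_factory=list)


def _scale_for_row(desc: MemoryDataSketch, row: int) -> float:
    # mask == 0: a single scale for the whole tensor.
    # mask bit 0 set: a separate scale for every row (dimension 0).
    return desc.scales[row] if desc.mask & 1 else desc.scales[0]


def reorder_f32_to_s8(x: List[List[float]], desc: MemoryDataSketch) -> List[List[int]]:
    # Multiply by the scale, round, and clamp to the signed 8-bit range.
    return [[max(-128, min(127, round(v * _scale_for_row(desc, i)))) for v in row]
            for i, row in enumerate(x)]


def reorder_s8_to_f32(q: List[List[int]], desc: MemoryDataSketch) -> List[List[float]]:
    # Divide by the same scale to get approximate fp32 values back.
    return [[v / _scale_for_row(desc, i) for v in row] for i, row in enumerate(q)]


x = [[0.5, -1.0], [2.0, 4.0]]
per_row = MemoryDataSketch(shape=(2, 2), data_type="s8", mask=1,
                           scales=[127 / 1.0, 127 / 4.0])
q = reorder_f32_to_s8(x, per_row)
back = reorder_s8_to_f32(q, per_row)
```

Without the scales (and the mask telling which scale applies where), the int8 data could not be mapped back to fp32, which is why the reorder needs both.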

* fix conversion accuracy (intel-analytics#2760)

* fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions, so that a
specific layer can accept an int8 input. If you want fp32 output, you should
add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes 3 steps:

1. Generate the scales of the model.
   An API such as `generateScalesWithMask` is needed to generate the scales
   of the fp32 model; the model it returns is also an fp32 model.
2. Quantize the model.
   The `quantize()` API is compatible with the `bigquant`
   backend and sets the quantize flag. When the model is compiled,
   mkldnn generates the quantized weights, outputs, and inputs at
   runtime.
3. Do the inference (forward).
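
The three steps can be sketched end to end on a single fully connected layer in plain Python. This is an illustration under stated assumptions, not BigDL's API: `generate_scales`, `quantize`, and `forward_int8` are hypothetical helpers, the scales are max-abs values (one per output row for the weights), and symmetric quantization to [-127, 127] is assumed.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def max_abs(values) -> float:
    return max(abs(v) for v in values)


# Step 1 (hypothetical generate_scales): record max |value| per weight row and
# for the inputs seen on calibration data. The model itself stays fp32.
def generate_scales(weights: Matrix, calibration: List[Vector]):
    weight_max = [max_abs(row) for row in weights]
    input_max = max(max_abs(x) for x in calibration)
    return weight_max, input_max


# Step 2 (hypothetical quantize): turn fp32 weights into int8 values.
def quantize(weights: Matrix, weight_max: Vector) -> List[List[int]]:
    return [[round(w * 127 / m) for w in row] for row, m in zip(weights, weight_max)]


# Step 3: int8 inference. Quantize the input, accumulate in integers,
# then rescale the integer result back to fp32.
def forward_int8(qweights, weight_max, input_max, x: Vector) -> Vector:
    qx = [max(-127, min(127, round(v * 127 / input_max))) for v in x]
    out = []
    for qrow, m in zip(qweights, weight_max):
        acc = sum(a * b for a, b in zip(qrow, qx))        # integer accumulation
        out.append(acc * (m / 127) * (input_max / 127))   # back to fp32
    return out


def forward_fp32(weights: Matrix, x: Vector) -> Vector:
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


weights = [[0.2, -0.4, 0.1], [1.5, 0.3, -0.7]]
calibration = [[1.0, -2.0, 0.5], [0.3, 1.2, -1.8]]
weight_max, input_max = generate_scales(weights, calibration)
qweights = quantize(weights, weight_max)
x = [0.9, -1.1, 0.4]
approx = forward_int8(qweights, weight_max, input_max, x)
exact = forward_fp32(weights, x)
```

The int8 result stays close to the fp32 one; the gap comes only from rounding during quantization.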

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first, use
GenInt8Scales to generate the scales and save the new model; then, use the
quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: mkldnn models use too much memory (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output use the same memory, so we can't run forward in
`calcScales`: by that time, the input has already been changed and its scales
may be wrong. For example,

Sequential().add(Conv).add(ReLU)

runs two steps: seq.forward(input) first, and then another forward when it goes
into the ReLU, so the input will already be the output, and the scales will be
wrong.

For convolution, the weight's dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
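
A small Python sketch of both problems, using hypothetical stand-ins for Conv and an in-place ReLU: the max-abs scale of the Conv output is only correct if it is recorded before the ReLU overwrites the shared buffer. The weight-shape helper assumes the group dimension is the leading one in the 5-D weight.

```python
def conv_stub(x):
    # Hypothetical stand-in for Conv: shifts values so some become negative.
    return [v - 1.0 for v in x]


def relu_inplace(buf):
    # Like an in-place ReLU: overwrites its input buffer and returns the same list.
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf


def max_abs(values):
    return max(abs(v) for v in values)


def dnn_weight_shape(weight_shape, groups):
    # Drop the leading group dimension when there is only one group, so a
    # 5-D (groups, out, in, kh, kw) weight becomes the 4-D weight dnn expects.
    if groups == 1 and len(weight_shape) == 5:
        return tuple(weight_shape[1:])
    return tuple(weight_shape)


x = [-2.0, 0.5, 1.5]
conv_out = conv_stub(x)                 # [-3.0, -0.5, 0.5]
scale_recorded_early = max_abs(conv_out)
relu_out = relu_inplace(conv_out)       # same list object as conv_out
scale_recorded_late = max_abs(conv_out)
```

Recorded early, the Conv scale is 3.0; recorded after the in-place ReLU it drops to 0.5, even though nothing was ever written to a "ReLU output" buffer of its own.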

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

The u8-to-s8 or s8-to-u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add the LARS optimizer (layer-wise scaled), along with utility functions to build a set of LARS optim methods for a container.

Bug fix: the gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}, but concatenating {id}{pidTo} is ambiguous: e.g., "112" can be {1}{12} or {11}{2}. Now a "_" separates id from pidTo.
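
The ambiguity is easy to reproduce. The helpers below are hypothetical and only mimic the id layout described above; the exact string AllReduceParameter builds may differ.

```python
def block_id_old(param_id: int, pid_to: int, pid_from: int) -> str:
    # Original layout: {id}{pidTo}gradientBytes{pidFrom}, no separator.
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def block_id_new(param_id: int, pid_to: int, pid_from: int) -> str:
    # Fixed layout: a "_" separates id from pidTo.
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


# Parameter 1 sent to partition 12 vs. parameter 11 sent to partition 2:
print(block_id_old(1, 12, 0), block_id_old(11, 2, 0))  # 112gradientBytes0 112gradientBytes0
print(block_id_new(1, 12, 0), block_id_new(11, 2, 0))  # 1_12gradientBytes0 11_2gradientBytes0
```

With the old layout two different blocks share one id, so one gradient could silently overwrite or be read as the other.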

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* code style check

* remove print

* fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed; random weights and bias are not supported yet

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add a requirement that input size and hidden size match if the LSTM has more than one layer

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore / use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to improve the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: improve the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison (intel-analytics#2871)

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify predict/predictClass function (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now has Floats, not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support batching samples of different sizes into one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() missing dimension parameter issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning that Optimizer() is deprecated (intel-analytics#3062)

* add deprecation warning

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int when the actual return value from Java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* rm dlframe example from test script

* Add Utest about division by zero (#3128)

* Add Utest about division by zero

* add Utest and zero check of LocalData

* change

* Add Utest about division by zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng pushed a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 17, 2021
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 22, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue caused by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Preserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* preserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. The child threads forked from the parent thread are all bound to core 0
because of the affinity settings.
2. The native thread keeps some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app crashes with a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
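The fix can be pictured with a minimal Python sketch (not BigDL's Scala code): every native computation is routed through one long-lived, single-worker pool, so it always runs on the same thread no matter which short-lived caller thread submits it. `run_on_dedicated_thread` and `native_compute` are hypothetical names, and the affinity and thread-local behaviour of the real native library are not modelled.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One long-lived worker thread owns all "native" computation
# (hypothetical stand-in for the mkldnn work).
_dnn_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dnn")

def run_on_dedicated_thread(fn, *args):
    """Submit fn to the single dnn thread and block until it finishes."""
    return _dnn_pool.submit(fn, *args).result()

def native_compute(x):
    # Report which thread actually executed the work.
    return threading.get_ident(), x * 2

# Simulate several short-lived caller threads (like recreated pool threads).
results = []

def caller(i):
    results.append(run_on_dedicated_thread(native_compute, i))

threads = [threading.Thread(target=caller, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

worker_ids = {tid for tid, _ in results}
print(len(worker_ids))  # 1: every call ran on the same worker thread
```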

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: issues from updating mkldnn to v0.17 (intel-analytics#2712)

There are two issues:

1. The padding tensor is required. mkl-dnn uses a padded tensor, which
    takes more memory; for example, 4x1x28x28 becomes 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.
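To illustrate the memory cost of issue 1, here is a small Python sketch that rounds the channel dimension up to a multiple of the SIMD width. It assumes an 8-wide fp32 layout for AVX2 and that dimension 1 is the padded one, as in the 4x1x28x28 example; `padded_shape` and `num_elements` are hypothetical helpers, not mkl-dnn APIs.

```python
def padded_shape(shape, simd_width=8, dim=1):
    """Round shape[dim] up to the next multiple of simd_width (hypothetical helper)."""
    padded = list(shape)
    padded[dim] = -(-shape[dim] // simd_width) * simd_width  # ceiling division
    return tuple(padded)

def num_elements(shape):
    n = 1
    for s in shape:
        n *= s
    return n

logical = (4, 1, 28, 28)
physical = padded_shape(logical)  # assumed 8-wide (AVX2, fp32) layout
print(physical, num_elements(physical) // num_elements(logical))  # (4, 8, 28, 28) 8
```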

* add computshape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 support (#2756)

1. Because of the new data type, a new attribute called dataType is added
    to `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    two new attributes called `mask` and `scales` are added.

* fix conversion accuracy (intel-analytics#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions,
so that a specific layer can accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

It includes three steps.

1. Generate the scales of the model.
   An API like `generateScalesWithMask` generates the scales of the
   fp32 model, and the returned model is also an fp32 model.
2. Quantize the model.
   The `quantize()` API is compatible with the `bigquant`
   backend and sets the quantize flag. During compilation,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model, then you
can use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output share the same memory, so we can't do forward in
`calcScales`: by that time the input has been changed and its scales may be
wrong. For example,

Sequential().add(Conv).add(ReLU)

does two steps: seq.forward(input) first, and then, inside the ReLU,
another forward, so the input becomes the output and the scales are
wrong.

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution without groups, the weight's dimension
should be 4.
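A minimal Python sketch of why scales must be collected before an in-place layer runs. It assumes a per-tensor scale equal to the maximum absolute value, which is one common int8 calibration rule and not necessarily the one `calcScales` uses; the helper names are hypothetical.

```python
def max_abs_scale(values):
    """Per-tensor scale as the max absolute value (an assumed calibration rule)."""
    return max(abs(v) for v in values)

def relu_inplace(buf):
    # Writes its output into the input buffer, like an in-place ReLU.
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf

conv_output = [-3.0, 1.5, -0.5, 2.0]  # input to ReLU == output of Conv

# Correct: record the input scale before running the in-place layer.
scale_before = max_abs_scale(conv_output)

relu_out = relu_inplace(conv_output)  # same list object as conv_output
# Wrong: the "input" now aliases the output, so its scale has changed.
scale_after = max_abs_scale(conv_output)

print(scale_before, scale_after, relu_out is conv_output)  # 3.0 2.0 True
```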

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

u8-to-s8 and s8-to-u8 conversions need no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: the learning rate is scaled layer-wise. Also add utility functions to build a set of LARS optim methods for a container.
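For reference, a minimal Python sketch of the layer-wise learning-rate multiplier as defined in the original LARS paper (You et al.): trust * ||w|| / (||g|| + weight_decay * ||w||), computed over a whole layer. BigDL's `LarsSGD` may differ in details (for example, how the trust value is limited), and the fallback to 1.0 for zero norms is an assumption of this sketch.

```python
import math

def l2_norm(xs):
    return math.sqrt(sum(x * x for x in xs))

def lars_local_lr(weights, grads, trust=0.001, weight_decay=0.0):
    """Layer-wise LR multiplier: trust * ||w|| / (||g|| + weight_decay * ||w||)."""
    w_norm, g_norm = l2_norm(weights), l2_norm(grads)
    denom = g_norm + weight_decay * w_norm
    if w_norm == 0.0 or denom == 0.0:
        return 1.0  # assumed fallback: use the global LR unchanged
    return trust * w_norm / denom

def lars_sgd_step(weights, grads, global_lr, trust=0.001, weight_decay=0.0):
    # One plain SGD step whose LR is scaled by the layer's LARS multiplier.
    local = lars_local_lr(weights, grads, trust, weight_decay)
    return [w - global_lr * local * (g + weight_decay * w)
            for w, g in zip(weights, grads)]

layer_w = [3.0, 4.0]  # ||w|| = 5
layer_g = [0.0, 2.0]  # ||g|| = 2
print(lars_local_lr(layer_w, layer_g, trust=0.01))  # approximately 0.025
```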

Bug fix: The gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}, but concatenating {id}{pidTo} is ambiguous: e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
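The ambiguity can be reproduced with a short Python sketch that builds block ids in the two formats described above. The function names are hypothetical, and the real AllReduceParameter id strings may carry extra fields.

```python
def old_block_id(param_id, pid_to, pid_from):
    # Original scheme: {id}{pidTo}gradientBytes{pidFrom}
    return f"{param_id}{pid_to}gradientBytes{pid_from}"

def new_block_id(param_id, pid_to, pid_from):
    # Fixed scheme: a "_" separates id from pidTo
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"

a, b = (1, 12, 0), (11, 2, 0)
print(old_block_id(*a) == old_block_id(*b))  # True: both are "112gradientBytes0"
print(new_block_id(*a) == new_block_id(*b))  # False: "1_12..." vs "11_2..."
```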

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* fix LSTMSpec input format conversion bug between bigdl and mkldnn; random weights and bias are not supported yet

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add a requirement that input size matches hidden size when the LSTM has more than one layer

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is no longer used; use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify predict/predictClass function (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support batching samples of different sizes into one mini-batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() missing dimension parameter issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging
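With inclusive boxes, a box (x1, y1, x2, y2) covers (x2 - x1 + 1) x (y2 - y1 + 1) pixels. A minimal Python sketch of IoU under that convention follows; the function names are hypothetical and this is not BigDL's MAP code.

```python
def inclusive_area(box):
    """Area of (x1, y1, x2, y2) when both corners lie inside the box."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1 + 1) * max(0.0, y2 - y1 + 1)

def iou_inclusive(a, b):
    # Intersection rectangle, still using inclusive corners.
    inter = inclusive_area((max(a[0], b[0]), max(a[1], b[1]),
                            min(a[2], b[2]), min(a[3], b[3])))
    union = inclusive_area(a) + inclusive_area(b) - inter
    return inter / union if union > 0 else 0.0

# Two 10x10-pixel boxes overlapping in a 5x10 strip: 50 / (100 + 100 - 50).
print(iou_inclusive((0, 0, 9, 9), (5, 0, 14, 9)))  # 0.3333333333333333
```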

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning that Optimizer() is deprecated (intel-analytics#3062)

* add deprecation warning

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor (intel-analytics#3079)

* support uncompressed parameters

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int while the actual return value from Java is a byte array. (intel-analytics#3111)

* add support for a list of DataFrames (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add unit test for division by zero (#3128)

* Add unit test for division by zero

* add unit test and zero check for LocalData

* add unit test and zero check for LocalData

* change

* Add unit test for division by zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip version to 0.14 (intel-analytics#3142)

* flip version to 0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 22, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue caused by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. Because of the affinity settings, the child threads forked from the
parent thread are all bound to core 0.
2. The native library keeps some unknown thread-local state. If the parent
thread exits and is recreated (as a thread from
Executors.newFixedThreadPool can be), the whole app crashes with a
segmentation fault.
Here the parent thread means the main thread (local mode) or the worker
thread of mapPartitions (distributed mode).
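A minimal Python sketch of the idea behind the fix, assuming the only requirement is that every native call runs on one long-lived thread that is never torn down and recreated. It uses the standard `concurrent.futures.ThreadPoolExecutor`; `run_native` and `invoke_on_dnn_thread` are hypothetical names, not BigDL's Scala `ThreadPool` API, and CPU affinity is not modeled.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One dedicated, long-lived worker: every "native" call runs on the same
# OS thread, so any thread-local native state is created once and kept.
_dnn_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dnn")


def run_native(x):
    # Hypothetical stand-in for an MKL-DNN primitive; it records which
    # thread executed it and returns a trivial result.
    return threading.get_ident(), x * 2


def invoke_on_dnn_thread(fn, *args):
    """Submit fn to the dedicated pool and block until it returns."""
    return _dnn_pool.submit(fn, *args).result()


def caller(results, i):
    # Simulates short-lived parent threads (e.g. per-partition workers)
    # that come and go while the dnn thread stays alive.
    results[i] = invoke_on_dnn_thread(run_native, i)


results = {}
threads = [threading.Thread(target=caller, args=(results, i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However many parent threads submit work, the native calls all report the same thread id.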

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: issues from updating mkldnn to v0.17 (intel-analytics#2712)

There are two issues:

1. The required padding tensor. mkl-dnn uses a padding tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2), because
    it pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous
    implementation allocated the DnnTensor when the model was created, which
    cost too much memory, so this patch allocates it at runtime.
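The padding in issue 1 can be illustrated with a small Python sketch, assuming an NCHW shape whose channel dimension is rounded up to a multiple of the SIMD width (8 fp32 lanes for 256-bit AVX2). `padded_shape` and `extra_elements` are hypothetical helpers; mkl-dnn's real blocked layouts decide the padding internally.

```python
def padded_shape(shape, simd_width=8):
    """Round the channel dim (index 1 of NCHW) up to a multiple of simd_width."""
    n, c, h, w = shape
    padded_c = -(-c // simd_width) * simd_width  # ceiling division
    return (n, padded_c, h, w)


def extra_elements(shape, simd_width=8):
    """Number of additional elements the padded buffer needs."""
    n, c, h, w = shape
    pn, pc, ph, pw = padded_shape(shape, simd_width)
    return pn * pc * ph * pw - n * c * h * w
```

For the 4x1x28x28 example the padded buffer is eight times larger than the data it holds, which is why allocating it eagerly for every layer was costly.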

* add shape computation for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add shape computation for some layers and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error with Double type when importing a TensorFlow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature] Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, a new attribute called dataType is added
    to `MemoryData`.
2. Because scales must be transferred between FP32->int8 and int8->FP32,
    two new attributes called `mask` and `scales` are added.
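To make the role of `mask` and `scales` concrete, here is a hedged Python sketch of symmetric linear int8 quantization with max-abs calibration. It assumes mkl-dnn-style mask semantics (0: one scale for the whole tensor; the bit for dimension 0 set: one scale per index of that dimension) and treats each scale as the calibrated max |x|. The function names are illustrative, not BigDL's `MemoryData` API.

```python
def max_abs_scales(tensor, mask):
    """tensor is a 2-D list whose dim 0 plays the role of channels."""
    if mask == 0:
        # One scale covering every element.
        return [max(abs(v) for row in tensor for v in row)]
    if mask == 1:
        # One scale per index of dim 0.
        return [max(abs(v) for v in row) for row in tensor]
    raise ValueError("only masks 0 and 1 are handled in this sketch")


def _scale_for(scales, i):
    return scales[0] if len(scales) == 1 else scales[i]


def quantize(tensor, scales):
    # FP32 -> int8: map [-max, max] onto [-127, 127], round to nearest.
    out = []
    for i, row in enumerate(tensor):
        m = _scale_for(scales, i)
        factor = 127.0 / m if m > 0 else 0.0
        out.append([max(-127, min(127, round(v * factor))) for v in row])
    return out


def dequantize(qtensor, scales):
    # int8 -> FP32: the inverse mapping with the same scales.
    return [[q * _scale_for(scales, i) / 127.0 for q in row]
            for i, row in enumerate(qtensor)]
```

Both directions need the same `scales` (and the `mask` that says how they map onto the tensor), which is why they travel with the memory description.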

* fix conversion accuracy (intel-analytics#2760)

* fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially convolutions, so a
specific layer can accept int8 input. If you want fp32 output, add a
reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes three steps:

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of
   an fp32 model; the model it returns is also an fp32 model.
2. Quantize the model.
   The `quantize()` API stays compatible with the `bigquant`
   backend and sets the quantize flag. When the model is compiled,
   mkldnn generates the quantized weights, outputs, and inputs at
   runtime.
3. Run inference (forward).

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model, then use the
quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: mkldnn models use too much memory (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers use the same memory for input and output, so we can't run a
forward pass inside `calcScales`: by then the input has already been
overwritten and its scales may be wrong. For example,

Sequential().add(Conv).add(ReLU)

runs seq.forward(input) first. When it reaches the ReLU, it runs another
forward, so the ReLU's input is the same buffer as its output, and the
scales are wrong.

A convolution's weight always has 5 dimensions, even when the group number
is 1. But for a dnn convolution without groups, the weight should have 4
dimensions.
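Both problems can be reproduced in plain Python. `conv`, `relu_inplace`, and `dnn_weight_shape` are hypothetical stand-ins, and the 5-D layout (group, out/group, in/group, kH, kW) is an assumption about how the grouped weight is stored.

```python
def conv(x):
    # Hypothetical stand-in for a convolution: returns a new buffer.
    return [v - 1.0 for v in x]


def relu_inplace(x):
    # In-place ReLU: its output shares memory with its input.
    for i, v in enumerate(x):
        if v < 0:
            x[i] = 0.0
    return x


def max_abs(x):
    return max(abs(v) for v in x)


def relu_input_scale_wrong(inp):
    conv_out = conv(inp)
    relu_inplace(conv_out)       # overwrites conv_out
    return max_abs(conv_out)     # measured after the in-place op: wrong


def relu_input_scale_right(inp):
    conv_out = conv(inp)
    scale = max_abs(conv_out)    # measured before ReLU touches the buffer
    relu_inplace(conv_out)
    return scale


def dnn_weight_shape(blas_shape, n_group):
    # Drop the leading group dimension when there is only one group.
    if n_group == 1 and len(blas_shape) == 5:
        return blas_shape[1:]
    return blas_shape
```

With input `[-2.0, 0.5, 2.0]` the correct ReLU input scale is 3.0, but measuring after the in-place ReLU yields 1.0.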

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

u8-to-s8 and s8-to-u8 conversions need no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc: some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc: some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add the LARS optimizer (layer-wise scaled learning rates), along with utility functions to build a set of LARS optim methods for a container.

Bug fix: the gradient block id of AllReduceParameter was originally composed as {id}{pidTo}gradientBytes{pidFrom}, but concatenating {id}{pidTo} is ambiguous: e.g., "112" can be {1}{12} or {11}{2}. Now a "_" separates id from pidTo.
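The ambiguity is easy to see with string formatting. This Python sketch assumes the id is built by plain concatenation exactly as described above; `old_block_id` and `new_block_id` are illustrative names, not the Spark `BlockId` classes AllReduceParameter actually uses.

```python
def old_block_id(param_id, pid_to, pid_from):
    # {id}{pidTo}gradientBytes{pidFrom}: id and pidTo run together.
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def new_block_id(param_id, pid_to, pid_from):
    # {id}_{pidTo}gradientBytes{pidFrom}: "_" marks where id ends.
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"
```

Under the old scheme, parameter 1 sending to partition 12 and parameter 11 sending to partition 2 map to the same block, so one gradient can overwrite the other.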

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for training with multiple optim methods (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* fix LSTMSpec input format conversion bug between bigdl and mkldnn; random weights and bias are not supported yet

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add a requirement that input size and hidden size match if the LSTM has more than one layer

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is no longer used; use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT
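For reference, the layer-wise rate that LARS derives from whole-layer weight and gradient norms can be sketched in Python. This follows the formulation from the original LARS paper (a trust coefficient times ||w|| / (||g|| + weight_decay * ||w||)); it is not BigDL's `LarsSGD` code, and the zero-norm fallback is an assumption.

```python
import math


def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))


def lars_local_lr(weights, grads, global_lr, trust=0.001, weight_decay=0.0):
    """Layer-wise rate: global_lr * trust * ||w|| / (||g|| + wd * ||w||).

    Falls back to global_lr when either norm is zero (an assumption).
    """
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    if w_norm == 0.0 or g_norm == 0.0:
        return global_lr
    return global_lr * trust * w_norm / (g_norm + weight_decay * w_norm)


def lars_step(weights, grads, global_lr, trust=0.001, weight_decay=0.0):
    # One SGD-style update of a whole layer using its own LARS rate.
    lr = lars_local_lr(weights, grads, global_lr, trust, weight_decay)
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]
```

The norms are taken over the whole layer, which is why the gradient norm has to be computed across every slice of that layer's parameters before the update.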

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout in mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable

* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add gemm layer

* add gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify predict/predictClass function (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* masks in RoiLabel now hold Floats instead of Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support batching samples of different sizes into one mini-batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() missing the dimension parameter (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging
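"BBox now inclusive" changes how areas are computed: with inclusive pixel coordinates, a box from x1 to x2 spans x2 - x1 + 1 pixels. A minimal Python sketch of IoU under that convention, assuming boxes are (x1, y1, x2, y2) tuples; `area_inclusive` and `iou_inclusive` are illustrative names, not the BigDL API.

```python
def area_inclusive(box):
    # Inclusive coordinates: both end pixels belong to the box.
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1 + 1) * max(0.0, y2 - y1 + 1)


def iou_inclusive(a, b):
    # Intersection box (may be empty, in which case its area is 0).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = area_inclusive((ix1, iy1, ix2, iy2))
    union = area_inclusive(a) + area_inclusive(b) - inter
    return inter / union if union > 0 else 0.0
```

A 10x10 box (0, 0, 9, 9) overlapping a shifted copy (5, 0, 14, 9) shares 50 pixels out of a 150-pixel union, giving an IoU of 1/3; an exclusive convention would compute different areas for the same coordinates.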

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add an option to flush denormal values on the BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB defaults to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB defaults to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Update doc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning that Optimizer() is deprecated (intel-analytics#3062)

* add deprecation warning

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int when the actual return value from Java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 22, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

If we use the parent thread directly, there are two bugs:
1. The child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if
the parent thread exits and is recreated (such as a thread from
Executors.newFixedThreadPool), the whole app crashes with a segmentation fault.
The parent thread means the main thread (local mode) or the worker thread of
mapPartition (distributed mode).

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: issues from updating mkldnn to v0.17. (intel-analytics#2712)

There are two issues:

1. The padding tensor is required. mkl-dnn uses a padding tensor, which
    needs more memory; for example, 4x1x28x28 becomes 4x8x28x28 (avx2).
    It pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.

* add shape computation for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add shape computation for some layers and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modify environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modify environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to `MemoryData`.
2. Because the scales must be transferred between FP32->int8 and int8->FP32,
    we add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel-analytics#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions,
so a specific layer can accept int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes 3 steps:

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of
   the fp32 model; the model returned is also an fp32 model.
2. Quantize the model.
   The `quantize()` API is compatible with the `bigquant`
   backend and sets the quantize flag. When compiling,
   mkldnn generates the quantized weight, output, and input at
   runtime.
3. Run inference (forward).

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model, then
use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output use the same memory, so we can't run forward in
`calcScales`: by that time the input has been changed and its scales may be
wrong. For example,

Sequential().add(Conv).add(ReLU)

runs in two steps: seq.forward(input) first, and then, when it reaches the ReLU,
another forward, so the input is overwritten by the output and the scales are
wrong.

For convolution, the weight always has 5 dimensions, even when the group number
is 1. But for dnn convolution, if there's no group, the weight should have
4 dimensions.

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

u8-to-s8 or s8-to-u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modify environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer (layer-wise scaled), along with utility functions to build a set of LARS optim methods for a container.

Bug fix: The gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}, but the combination {id}{pidTo} is ambiguous: e.g., "112" can be {1}{12} or {11}{2}. Now a "_" separates id from pidTo.

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method  training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec: fixed input format conversion bug between bigdl and mkldnn; random weights and bias not yet supported

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore; use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable

* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* masks in RoiLabel now hold Floats, not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support batching samples of different sizes into one mini-batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch-relevant unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: wrong primitive order in the softmax dnn backend (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Update doc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add a warning that Optimizer() is deprecated (intel-analytics#3062)

* add deprecation warning

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor (intel-analytics#3079)

* support parameters without compression

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* JDK 11 support (intel-analytics#3098)

* update for JDK 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int when the actual return value from Java is a byte array. (intel-analytics#3111)

* add support for a list of DataFrames (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add unit test for division by zero (#3128)

* Add unit test for division by zero

* add unit test and zero check for LocalData

* add unit test and zero check for LocalData

* change

* Add unit test for division by zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip version to 0.14 (intel-analytics#3142)

* flip version to 0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 22, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

If we use the parent thread directly, there are two bugs:
1. The child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if the
parent thread exits and is recreated (such as a thread from
Executors.newFixedThreadPool), the whole app segfaults.
The parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
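A minimal sketch of the general idea of routing native work through one long-lived thread, written with plain `java.util.concurrent`. The names `SingleThreadRunner`, `invoke` and `workerName` are hypothetical; this is not BigDL's `ThreadPool`, and it only illustrates why thread-local native state survives when every call runs on the same dedicated thread (it does not model the core-affinity issue).

```scala
import java.util.concurrent.{Callable, ExecutorService, Executors, ThreadFactory}

object SingleThreadRunner {
  // Factory that names the worker thread and marks it as a daemon.
  private val factory: ThreadFactory = new ThreadFactory {
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "native-compute")
      t.setDaemon(true) // do not keep the JVM alive on exit
      t
    }
  }

  // One long-lived worker thread; every task runs on it, in submission order.
  private val pool: ExecutorService = Executors.newSingleThreadExecutor(factory)

  /** Runs `body` on the dedicated thread and blocks the caller until it finishes. */
  def invoke[T](body: => T): T =
    pool.submit(new Callable[T] { override def call(): T = body }).get()

  /** Name of the thread that actually executes submitted work. */
  def workerName: String = invoke(Thread.currentThread().getName)
}
```

Callers on any thread (for example a task inside mapPartition) receive results through `invoke`, while the work itself always executes on `native-compute`, even if the calling thread is later torn down and recreated.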

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: issues from updating mkldnn to v0.17 (intel-analytics#2712)

There are two issues:

1. The padding tensor is required. mkl-dnn uses a padded tensor, which takes
    more memory; for example, 4x1x28x28 becomes 4x8x28x28 (AVX2). It pads to
    a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.
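A sketch of the padding arithmetic in item 1, assuming the channel dimension is rounded up to a multiple of the SIMD width (8 float32 lanes for AVX2, which matches 4x1x28x28 becoming 4x8x28x28). `SimdPadding` and its methods are hypothetical names for illustration, not mkl-dnn or BigDL APIs.

```scala
object SimdPadding {
  /** Rounds `channels` up to the next multiple of `simdWidth`. */
  def paddedChannels(channels: Int, simdWidth: Int): Int = {
    require(channels > 0 && simdWidth > 0, "channels and simdWidth must be positive")
    ((channels + simdWidth - 1) / simdWidth) * simdWidth
  }

  /** Pads the channel dimension (index 1) of an NCHW shape. */
  def paddedShape(nchw: Array[Int], simdWidth: Int): Array[Int] = {
    require(nchw.length == 4, "expected an NCHW shape")
    val out = nchw.clone()
    out(1) = paddedChannels(nchw(1), simdWidth)
    out
  }

  /** Number of extra elements allocated only because of the padding. */
  def overhead(nchw: Array[Int], simdWidth: Int): Long =
    paddedShape(nchw, simdWidth).map(_.toLong).product - nchw.map(_.toLong).product
}
```

For a single-channel 4x1x28x28 input the padded buffer is 8 times larger, which is why allocating it eagerly for every layer is costly.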

* add shape computation for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add shape computation for some layers and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte support for DnnTensor (intel-analytics#2751)

* feat: add byte support for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature] Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallelism in BlasWrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 support (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to `MemoryData`.
2. Because the scales must be transferred between FP32->int8 and int8->FP32,
    we add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel-analytics#2760)

* fix accuracy for saved model

* exclude mkldnn models from conversion

* feature: layer-wise support of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially convolutions, so that a
specific layer can accept an int8 input. If you want fp32 output, add a
reorder.

* feature: mkldnn int8 layer-wise support (intel-analytics#2759)

It includes 3 steps:

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of the
   fp32 model; the returned model is also an fp32 model.
2. Quantize the model.
   The `quantize()` API is compatible with the `bigquant` backend and sets
   the quantize flag. When the model is compiled, mkldnn generates the
   quantized weight, output, and input at runtime.
3. Run inference (forward).
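Steps 1 and 2 can be pictured with a rough sketch of per-tensor symmetric quantization, assuming the scale is the maximum absolute fp32 value and values map linearly onto [-127, 127]. `Int8Sketch` is a hypothetical name; this is a generic illustration, not the `generateScalesWithMask` or `quantize()` implementation, and mkldnn's real scale convention and mask handling may differ.

```scala
object Int8Sketch {
  /** Step 1 (sketch): the scale is the largest absolute value seen in the fp32 data. */
  def maxAbsScale(data: Array[Float]): Float =
    data.foldLeft(0f)((m, x) => math.max(m, math.abs(x)))

  /** Step 2 (sketch): map fp32 values linearly onto signed 8-bit integers in [-127, 127]. */
  def quantize(data: Array[Float], scale: Float): Array[Byte] = {
    require(scale > 0f, "scale must be positive")
    data.map { x =>
      val q = math.round(x / scale * 127f)
      math.max(-127, math.min(127, q)).toByte // clamp, then narrow to a byte
    }
  }

  /** Inverse mapping, useful for checking the approximation error. */
  def dequantize(q: Array[Byte], scale: Float): Array[Float] =
    q.map(b => b * scale / 127f)
}
```

With this scheme the round-trip error per element is at most half a quantization step, that is `scale / 254`.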

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model; then use the
quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: mkldnn models use too much memory (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output use the same memory, so we can't do forward in
`calcScales`: by that time the input has been changed and its scales may be
wrong. For example,

Sequential().add(Conv).add(ReLU)

it does two steps: seq.forward(input) first, and then, when it reaches the
ReLU, another forward, so the input becomes the output and the scales are
wrong.
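A simplified illustration of the in-place problem, using plain arrays in place of tensors and assuming the scale is the maximum absolute value. `InPlaceScales` and its methods are hypothetical; the sketch only shows that reading the "input" after an in-place ReLU actually sees the output, and it is not the fix applied in `calcScales`.

```scala
object InPlaceScales {
  private def maxAbs(a: Array[Float]): Float =
    a.foldLeft(0f)((m, x) => math.max(m, math.abs(x)))

  /** In-place ReLU: overwrites its own buffer, like a layer whose input and output share memory. */
  def reluInPlace(buf: Array[Float]): Array[Float] = {
    var i = 0
    while (i < buf.length) {
      if (buf(i) < 0f) buf(i) = 0f
      i += 1
    }
    buf
  }

  /** Wrong: reads the ReLU input scale after the in-place forward has already run. */
  def scaleAfterForward(input: Array[Float]): Float = {
    val shared = input // input and output are the same memory
    reluInPlace(shared)
    maxAbs(shared)
  }

  /** Reading the scale before the buffer is overwritten gives the true input scale. */
  def scaleBeforeForward(input: Array[Float]): Float = {
    val s = maxAbs(input)
    reluInPlace(input)
    s
  }
}
```

For the input `(-3, 1, 2)` the true input scale is 3, but after the in-place forward the buffer holds `(0, 1, 2)` and the scale reads as 2.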

For convolution weights, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there is no group, the weight's dimension
should be 4.
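A sketch of the shape adjustment described above, assuming the 5-D grouped layout is (group, outPerGroup, inPerGroup, kH, kW). `ConvWeightShape.forDnn` is a hypothetical helper, not a BigDL API.

```scala
object ConvWeightShape {
  /**
   * Drops the leading group dimension when group == 1, turning
   * (1, out, in, kH, kW) into the 4-D shape (out, in, kH, kW) that a
   * non-grouped dnn convolution expects. Grouped shapes stay 5-D.
   */
  def forDnn(weightShape: Array[Int]): Array[Int] = {
    require(weightShape.length == 5, "expected a 5-D grouped weight shape")
    if (weightShape(0) == 1) weightShape.tail else weightShape
  }
}
```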

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

Converting u8 to s8 or s8 to u8 needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: layer-wise scaled, along with utility functions to build a set of LARS optim methods for a container.

Bug fix: The gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}, but concatenating {id}{pidTo} is ambiguous: e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
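A sketch reproducing the ambiguity, following the block id format described in the bug fix above. `BlockIds` is a hypothetical name, and the real AllReduceParameter code may assemble the string differently.

```scala
object BlockIds {
  /** Old scheme: id and pidTo are concatenated with no separator, so distinct pairs can collide. */
  def oldId(id: Int, pidTo: Int, pidFrom: Int): String =
    s"$id${pidTo}gradientBytes$pidFrom"

  /** New scheme: "_" separates id from pidTo, so the pair can be recovered unambiguously. */
  def newId(id: Int, pidTo: Int, pidFrom: Int): String =
    s"${id}_${pidTo}gradientBytes$pidFrom"
}
```

Both `(id = 1, pidTo = 12)` and `(id = 11, pidTo = 2)` produce `112gradientBytes0` under the old scheme, while the new scheme yields `1_12gradientBytes0` and `11_2gradientBytes0`.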

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim-method training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec: fixed input format conversion bug between bigdl and mkldnn; random weights and bias not yet supported

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add a requirement that input size and hidden size match if the LSTM has more than one layer

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout in mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 for jointable (#2827)

* fix: support int8 for jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP] Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix LSTM loading bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add gemm layer

* add gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify predict/predictClass function (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* masks in RoiLabel now use Floats instead of Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support batching samples of different sizes into one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix missing dimension parameter in TimeDistributedCriterion() (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB defaults to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB defaults to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch-relevant unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)

* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel-analytics#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int when the actual return value from Java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add unit test for division by zero (#3128)

* Add unit test for division by zero

* add unit test and zero check of LocalData

* change

* Add unit test for division by zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip version to 0.14 (intel-analytics#3142)

* flip version to 0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 22, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit tests to avoid dynamic resource allocation issues in Docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread holds some unknown thread-local variables, so if
the parent thread exits and is recreated (e.g. a thread from
Executors.newFixedThreadPool), the whole app crashes with a segmentation fault.
Here the parent thread means the main thread (local mode) or the worker thread of
mapPartition (distributed mode).

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn uses a padding tensor, which
    consumes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous impl
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.

* add shape computation for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add shape computation for some layers and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature] Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, a new attribute called `dataType` is added
    to `MemoryData`.
2. Because the scales must be transferred between FP32->int8 and int8->FP32,
    two new attributes called `mask` and `scales` are added.

* fix conversion accuracy (intel-analytics#2760)

* fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially convolutions, so that a
specific layer can accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

This includes 3 steps:

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of
   the fp32 model; the model it returns is also an fp32 model.
2. Quantize the model.
   The `quantize()` API is compatible with the `bigquant`
   backend and sets the quantize flag. When compiling,
   mkldnn generates the quantized weight, output, and input at
   runtime.
3. Do the inference (forward).

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model, then
use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: mkldnn models use too much memory (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output use the same memory, so we can't run forward in
`calcScales`: by then the input has been changed, and its scales may
be wrong. For example,

Sequential().add(Conv).add(ReLU)

runs seq.forward(input) first, and when it reaches the ReLU, it runs
another forward in place, so the input becomes the output and the scales are
wrong.

A convolution's weight always has 5 dimensions, even when the group number
is 1. But for a dnn convolution without groups, the weight should
have 4 dimensions.

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

u8-to-s8 and s8-to-u8 conversions need no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer (layer-wise scaled), along with utility functions to build a set of LARS optim methods for a container.

Bug fix: the gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}, but concatenating {id}{pidTo} is ambiguous, e.g. "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* fixed LSTMSpec input format conversion bug between bigdl and mkldnn; random weights and bias not supported

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore; use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable

* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to improve the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: improve the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison (intel-analytics#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify predict/predictClass function (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now has Floats, not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support batching samples of different sizes into one mini batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)

* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel-analytics#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc return a Int while the true return value from java is a byte array. (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* fix make dist

* migrate path

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 22, 2021
* convert static graph to IR graph and build (intel-analytics#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel-analytics#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel-analytics#2682)

* add spark 2.4 support (intel-analytics#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel-analytics#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread holds some unknown thread-local variables, so if
the parent thread exits and is recreated (as threads from
Executors.newFixedThreadPool can be), the whole app crashes with a segmentation fault.
Here, the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel-analytics#2712)

There are two issues:

1. The padding tensor is required. mkl-dnn uses a padding tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2), because it
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.

* add computeShape for some layers and add skip primitives in DnnGraph (intel-analytics#2740)

* add computeShape for some layers and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel-analytics#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel-analytics#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel-analytics#2737)

* refactor predict for dnn model

* remove some unit tests (intel-analytics#2752)

* remove some conflict tests (#2753)

* Update documentation (intel-analytics#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel-analytics#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel-analytics#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel-analytics#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel-analytics#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to `MemoryData`.
2. Because scales must be transferred between FP32->int8 and int8->FP32,
    we add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel-analytics#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel-analytics#2762)

Enable the int8 data type in layers, especially for convolutions,
so a specific layer can accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel-analytics#2759)

It includes 3 steps.

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of an
   fp32 model; the returned model is also an fp32 model.
2. Quantize the model.
   The `quantize()` API is compatible with the `bigquant`
   backend and sets the quantize flag. When compiling,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).

* update readme for v1 training (intel-analytics#2763)

* update release doc for preparation (intel-analytics#2764)

* change some docs about mkldnn (intel-analytics#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel-analytics#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model, then you
can use the quantized model as usual.

* enable fusion by default (intel-analytics#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel-analytics#2783)

* fix: inplace of input/output and weight dimension error (intel-analytics#2779)

Some layers' input and output share the same memory, so we can't run forward in
`calcScales`: by that time the input has already been changed and its scales may
be wrong. For example,

Sequential().add(Conv).add(ReLU)

runs in two steps: seq.forward(input) first, and then ReLU runs
another forward, so its input is now the output and the scales are
wrong.

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for a dnn convolution without groups, the weight's dimension
should be 4.

* fix: the blas wrapper has no scales (intel-analytics#2778)

* fix softmax (intel-analytics#2777)

* fix: performance regression on resnet50 (intel-analytics#2774)

The u8-to-s8 and s8-to-u8 conversions need no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel-analytics#2671)

* flip to 0.9.0 (intel-analytics#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel-analytics#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel-analytics#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel-analytics#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel-analytics#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel-analytics#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: Layer-wise scaled. Also with utility functions to build a set of LARS optim for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel-analytics#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel-analytics#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* code style check

* remove print

* fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel-analytics#2821)

* Optimize backward graph generation and CAddTable (intel-analytics#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel-analytics#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel-analytics#2815)

* mkldnn-dilatedconv

* fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel-analytics#2822)

* Use one AllReduceParameter for multi-optim method  training (intel-analytics#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel-analytics#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel-analytics#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel-analytics#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel-analytics#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* Add transformer to LM example (intel-analytics#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel-analytics#2824)

* fix: fusion for multi-group of convolution (intel-analytics#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel-analytics#2843)

* fix acc bug & init dnn thread (intel-analytics#2841)

* support tnc and ntc conversion (intel-analytics#2844)

* support ntc in dnn layer (intel-analytics#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel-analytics#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel-analytics#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel-analytics#2853)

* fix: wrong affinity settings (intel-analytics#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel-analytics#2858)

* Add beam search in transformer (intel-analytics#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel-analytics#2854)

* feat: add axis to softmax (intel-analytics#2859)

* add release doc for 0.9 (intel-analytics#2862)

* fix: update core ref to master (intel-analytics#2865)

* flip version to 0.10.0 (intel-analytics#2869)

* [Bug Fix] - Fix module version comparison  (intel-analytics#2871)

* update serialization

* convert IRgraph momentum to mkldnn (intel-analytics#2872)

* tutorial fix (intel-analytics#2879)

* feat: RoiAlign Forward (intel-analytics#2874)

* Add set input output format API in Python (intel-analytics#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel-analytics#2870)

* fix memory leak for ir graph training (intel-analytics#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel-analytics#2885)

* add shape layer

* add Gather layer (intel-analytics#2897)

* add gather layer

* [New feature] Add maskhead (intel-analytics#2892)

* support for maskhead

* fix unit tests (intel-analytics#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel-analytics#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel-analytics#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel-analytics#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel-analytics#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel-analytics#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel-analytics#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* masks in RoiLabel now hold Floats, not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel-analytics#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel-analytics#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different sizes in one mini-batch (intel-analytics#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel-analytics#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel-analytics#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel-analytics#2930)

* Onnx support: add pos parameter to softmax (intel-analytics#2933)

* add pos parameter to softmax

* fix review problem

* Add resize for segmentation (intel-analytics#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel-analytics#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() missing the dimension parameter (intel-analytics#2940)

* revert back api (intel-analytics#2943)

* fix: softmax and bn+scale fusion (intel-analytics#2937)

* feat: multi models support with MKL-DNN backend (intel-analytics#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel-analytics#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel-analytics#2945)

* add no argument apply api for softmax

* ONNX ResNet example (intel-analytics#2939)

* add onnx resnet example

* add doc for onnx

* clean up

* add maskrcnn inference example (intel-analytics#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel-analytics#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB defaults to false

* add ROISIZE

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel-analytics#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel-analytics#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB defaults to false

* add ROISIZE

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel-analytics#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* remove masks requirements (intel-analytics#2959)

* fix: the squeeze should not be included in IRElement (intel-analytics#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel-analytics#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel-analytics#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel-analytics#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel-analytics#2973)

* fix: nms stability when using treeset. (intel-analytics#2972)

* flip version to 0.11 (intel-analytics#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel-analytics#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel-analytics#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel-analytics#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel-analytics#2971)

* fix: enable integration accuracy tests (intel-analytics#2976)

* fix: softmax dnn backend wrong order of primitive (intel-analytics#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel-analytics#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* feat: add distri optimizer v2 (intel-analytics#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel-analytics#3002)

* Remove final for AbstractModule (intel-analytics#3001)

* DistriOptimizerV2 argument (intel-analytics#3003)

* call DistriOptimizerV2

* fix inception (intel-analytics#3010)

* fix top1 and treenn (intel-analytics#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel-analytics#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel-analytics#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel-analytics#3020)

* test examples by distrioptimizerv2 (intel-analytics#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel-analytics#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel-analytics#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel-analytics#3021)

* fix loss

* fix ut

* fix style check (intel-analytics#3022)

* specify pyspark version (intel-analytics#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel-analytics#3029)

* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel-analytics#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel-analytics#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel-analytics#3054)

* spark 3.0

* add spark3.0 deployment (intel-analytics#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel-analytics#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel-analytics#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel-analytics#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor (intel-analytics#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel-analytics#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel-analytics#3094)

* back port master (intel-analytics#3096)

* set seed to avoid random error in PredictionServiceUT (intel-analytics#3097)

* Jdk11 support (intel-analytics#3098)

* update for jdk 11 support and doc

* add serializeUid (intel-analytics#3099)

* update doc (intel-analytics#3104)

* add doc for running in ide (intel-analytics#3106)

* fix callBigDLFunc returning an Int when the actual return value from Java is a byte array (intel-analytics#3111)

* add list of df support (intel-analytics#3113)

* Update readme (intel-analytics#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel-analytics#3124)

* remove DLFrames

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel-analytics#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel-analytics#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel-analytics#3133)

* DistriOptimizerV2 logger (intel-analytics#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel-analytics#3137)

* upgrade spark version (intel-analytics#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel-analytics#3141)

* flip0.14 (intel-analytics#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* fix make dist

* migrate path

* update

Co-authored-by: zhangxiaoli73 <380761639@qq.com>
Co-authored-by: Jerry Wu <wzhongyuan@gmail.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: Yanzhang Wang <i8run15@gmail.com>
Co-authored-by: GenBrg <34305977+GenBrg@users.noreply.github.com>
Co-authored-by: LeicongLi <leicongli@gmail.com>
Co-authored-by: Emiliano Martinez <emimartinez.sanchez@gmail.com>
Co-authored-by: abdolence <abdulla.abd.m@gmail.com>
Co-authored-by: Enrique Garcia <engapa@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: yaochi <yaochitc@gmail.com>
Co-authored-by: Menooker <Menooker@users.noreply.github.com>
Co-authored-by: Menooker <myjisgreat@live.cn>
Co-authored-by: Firecrackerxox <mengceng.he@intel.com>
Co-authored-by: majing921201 <1834475657@qq.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: Xiao <lingxiao1989@gmail.com>
Co-authored-by: Firecrackerxox <he044646@sina.com>
Co-authored-by: Hui Li <lihuibinghan@sina.com>
Co-authored-by: Jason Dai <jason.dai@intel.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Hangrui Cao <50705298+DiegoCao@users.noreply.github.com>
Co-authored-by: pinggao18 <44043817+pinggao18@users.noreply.github.com>
liu-shaojun pushed a commit that referenced this pull request Mar 6, 2024
@liu-shaojun liu-shaojun deleted the branch intel-analytics:master March 6, 2024 09:23
@liu-shaojun liu-shaojun closed this Mar 6, 2024