Cluster Serving preprocessing refine #3062

Merged: 3 commits, Nov 11, 2020

Litchilitchy
Contributor

No description provided.

Contributor

@qiyuangong left a comment

LGTM

@Litchilitchy merged commit 752e758 into intel:master Nov 11, 2020
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Jul 27, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Jul 30, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 9, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 12, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 16, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 16, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 17, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 18, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 24, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 24, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 24, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 25, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Aug 25, 2021
Le-Zheng added a commit that referenced this pull request Sep 1, 2021
* convert static graph to IR graph and build (#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (#2682)

* add spark 2.4 support (#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (#2724)

If we use the parent thread directly, two bugs appear:
1. The child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if
the parent thread exits and is recreated (such as a thread from
Executors.newFixedThreadPool), the whole app hits a segmentation fault.
The parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
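
The idea of routing all native computation through one dedicated, long-lived thread can be sketched as follows. This is only a Python analogy using the standard library, not the BigDL (Scala) implementation; `compute_pool`, `run_on_compute_thread`, and `current_thread_id` are hypothetical names chosen for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# A pool with exactly one worker: every submitted task runs on the same
# long-lived thread, so thread-local state on that thread is never lost.
compute_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="compute")


def run_on_compute_thread(fn, *args):
    """Run fn on the dedicated compute thread and wait for its result."""
    return compute_pool.submit(fn, *args).result()


def current_thread_id():
    """Return the identifier of the thread executing this call."""
    return threading.get_ident()


# Five calls issued from the caller thread all land on one worker thread.
worker_ids = {run_on_compute_thread(current_thread_id) for _ in range(5)}
caller_id = threading.get_ident()
print(len(worker_ids), caller_id in worker_ids)
```

Because the caller thread only submits work and never runs it, the caller can exit and be recreated without affecting the thread that holds the native state.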

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (#2712)

There are two issues:

1. A padded tensor is required. mkl-dnn pads the tensor to a multiple of
    the SIMD width, which uses more memory, such as 4x1x28x28 becoming
    4x8x28x28 (avx2).
2. The TensorMMap between DenseTensor and DnnTensor. The previous impl
    allocated the DnnTensor when the model was created, which cost too much
    space. This patch allocates it at runtime instead.
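
The padding arithmetic in item 1 can be illustrated with a minimal sketch. It assumes the channel dimension (axis 1) is rounded up to a multiple of 8, which matches the 4x1x28x28 to 4x8x28x28 example above; the function names are hypothetical and are not part of mkl-dnn or BigDL.

```python
def pad_to_multiple(size, width):
    """Round size up to the nearest multiple of width."""
    return ((size + width - 1) // width) * width


def padded_shape(shape, simd_width=8, channel_axis=1):
    """Return shape with the channel axis padded to a multiple of simd_width."""
    padded = list(shape)
    padded[channel_axis] = pad_to_multiple(padded[channel_axis], simd_width)
    return tuple(padded)


def num_elements(shape):
    """Total number of elements in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


original = (4, 1, 28, 28)
padded = padded_shape(original)
# The padded tensor holds 8 times as many elements as the original.
print(padded, num_elements(padded) // num_elements(original))
```

A tensor whose channel count is already a multiple of the width is left unchanged, so the extra memory cost is largest for layers with few channels.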

* add computshape for some layers and add skip primitives in DnnGraph (#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (#2737)

* refactor predict for dnn model

* remove some unit tests (#2752)

* remove some conflict tests (#2753)

* Update documentation (#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to the `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    we add two new attributes called `mask` and `scales`.

* fix conversion accuracy (#2760)

*  fix accuracy for saved model

* exclude mkldnn model during conversion

* feature: layer wise supports of int8 (#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can therefore accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (#2759)

This includes 3 steps:

1. Generate the scales of the model.
   This needs an api like `generateScalesWithMask` to generate the scales of
   the fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` api will be compatible with the `bigquant`
   backend, which sets the quantize flag. When compiling,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).
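
To make the role of the scales concrete, here is a minimal sketch of a generic symmetric int8 scheme, where the scale maps the largest absolute fp32 value to 127. This is an assumption chosen for illustration and is not necessarily the exact formula used by mkldnn or by `generateScalesWithMask`; all function names below are hypothetical.

```python
INT8_MAX = 127


def compute_scale(values):
    """Step 1: derive a scale from the fp32 values (assumed: 127 / max|x|)."""
    max_abs = max(abs(v) for v in values)
    return INT8_MAX / max_abs if max_abs > 0 else 1.0


def quantize(values, scale):
    """Step 2: map fp32 values to integers clamped to [-127, 127]."""
    return [max(-INT8_MAX, min(INT8_MAX, round(v * scale))) for v in values]


def dequantize(quantized, scale):
    """Map int8 values back to fp32 (what an int8 -> fp32 reorder would do)."""
    return [q / scale for q in quantized]


values = [-2.0, 0.5, 1.0]
scale = compute_scale(values)
quantized = quantize(values, scale)
restored = dequantize(quantized, scale)
print(scale, quantized, restored)
```

The round trip is lossy: each restored value differs from the original by at most one quantization step, which is why the scales have to be generated from representative data before quantizing.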

* update readme for v1 training (#2763)

* update release doc for preparation (#2764)

* change some docs about mkldnn (#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (#2761)

This is an example of how to use mkldnn int8. There are two steps: use
GenInt8Scales to generate the scales first and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (#2783)

* fix: inplace of input/output and weight dimension error (#2779)

Some layers' input and output use the same memory, so we can't do a forward in
`calcScales`: by that time the input has been changed, and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

It runs in two steps: seq.forward(input) first, and then, when it goes into the
ReLU, it does another forward, so the input will be the output, and the scales
will be wrong.

For a convolution's weight, the dimension is always 5, even when the group number
is 1. But for a dnn convolution, if there's no group, the weight's dimension
should be 4.
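
The in-place problem described above can be reproduced with a small sketch. Plain Python lists stand in for tensors, and the scale is assumed to be derived from the maximum absolute value; `relu_inplace` and `max_abs` are hypothetical helpers, not BigDL functions.

```python
def relu_inplace(buffer):
    """Apply ReLU by overwriting the input buffer, as an in-place layer does."""
    for i, value in enumerate(buffer):
        if value < 0:
            buffer[i] = 0.0
    return buffer


def max_abs(buffer):
    """Statistic a scale would be derived from (assumed: max absolute value)."""
    return max(abs(v) for v in buffer)


conv_output = [-4.0, 1.0, 2.0]

# Measured before ReLU runs: reflects the real input of the ReLU layer.
stat_before = max_abs(conv_output)

relu_output = relu_inplace(conv_output)

# Measured after ReLU ran in place: the "input" buffer now holds the output.
stat_after = max_abs(conv_output)

print(stat_before, stat_after, relu_output is conv_output)
```

Here the statistic drops from 4.0 to 2.0 once the buffer has been overwritten, so any scale computed after the forward pass describes the output rather than the input.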

* fix: the blas wrapper has no scales (#2778)

* fix softmax (#2777)

* fix: performance regression on resnet50 (#2774)

The u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (#2671)

* flip to 0.9.0 (#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: Layer-wise scaled. Also with utility functions to build a set of LARS optim for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo
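
The ambiguity can be reproduced in a few lines. The sketch below builds the id strings exactly as the message describes them; the function names are hypothetical, and the precise string format in the real code is an assumption based on this description.

```python
def block_id_old(block_id, pid_to, pid_from):
    """Old format: {id}{pidTo}gradientBytes{pidFrom}, with no separator."""
    return f"{block_id}{pid_to}gradientBytes{pid_from}"


def block_id_new(block_id, pid_to, pid_from):
    """New format: a "_" separates id from pidTo."""
    return f"{block_id}_{pid_to}gradientBytes{pid_from}"


# id=1, pidTo=12 and id=11, pidTo=2 collide in the old format ...
old_a = block_id_old(1, 12, 0)
old_b = block_id_old(11, 2, 0)

# ... but stay distinct once the separator is added.
new_a = block_id_new(1, 12, 0)
new_b = block_id_new(11, 2, 0)

print(old_a == old_b, new_a == new_b)
```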

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (#2821)

* Optimize backward graph generation and CAddTable (#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (#2822)

* Use one AllReduceParameter for multi-optim method  training (#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (#2824)

* fix: fusion for multi-group of convolution (#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (#2843)

* fix acc bug & init dnn thread (#2841)

* support tnc and ntc conversion (#2844)

* support ntc in dnn layer (#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (#2853)

* fix: wrong affinity settings (#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (#2858)

* Add beam search in transformer (#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (#2854)

* feat: add axis to softmax (#2859)

* add release doc for 0.9 (#2862)

* fix: update core ref to master (#2865)

* flip version to 0.10.0 (#2869)

* [Bug Fix] - Fix module version comparison  (#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (#2872)

* tutorial fix (#2879)

* feat: RoiAlign Forward (#2874)

* Add set input output format API in Python (#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (#2870)

* fix memory leak for ir graph training (#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add gemm layer

* add gemm layer

* add Shape layer (#2885)

* add shape layer

* add Gather layer (#2897)

* add gather layer

* [New feature] Add maskhead (#2892)

* support for maskhead

* fix unit tests (#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (#2930)

* Onnx support: add pos parameter to softmax (#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (#2940)

* revert back api (#2943)

* fix: softmax and bn+scale fusion (#2937)

* feat: multi models support with MKL-DNN backend (#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (#2959)

* fix: the squeeze should not be included in IRElement (#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (#2973)

* fix: nms stability when using treeset. (#2972)

* flip version to 0.11 (#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (#2971)

* fix: enable integration accuracy tests (#2976)

* fix: softmax dnn backend wrong order of primitive (#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (#3002)

* Remove final for AbstractModule (#3001)

* DistriOptimizerV2 argument (#3003)

* call DistriOptimizerV2

* fix inception (#3010)

* fix top1 and treenn (#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (#3020)

* test examples by distrioptimizerv2 (#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (#3021)

* fix loss

* fix ut

* fix style check (#3022)

* specify pyspark version (#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (#3054)

* spark 3.0

* add spark3.0 deployment (#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (#3094)

* back port master (#3096)

* set seed to avoid random error in PredictionServiceUT (#3097)

* Jdk11 support (#3098)

* update for jdk 11 support and doc

* add serializeUid (#3099)

* update doc (#3104)

* add doc for running in ide (#3106)

* fix callBigDLFunc return a Int while the true return value from java is a byte array. (#3111)

* add list of df support (#3113)

* Update readme (#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (#3133)

* DistriOptimizerV2 logger (#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (#3137)

* upgrade spark version (#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (#3141)

* flip0.14 (#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <[email protected]>
Co-authored-by: Jerry Wu <[email protected]>
Co-authored-by: Xin Qiu <[email protected]>
Co-authored-by: Yanzhang Wang <[email protected]>
Co-authored-by: GenBrg <[email protected]>
Co-authored-by: LeicongLi <[email protected]>
Co-authored-by: Emiliano Martinez <[email protected]>
Co-authored-by: abdolence <[email protected]>
Co-authored-by: Enrique Garcia <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: yaochi <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: majing921201 <[email protected]>
Co-authored-by: jenniew <[email protected]>
Co-authored-by: Xiao <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: Hui Li <[email protected]>
Co-authored-by: Jason Dai <[email protected]>
Co-authored-by: dding3 <[email protected]>
Co-authored-by: Yang Wang <[email protected]>
Co-authored-by: Yina Chen <[email protected]>
Co-authored-by: Hangrui Cao <[email protected]>
Co-authored-by: pinggao18 <[email protected]>
dding3 added a commit to dding3/analytics-zoo that referenced this pull request Sep 2, 2021
* convert static graph to IR graph and build (intel#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel#2682)

* add spark 2.4 support (intel#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel#2724)

Using the parent thread directly causes two bugs:
1. The child threads forked from the parent thread will be bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if
the parent thread exits and is recreated (such as a thread from
Executors.newFixedThreadPool), the whole app will hit a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
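
As a rough illustration of the idea only (not the BigDL implementation, which lives on the JVM), the sketch below routes all compute calls through one dedicated, long-lived worker thread. The names `compute_pool` and `run_on_compute_thread` are hypothetical; the point is that the caller's thread never executes the compute work itself.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# A pool with exactly one worker: the same thread is reused for every task,
# so any thread-local state it holds stays valid between calls.
compute_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dnn-compute")


def run_on_compute_thread(fn, *args):
    """Submit fn to the dedicated thread and block until it returns."""
    return compute_pool.submit(fn, *args).result()


def current_thread_id():
    """Stand-in for a native compute call; reports which thread ran it."""
    return threading.get_ident()


# Every call lands on the same worker thread, never on the caller's thread.
worker_ids = {run_on_compute_thread(current_thread_id) for _ in range(5)}
```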

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel#2712)

There are two issues:

1. A padded tensor is required. mkl-dnn pads tensors to a multiple of the
    SIMD width, which uses more memory, e.g. 4x1x28x28 becomes
    4x8x28x28 (avx2).
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.
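
The padding arithmetic in the first issue can be sketched as follows. This is a minimal illustration, assuming an NCHW shape and a block width of 8 for avx2 as in the example above; the helper name `padded_shape` is hypothetical and not part of mkl-dnn or BigDL.

```python
def padded_shape(shape, simd_width=8):
    """Round the channel dimension of an NCHW shape up to a multiple of simd_width."""
    n, c, h, w = shape
    padded_c = -(-c // simd_width) * simd_width  # ceiling division, then scale back up
    return (n, padded_c, h, w)


def element_count(shape):
    """Number of elements a dense buffer of this shape would hold."""
    total = 1
    for dim in shape:
        total *= dim
    return total


original = (4, 1, 28, 28)
padded = padded_shape(original)
# Ratio of padded to original buffer size for this example.
overhead = element_count(padded) / element_count(original)
```

For a single-channel input the padded buffer is eight times larger, which is why allocating it eagerly at model creation is costly.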

* add computshape for some layers and add skip primitives in DnnGraph (intel#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel#2737)

* refactor predict for dnn model

* remove some unit tests (intel#2752)

* remove some conflict tests (#2753)

* Update documentation (intel#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, a new attribute called `dataType` is added
    to `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    two new attributes called `mask` and `scales` are added.

* fix conversion accuracy (intel#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel#2759)

This includes 3 steps.

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of the
   fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` API will be compatible with the `bigquant`
   backend, which sets the quantize flag. When compiling,
   the quantized weight, output, and input will be generated by mkldnn at
   runtime.
3. Do the inference (forward).
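
The three steps can be illustrated with a generic symmetric int8 scheme. This is only a sketch under stated assumptions: the functions `generate_scale`, `quantize`, and `dequantize` are hypothetical and are not the BigDL or mkldnn API, and the scale convention (127 divided by the largest absolute value) is an assumption chosen for illustration.

```python
def generate_scale(values, qmax=127):
    """Step 1: derive a scale from fp32 values so the largest magnitude maps to qmax."""
    max_abs = max(abs(v) for v in values)
    return qmax / max_abs if max_abs > 0 else 1.0


def quantize(values, scale, qmax=127):
    """Step 2: map fp32 values to integers in [-qmax, qmax] using the scale."""
    return [max(-qmax, min(qmax, round(v * scale))) for v in values]


def dequantize(quantized, scale):
    """Step 3 (after int8 compute): map integers back to approximate fp32 values."""
    return [q / scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.1]
scale = generate_scale(weights)
q_weights = quantize(weights, scale)
restored = dequantize(q_weights, scale)
```

Each restored value differs from the original by at most half a quantization step, i.e. `0.5 / scale`.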

* update readme for v1 training (intel#2763)

* update release doc for preparation (intel#2764)

* change some docs about mkldnn (intel#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel#2761)

This is an example of how to use mkldnn int8. There are two steps: use
GenInt8Scales to generate the scales first and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel#2783)

* fix: inplace of input/output and weight dimension error (intel#2779)

Some layers' input and output use the same memory. We can't do forward in
`calcScales`, because by that time the input has been changed and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

will take two steps: seq.forward(input) first, and then, on reaching the ReLU, it
will do another forward, so the input will be the output and the scales will be
wrong.

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
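
The in-place problem can be reproduced in a few lines. This is a minimal sketch, assuming the scale is derived from the largest absolute value in a buffer; `relu_inplace` and `max_abs` are hypothetical helpers, not BigDL functions.

```python
def relu_inplace(buf):
    """ReLU that overwrites its input buffer, as an in-place layer would."""
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf


def max_abs(buf):
    """Statistic a scale would be derived from (largest magnitude in the buffer)."""
    return max(abs(v) for v in buf)


# Output of the convolution, which is also the ReLU's input buffer.
conv_output = [-4.0, 1.0, 2.0]

stat_before = max_abs(conv_output)  # taken before the ReLU overwrites the buffer
relu_inplace(conv_output)
stat_after = max_abs(conv_output)   # taken afterwards: the negative peak is gone
```

Reading the statistic after the in-place layer has run sees the ReLU's output rather than its input, so the input scale comes out too small.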

* fix: the blas wrapper has no scales (intel#2778)

* fix softmax (intel#2777)

* fix: performance regression on resnet50 (intel#2774)

the u8 to s8 or s8 to u8 needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel#2671)

* flip to 0.9.0 (intel#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: Layer-wise scaled. Also with utility functions to build a set of LARS optim for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo
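
The ambiguity described in the bug fix is easy to demonstrate. The sketch below only mimics the string layout from the description; the function names are hypothetical, and the exact position of the "_" in the real block id is assumed from the wording above.

```python
def old_block_id(param_id, pid_to, pid_from):
    """Old layout: {id}{pidTo}gradientBytes{pidFrom}, with no separator."""
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def new_block_id(param_id, pid_to, pid_from):
    """New layout: a '_' separates id from pidTo (assumed placement)."""
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs that collide under the old layout.
old_a = old_block_id(1, 12, 0)
old_b = old_block_id(11, 2, 0)

# The same pairs stay distinct once the separator is added.
new_a = new_block_id(1, 12, 0)
new_b = new_block_id(11, 2, 0)
```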

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel#2821)

* Optimize backward graph generation and CAddTable (intel#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel#2822)

* Use one AllReduceParameter for multi-optim method  training (intel#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (intel#2824)

* fix: fusion for multi-group of convolution (intel#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel#2843)

* fix acc bug & init dnn thread (intel#2841)

* support tnc and ntc conversion (intel#2844)

* support ntc in dnn layer (intel#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel#2853)

* fix: wrong affinity settings (intel#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel#2858)

* Add beam search in transformer (intel#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel#2854)

* feat: add axis to softmax (intel#2859)

* add release doc for 0.9 (intel#2862)

* fix: update core ref to master (intel#2865)

* flip version to 0.10.0 (intel#2869)

* [Bug Fix] - Fix module version comparison  (intel#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel#2872)

* tutorial fix (intel#2879)

* feat: RoiAlign Forward (intel#2874)

* Add set input output format API in Python (intel#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel#2870)

* fix memory leak for ir graph training (intel#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel#2885)

* add shape layer

* add Gather layer (intel#2897)

* add gather layer

* [New feature] Add maskhead (intel#2892)

* support for maskhead

* fix unit tests (intel#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel#2930)

* Onnx support: add pos parameter to softmax (intel#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel#2940)

* revert back api (intel#2943)

* fix: softmax and bn+scale fusion (intel#2937)

* feat: multi models support with MKL-DNN backend (intel#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel#2959)

* fix: the squeeze should not be included in IRElement (intel#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel#2973)

* fix: nms stability when using treeset. (intel#2972)

* flip version to 0.11 (intel#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel#2971)

* fix: enable integration accuracy tests (intel#2976)

* fix: softmax dnn backend wrong order of primitive (intel#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel#3002)

* Remove final for AbstractModule (intel#3001)

* DistriOptimizerV2 argument (intel#3003)

* call DistriOptimizerV2

* fix inception (intel#3010)

* fix top1 and treenn (intel#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel#3020)

* test examples by distrioptimizerv2 (intel#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel#3021)

* fix loss

* fix ut

* fix style check (intel#3022)

* specify pyspark version (intel#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel#3054)

* spark 3.0

* add spark3.0 deployment (intel#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel#3094)

* back port master (intel#3096)

* set seed to avoid random error in PredictionServiceUT (intel#3097)

* Jdk11 support (intel#3098)

* update for jdk 11 support and doc

* add serializeUid (intel#3099)

* update doc (intel#3104)

* add doc for running in ide (intel#3106)

* fix callBigDLFunc returning an Int while the actual return value from Java is a byte array (intel#3111)

* add list of df support (intel#3113)

* Update readme (intel#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel#3133)

* DistriOptimizerV2 logger (intel#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel#3137)

* upgrade spark version (intel#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel#3141)

* flip0.14 (intel#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <[email protected]>
Co-authored-by: Jerry Wu <[email protected]>
Co-authored-by: Xin Qiu <[email protected]>
Co-authored-by: Yanzhang Wang <[email protected]>
Co-authored-by: GenBrg <[email protected]>
Co-authored-by: LeicongLi <[email protected]>
Co-authored-by: Emiliano Martinez <[email protected]>
Co-authored-by: abdolence <[email protected]>
Co-authored-by: Enrique Garcia <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: yaochi <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: majing921201 <[email protected]>
Co-authored-by: jenniew <[email protected]>
Co-authored-by: Xiao <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: Hui Li <[email protected]>
Co-authored-by: Jason Dai <[email protected]>
Co-authored-by: dding3 <[email protected]>
Co-authored-by: Yang Wang <[email protected]>
Co-authored-by: Yina Chen <[email protected]>
Co-authored-by: Hangrui Cao <[email protected]>
Co-authored-by: pinggao18 <[email protected]>
dding3 added a commit to dding3/analytics-zoo that referenced this pull request Sep 7, 2021
* convert static graph to IR graph and build (intel#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel#2682)

* add spark 2.4 support (intel#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread holds some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app crashes with a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
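
A minimal sketch of the single-thread-pool idea, written in Python rather than the project's Scala and not taken from the patch itself. The names `compute_pool` and `run_on_compute_thread` are hypothetical, and core affinity is not modelled; it only shows that routing all work through one long-lived worker keeps it off the caller's thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One long-lived worker thread; every compute task is routed through it,
# so any thread-local state is created once and reused across calls.
compute_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="compute")


def run_on_compute_thread(fn, *args):
    """Submit fn to the dedicated pool and block until its result is ready."""
    return compute_pool.submit(fn, *args).result()


def current_thread_name():
    """Return the name of the thread that actually executes the task."""
    return threading.current_thread().name


# Five calls issued from the caller thread all execute on the same worker.
worker_names = {run_on_compute_thread(current_thread_name) for _ in range(5)}
print(worker_names)
```

Because the caller only submits tasks and waits, the caller thread can exit and be recreated without touching the state owned by the worker.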

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn uses a padded tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2). It is
    padded up to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.
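
The padding arithmetic from item 1 can be sketched as follows. This is an illustration only: the helper names are hypothetical, and it assumes a SIMD width of 8 and that the padded dimension is the channel axis, as the 4x1x28x28 example suggests.

```python
def pad_to_multiple(size, width):
    """Round size up to the nearest multiple of width."""
    if width <= 0:
        raise ValueError("width must be positive")
    return ((size + width - 1) // width) * width


def padded_shape(shape, simd_width=8, channel_axis=1):
    """Return shape with the channel axis padded to a multiple of simd_width."""
    padded = list(shape)
    padded[channel_axis] = pad_to_multiple(padded[channel_axis], simd_width)
    return tuple(padded)


def num_elements(shape):
    """Total number of elements held by a tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


original = (4, 1, 28, 28)
padded = padded_shape(original)
print(padded)                                          # (4, 8, 28, 28)
print(num_elements(padded) // num_elements(original))  # 8
```

For a single-channel input the padded tensor holds eight times as many elements, which is the extra memory the commit message refers to.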

* add computshape for some layers and add skip primitives in DnnGraph (intel#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel#2737)

* refactor predict for dnn model

* remove some unit tests (intel#2752)

* remove some conflict tests (#2753)

* Update documentation (intel#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to the `MemoryData`.
2. Because we need to transfer the scales between FP32->Int8 and Int8->FP32,
    we add two new attributes called `mask` and `scales`.
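
To illustrate why a reorder between FP32 and Int8 needs scales, here is a small sketch of symmetric quantization. It is not the project's implementation: the function names are hypothetical, a single per-tensor scale is assumed, and the `mask` attribute is not modelled.

```python
def compute_scale(values, qmax=127):
    """Scale that maps the largest magnitude in values onto qmax."""
    max_abs = max(abs(v) for v in values)
    return qmax / max_abs if max_abs > 0 else 1.0


def quantize(values, scale, qmax=127):
    """FP32 -> Int8: multiply by the scale, round, and clamp to [-qmax, qmax]."""
    return [max(-qmax, min(qmax, round(v * scale))) for v in values]


def dequantize(qvalues, scale):
    """Int8 -> FP32: divide by the same scale used for quantization."""
    return [q / scale for q in qvalues]


fp32 = [0.8, -1.6, 0.1, 0.0]
scale = compute_scale(fp32)
int8 = quantize(fp32, scale)
restored = dequantize(int8, scale)
print(int8)
print(restored)
```

The round trip loses at most half a quantization step per value, which is why the scale has to travel with the data in both directions.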

* fix conversion accuracy (intel#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want an fp32
output, you should add a reorder.

* feature: mkldnn int8 layer wise supports (intel#2759)

Including 3 steps.

1. Generate the scales of the model.
   This needs an API like `generateScalesWithMask` to generate the scales of
   the fp32 model; the model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` API will be compatible with the `bigquant`
   backend, which sets the quantize flag. When compiling,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).

* update readme for v1 training (intel#2763)

* update release doc for preparation (intel#2764)

* change some docs about mkldnn (intel#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model, and then you
can use the quantized model as usual.

* enable fusion by default (intel#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel#2783)

* fix: inplace of input/output and weight dimension error (intel#2779)

Some layers' input and output use the same memory, so we can't do a forward in
`calcScales`: by that time the input has been changed, and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

takes two steps: seq.forward(input) runs first, and when it goes into the ReLU, it
does another forward, so the input will be the output, and the scales will be
wrong.

For a convolution's weight, the dimension is always 5, even when the group number
is 1. But for a dnn convolution, if there's no group, the weight's dimension
should be 4.
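
The in-place problem can be reproduced with plain Python lists. This is a sketch under simplifying assumptions: `max_abs` stands in for whatever statistic the scale calculation uses, and `relu_inplace` is a hypothetical layer that overwrites its input buffer.

```python
def max_abs(values):
    """Statistic used here as a stand-in for the scale calculation."""
    return max(abs(v) for v in values)


def relu_inplace(values):
    """ReLU that overwrites its input buffer instead of allocating a new one."""
    for i, v in enumerate(values):
        if v < 0:
            values[i] = 0.0
    return values


conv_output = [-4.0, 1.0, 2.0]          # buffer shared by Conv output and ReLU input
scale_before = max_abs(conv_output)     # statistic of the true ReLU input
relu_inplace(conv_output)               # the shared buffer is overwritten here
scale_after = max_abs(conv_output)      # statistic of the ReLU output, not its input

print(scale_before, scale_after)        # 4.0 2.0
```

Reading the input after the forward pass yields the statistic of the output, so the input scale has to be captured before the in-place layer runs.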

* fix: the blas wrapper has no scales (intel#2778)

* fix softmax (intel#2777)

* fix: performance regression on resnet50 (intel#2774)

the u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel#2671)

* flip to 0.9.0 (intel#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: layer-wise scaled. Also adds utility functions to build a set of LARS optims for a container.

Bug fix: the gradient block id of AllReduceParameter was originally composed as {id}{pidTo}gradientBytes{pidFrom}, but the combination {id}{pidTo} is ambiguous, e.g. "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
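
The ambiguity is easy to reproduce with string formatting. The sketch below is illustrative only: the function names are hypothetical, and the exact layout of the fixed id (a single "_" between id and pidTo) is an assumption based on the description above.

```python
def block_id_old(param_id, pid_to, pid_from):
    """Original layout: id and pidTo are concatenated with no separator."""
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def block_id_new(param_id, pid_to, pid_from):
    """Assumed fixed layout: '_' separates id from pidTo."""
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs collide under the old layout.
print(block_id_old(1, 12, 0), block_id_old(11, 2, 0))
# The separator keeps them distinct.
print(block_id_new(1, 12, 0), block_id_new(11, 2, 0))
```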

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel#2821)

* Optimize backward graph generation and CAddTable (intel#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel#2822)

* Use one AllReduceParameter for multi-optim method  training (intel#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel#2824)

* fix: fusion for multi-group of convolution (intel#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel#2843)

* fix acc bug & init dnn thread (intel#2841)

* support tnc and ntc conversion (intel#2844)

* support ntc in dnn layer (intel#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel#2853)

* fix: wrong affinity settings (intel#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel#2858)

* Add beam search in transformer (intel#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel#2854)

* feat: add axis to softmax (intel#2859)

* add release doc for 0.9 (intel#2862)

* fix: update core ref to master (intel#2865)

* flip version to 0.10.0 (intel#2869)

* [Bug Fix] - Fix module version comparison  (intel#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel#2872)

* tutorial fix (intel#2879)

* feat: RoiAlign Forward (intel#2874)

* Add set input output format API in Python (intel#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel#2870)

* fix memory leak for ir graph training (intel#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel#2885)

* add shape layer

* add Gather layer (intel#2897)

* add gather layer

* [New feature] Add maskhead (intel#2892)

* support for maskhead

* fix unit tests (intel#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel#2930)

* Onnx support: add pos parameter to softmax (intel#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel#2940)

* revert back api (intel#2943)

* fix: softmax and bn+scale fusion (intel#2937)

* feat: multi models support with MKL-DNN backend (intel#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel#2959)

* fix: the squeeze should not be included in IRElement (intel#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel#2973)

* fix: nms stability when using treeset. (intel#2972)

* flip version to 0.11 (intel#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel#2971)

* fix: enable integration accuracy tests (intel#2976)

* fix: softmax dnn backend wrong order of primitive (intel#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel#3002)

* Remove final for AbstractModule (intel#3001)

* DistriOptimizerV2 argument (intel#3003)

* call DistriOptimizerV2

* fix inception (intel#3010)

* fix top1 and treenn (intel#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel#3020)

* test examples by distrioptimizerv2 (intel#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel#3021)

* fix loss

* fix ut

* fix style check (intel#3022)

* specify pyspark version (intel#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel#3056)

* Update doc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel#3054)

* spark 3.0

* add spark3.0 deployment (intel#3061)

* add spark3.0 deployment

* add warning that Optimizer() is deprecated (intel#3062)

* add deprecation warning

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel#3094)

* back port master (intel#3096)

* set seed to avoid random error in PredictionServiceUT (intel#3097)

* Jdk11 support (intel#3098)

* update for jdk 11 support and doc

* add serializeUid (intel#3099)

* update doc (intel#3104)

* add doc for running in ide (intel#3106)

* fix callBigDLFunc returning an Int when the actual return value from Java is a byte array (intel#3111)

* add list of df support (intel#3113)

* Update readme (intel#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel#3133)

* DistriOptimizerV2 logger (intel#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel#3137)

* upgrade spark version (intel#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel#3141)

* flip0.14 (intel#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <[email protected]>
Co-authored-by: Jerry Wu <[email protected]>
Co-authored-by: Xin Qiu <[email protected]>
Co-authored-by: Yanzhang Wang <[email protected]>
Co-authored-by: GenBrg <[email protected]>
Co-authored-by: LeicongLi <[email protected]>
Co-authored-by: Emiliano Martinez <[email protected]>
Co-authored-by: abdolence <[email protected]>
Co-authored-by: Enrique Garcia <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: yaochi <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: majing921201 <[email protected]>
Co-authored-by: jenniew <[email protected]>
Co-authored-by: Xiao <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: Hui Li <[email protected]>
Co-authored-by: Jason Dai <[email protected]>
Co-authored-by: dding3 <[email protected]>
Co-authored-by: Yang Wang <[email protected]>
Co-authored-by: Yina Chen <[email protected]>
Co-authored-by: Hangrui Cao <[email protected]>
Co-authored-by: pinggao18 <[email protected]>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 7, 2021
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 7, 2021
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 7, 2021
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 7, 2021
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 7, 2021
* convert static graph to IR graph and build (intel#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel#2682)

* add spark 2.4 support (intel#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app segfaults.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
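
A minimal Python sketch of the idea described above, not BigDL's actual implementation: every "native" call is submitted to one long-lived single-thread executor, so the same thread services all calls even when caller threads are created and destroyed. `native_compute` and the other names are hypothetical stand-ins for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One long-lived worker thread. It is not torn down between calls,
# so thread-local state created by native code would stay valid.
_compute_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dnn-compute")


def native_compute(x):
    """Hypothetical stand-in for a native call; also reports which thread ran it."""
    return x * 2, threading.get_ident()


def run_on_compute_thread(x):
    # The caller blocks on the result but never runs the native code itself.
    return _compute_pool.submit(native_compute, x).result()


def call_from_short_lived_threads(n):
    """Simulate n caller threads that are created and destroyed one after another."""
    results = []
    for i in range(n):
        t = threading.Thread(target=lambda i=i: results.append(run_on_compute_thread(i)))
        t.start()
        t.join()
    return results
```

Every result from `call_from_short_lived_threads` carries the same thread identifier, which is the property the fix relies on: callers may come and go, but the compute thread does not.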

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn uses a padded tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space. This patch allocates it at runtime instead.
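
The padding arithmetic in item 1 can be sketched as follows. This is an illustration only: it assumes an NCHW shape whose channel dimension is rounded up, and a SIMD width of 8 to match the avx2 example in the message. `padded_shape` and `element_count` are hypothetical helper names.

```python
def padded_shape(shape, simd_width=8):
    """Round the channel dimension of an NCHW shape up to a multiple of simd_width."""
    n, c, h, w = shape
    padded_c = -(-c // simd_width) * simd_width  # ceiling division, then scale back up
    return (n, padded_c, h, w)


def element_count(shape):
    """Total number of elements in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


logical = (4, 1, 28, 28)
physical = padded_shape(logical)  # the single channel is padded up to 8
overhead = element_count(physical) // element_count(logical)
```

For the shape in the message, the padded buffer holds eight times as many elements as the logical tensor, which is why the extra memory matters.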

* add shape computation for some layers and add skip primitives in DnnGraph (intel#2740)

* add shape computation for some layers and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel#2737)

* refactor predict for dnn model

* remove some unit tests (intel#2752)

* remove some conflict tests (#2753)

* Update documentation (intel#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, add a new attribute called `dataType`
    to `MemoryData`.
2. Because the scales must be carried across FP32->Int8 and Int8->FP32 conversions,
    add two new attributes called `mask` and `scales`.
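
As a rough illustration of how the three attributes could relate, the sketch below uses the dimension-bitmask convention from MKL-DNN's scale attributes: bit i set means the scales vary along dimension i. Whether `MemoryData` follows exactly this convention is an assumption, and `MemoryDesc` and `scales_count` are hypothetical names, not BigDL classes.

```python
from dataclasses import dataclass
from typing import Tuple


def scales_count(shape, mask):
    """Number of scales implied by a dimension bitmask (bit i set => varies along dim i)."""
    count = 1
    for i, dim in enumerate(shape):
        if mask & (1 << i):
            count *= dim
    return count


@dataclass(frozen=True)
class MemoryDesc:
    """Hypothetical, simplified descriptor carrying data type, mask and scales."""
    shape: Tuple[int, ...]
    data_type: str = "fp32"          # "fp32" or "int8"
    mask: int = 0                    # 0 means one scale for the whole tensor
    scales: Tuple[float, ...] = ()

    def __post_init__(self):
        # fp32 data carries no scales; int8 data needs one scale per masked slice.
        if self.data_type == "int8" and len(self.scales) != scales_count(self.shape, self.mask):
            raise ValueError("number of scales does not match shape and mask")
```

With a 4x8x28x28 tensor, a mask of 0 implies a single scale, while a mask of 2 (bit 1, the channel dimension) implies eight per-channel scales.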

* fix conversion accuracy (intel#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel#2759)

This includes 3 steps:

1. Generate the scales of the model.
   This needs an API like `generateScalesWithMask` to generate the scales of the
   fp32 model. The model returned is still an fp32 model.
2. Quantize the model.
   The `quantize()` API stays compatible with the `bigquant`
   backend and sets the quantize flag. During compile,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).
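
The three steps can be illustrated with a toy symmetric quantization scheme in plain Python. This is a sketch under stated assumptions, not the mkldnn or BigDL code path: a single scale per tensor, chosen so the largest magnitude maps to 127, and all function names are hypothetical.

```python
def generate_scale(calibration_values):
    """Step 1: derive a scale from fp32 data so that max |x| maps to 127."""
    max_abs = max(abs(v) for v in calibration_values)
    return 127.0 / max_abs if max_abs > 0 else 1.0


def quantize(values, scale):
    """Step 2: convert fp32 values to integers in the int8 range."""
    return [max(-127, min(127, round(v * scale))) for v in values]


def int8_dot(q_input, q_weight, input_scale, weight_scale):
    """Step 3: integer multiply-accumulate, then rescale the result to fp32."""
    acc = sum(a * b for a, b in zip(q_input, q_weight))
    return acc / (input_scale * weight_scale)


weights = [0.2, -0.4, 0.1]
inputs = [1.0, 2.0, -3.0]
w_scale = generate_scale(weights)
x_scale = generate_scale(inputs)
approx = int8_dot(quantize(inputs, x_scale), quantize(weights, w_scale), x_scale, w_scale)
exact = sum(a * b for a, b in zip(inputs, weights))
```

The quantized result `approx` stays close to the fp32 result `exact`; the small difference is the rounding error introduced in step 2.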

* update readme for v1 training (intel#2763)

* update release doc for preparation (intel#2764)

* change some docs about mkldnn (intel#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model, then
use the quantized model as usual.

* enable fusion by default (intel#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel#2783)

* fix: inplace of input/output and weight dimension error (intel#2779)

Some layers' input and output share the same memory, so we can't do the forward in
`calcScales`: by that time the input has been changed, and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

takes two steps: seq.forward(input) runs first, and when execution reaches the ReLU it
does another forward, so the input has become the output and the scales will be
wrong.
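
The in-place hazard described above can be reproduced with a small Python sketch. It is an analogy only: `relu_inplace` and `max_abs` are hypothetical helpers, and a Python list stands in for the shared tensor memory.

```python
def relu_inplace(buf):
    """ReLU that overwrites its input buffer, as an in-place layer would."""
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf  # same list object: the output aliases the input


def max_abs(buf):
    """Statistic a scale would be derived from."""
    return max(abs(v) for v in buf)


conv_output = [-4.0, 1.0, 2.0]  # pretend this came out of a convolution

# Correct order: record the statistic before the in-place layer runs.
stat_before = max_abs(conv_output)

relu_output = relu_inplace(conv_output)

# Wrong order: the "input" has already been overwritten by the output.
stat_after = max_abs(conv_output)
```

Because the ReLU writes into the buffer it reads from, a statistic taken after the forward pass describes the ReLU output, not the ReLU input, so any scale derived from it is wrong.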

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there is no group, the weight's dimension
should be 4.

* fix: the blas wrapper has no scales (intel#2778)

* fix softmax (intel#2777)

* fix: performance regression on resnet50 (intel#2774)

The u8-to-s8 and s8-to-u8 conversions need no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel#2671)

* flip to 0.9.0 (intel#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add the LARS (layer-wise adaptive rate scaling) optimizer, along with utility functions to build a set of LARS optim methods for a container.
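
For readers unfamiliar with LARS, here is a minimal sketch of the layer-wise rate as it is commonly formulated in the LARS literature. It is not taken from this patch, and LarsSGD in BigDL may differ in details such as momentum handling. The default values and the fallback when a norm is zero are assumptions made for illustration.

```python
import math


def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))


def lars_local_lr(weights, grads, trust=0.001, weight_decay=0.0):
    """Layer-wise rate: trust * ||w|| / (||g|| + weight_decay * ||w||)."""
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    denom = g_norm + weight_decay * w_norm
    if w_norm == 0.0 or denom == 0.0:
        return 1.0  # assumed fallback: use the global rate unchanged
    return trust * w_norm / denom


def lars_step(weights, grads, global_lr, trust=0.001, weight_decay=0.0):
    """One plain (momentum-free) update for a single layer."""
    rate = global_lr * lars_local_lr(weights, grads, trust, weight_decay)
    return [w - rate * (g + weight_decay * w) for w, g in zip(weights, grads)]
```

Each layer gets its own rate from the ratio of its weight norm to its gradient norm, which is why a container needs one such optim method per layer.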

Bug fix: the gradient block id of AllReduceParameter was originally composed as {id}{pidTo}gradientBytes{pidFrom}, but the combination {id}{pidTo} is ambiguous, e.g. "112" can be {1}{12} or {11}{2}. A "_" is now added to separate id from pidTo.
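
The collision is easy to reproduce. The sketch below follows the id layout quoted in the message; the function names are hypothetical, and the exact string layout beyond what the message states is an assumption.

```python
def old_block_id(param_id, pid_to, pid_from):
    # Fields are concatenated with no separator between id and pidTo.
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def new_block_id(param_id, pid_to, pid_from):
    # An underscore marks where id ends and pidTo begins.
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


# (id=1, pidTo=12) and (id=11, pidTo=2) collide under the old scheme.
collision = old_block_id(1, 12, 0) == old_block_id(11, 2, 0)
fixed = new_block_id(1, 12, 0) != new_block_id(11, 2, 0)
```

With the separator in place, the two blocks that previously shared a key now get distinct keys.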

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel#2821)

* Optimize backward graph generation and CAddTable (intel#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel#2822)

* Use one AllReduceParameter for multi-optim method  training (intel#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (intel#2824)

* fix: fusion for multi-group of convolution (intel#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel#2843)

* fix acc bug & init dnn thread (intel#2841)

* support tnc and ntc conversion (intel#2844)

* support ntc in dnn layer (intel#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel#2853)

* fix: wrong affinity settings (intel#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel#2858)

* Add beam search in transformer (intel#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel#2854)

* feat: add axis to softmax (intel#2859)

* add release doc for 0.9 (intel#2862)

* fix: update core ref to master (intel#2865)

* flip version to 0.10.0 (intel#2869)

* [Bug Fix] - Fix module version comparison  (intel#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel#2872)

* tutorial fix (intel#2879)

* feat: RoiAlign Forward (intel#2874)

* Add set input output format API in Python (intel#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel#2870)

* fix memory leak for ir graph training (intel#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add gemm layer

* add Shape layer (intel#2885)

* add shape layer

* add Gather layer (intel#2897)

* add gather layer

* [New feature] Add maskhead (intel#2892)

* support for maskhead

* fix unit tests (intel#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support batching samples of different sizes into one mini batch (intel#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel#2930)

* Onnx support: add pos parameter to softmax (intel#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix missing dimension parameter in TimeDistributedCriterion() (intel#2940)

* revert back api (intel#2943)

* fix: softmax and bn+scale fusion (intel#2937)

* feat: multi models support with MKL-DNN backend (intel#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel#2959)

* fix: the squeeze should not be included in IRElement (intel#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel#2973)

* fix: nms stability when using treeset. (intel#2972)

* flip version to 0.11 (intel#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel#2971)

* fix: enable integration accuracy tests (intel#2976)

* fix: softmax dnn backend wrong order of primitive (intel#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel#3002)

* Remove final for AbstractModule (intel#3001)

* DistriOptimizerV2 argument (intel#3003)

* call DistriOptimizerV2

* fix inception (intel#3010)

* fix top1 and treenn (intel#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel#3020)

* test examples by distrioptimizerv2 (intel#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel#3021)

* fix loss

* fix ut

* fix style check (intel#3022)

* specify pyspark version (intel#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel#3054)

* spark 3.0

* add spark3.0 deployment (intel#3061)

* add spark3.0 deployment

* add warning that Optimizer() is deprecated (intel#3062)

* add warning about the deprecation

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel#3094)

* back port master (intel#3096)

* set seed to avoid random error in PredictionServiceUT (intel#3097)

* Jdk11 support (intel#3098)

* update for jdk 11 support and doc

* add serializeUid (intel#3099)

* update doc (intel#3104)

* add doc for running in ide (intel#3106)

* fix callBigDLFunc returning an Int when the actual return value from Java is a byte array (intel#3111)

* add list of df support (intel#3113)

* Update readme (intel#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel#3133)

* DistriOptimizerV2 logger (intel#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel#3137)

* upgrade spark version (intel#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel#3141)

* flip0.14 (intel#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <[email protected]>
Co-authored-by: Jerry Wu <[email protected]>
Co-authored-by: Xin Qiu <[email protected]>
Co-authored-by: Yanzhang Wang <[email protected]>
Co-authored-by: GenBrg <[email protected]>
Co-authored-by: LeicongLi <[email protected]>
Co-authored-by: Emiliano Martinez <[email protected]>
Co-authored-by: abdolence <[email protected]>
Co-authored-by: Enrique Garcia <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: yaochi <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: majing921201 <[email protected]>
Co-authored-by: jenniew <[email protected]>
Co-authored-by: Xiao <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: Hui Li <[email protected]>
Co-authored-by: Jason Dai <[email protected]>
Co-authored-by: dding3 <[email protected]>
Co-authored-by: Yang Wang <[email protected]>
Co-authored-by: Yina Chen <[email protected]>
Co-authored-by: Hangrui Cao <[email protected]>
Co-authored-by: pinggao18 <[email protected]>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 7, 2021
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 7, 2021
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 7, 2021
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 7, 2021
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 7, 2021
* convert static graph to IR graph and build (intel#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel#2682)

* add spark 2.4 support (intel#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel#2724)

Using the parent thread directly causes two bugs:
1. The child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app hits a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
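
A minimal sketch of the idea behind this fix, written in Python rather than the project's Scala and not taken from the patch itself: all "native" work is submitted to one dedicated single-thread pool, so it always runs on the same long-lived thread instead of on whichever parent thread happens to call it. The names `native_pool` and `native_compute` are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One dedicated worker thread that lives for the whole application.
native_pool = ThreadPoolExecutor(max_workers=1)

def native_compute(x):
    # Stand-in for a native (MKL-DNN) call; also reports which thread ran it.
    return x * 2, threading.get_ident()

# Calls coming from any parent thread are funnelled through the same worker.
result_a, thread_a = native_pool.submit(native_compute, 1).result()
result_b, thread_b = native_pool.submit(native_compute, 2).result()

same_worker = thread_a == thread_b
not_parent = thread_a != threading.get_ident()
native_pool.shutdown()
```

Because the worker thread is never the parent thread, thread-local state created by native code is not lost when a parent thread exits and is recreated.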

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel#2712)

There are two issues:

1. A padded tensor is required. mkl-dnn uses a padded tensor, which
    takes more memory, for example 4x1x28x28 becomes 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.
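
The padding in item 1 can be illustrated with a short sketch. It assumes the channel dimension (axis 1) is the one being padded and that the SIMD width is 8 floats for AVX2, which matches the 4x1x28x28 to 4x8x28x28 example; `padded_shape` and `element_count` are hypothetical helpers, not mkl-dnn or BigDL functions.

```python
def padded_shape(shape, simd_width=8, channel_axis=1):
    """Round the channel dimension up to the next multiple of simd_width."""
    padded = list(shape)
    channels = padded[channel_axis]
    # Ceiling division, then scale back up to a multiple of simd_width.
    padded[channel_axis] = -(-channels // simd_width) * simd_width
    return tuple(padded)

def element_count(shape):
    """Number of elements in a tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count

original = (4, 1, 28, 28)
padded = padded_shape(original)
memory_ratio = element_count(padded) // element_count(original)
```

For the shape from the commit message, `padded` is `(4, 8, 28, 28)` and the padded tensor holds 8 times as many elements, which is why the extra memory matters.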

* add computshape for some layers and add skip primitives in DnnGraph (intel#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel#2737)

* refactor predict for dnn model

* remove some unit tests (intel#2752)

* remove some conflict tests (#2753)

* Update documentation (intel#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistirbuted in optimizer (#2755)

* change asInstanceOf to toDistirbuted

* change asInstanceOf to toDistirbuted

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, add a new attribute called dataType
    to `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can therefore accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel#2759)

This includes 3 steps.

1. Generate the scales of the model.
   This needs an API like `generateScalesWithMask` to generate the scales of the
   fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` API will be compatible with the `bigquant`
   backend, which sets the quantize flag. When compiling,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).
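
The three steps above can be sketched in plain Python. This is a generic symmetric int8 scheme under the assumption that a scale is 127 divided by the largest absolute value; it is not BigDL's `generateScalesWithMask` or `quantize()` implementation, and all function names here are hypothetical.

```python
def generate_scale(values, int8_max=127):
    """Step 1: derive a scale from the fp32 values (assumed: 127 / max |x|)."""
    max_abs = max(abs(v) for v in values)
    return int8_max / max_abs if max_abs > 0 else 1.0

def quantize(values, scale, int8_max=127):
    """Step 2: map fp32 values to int8 using the scale, clamping to the int8 range."""
    return [max(-int8_max, min(int8_max, round(v * scale))) for v in values]

def dequantize(quantized, scale):
    """Step 3 (after inference): map int8 values back to fp32."""
    return [q / scale for q in quantized]

weights = [0.5, -1.0, 0.25]
scale = generate_scale(weights)          # scales come from the fp32 model
int8_weights = quantize(weights, scale)  # quantized values used at runtime
restored = dequantize(int8_weights, scale)
```

The restored values differ from the originals by at most one quantization step, which is the accuracy cost of running inference in int8.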

* update readme for v1 training (intel#2763)

* update release doc for preparation (intel#2764)

* change some docs about mkldnn (intel#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model, and then
use the quantized model as usual.

* enable fusion by default (intel#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel#2783)

* fix: inplace of input/output and weight dimension error (intel#2779)

Some layers' input and output use the same memory, so we can't do forward in
`calcScales`: by that time the input has been changed and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

runs in two steps. seq.forward(input) runs first, and when it reaches the ReLU it
does another forward, so the input will be the output and the scales will be
wrong.
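
The problem described above can be reproduced with a small sketch. It assumes the scale is derived from the largest absolute value of a layer's input and uses toy stand-ins for the layers; `conv_like`, `relu_inplace`, and `max_abs` are hypothetical and unrelated to the BigDL classes.

```python
def conv_like(values):
    """Stand-in for a convolution: returns a new buffer."""
    return [v - 1.0 for v in values]

def relu_inplace(buffer):
    """In-place ReLU: the output reuses the input's memory."""
    for i, v in enumerate(buffer):
        if v < 0:
            buffer[i] = 0.0
    return buffer

def max_abs(values):
    """Statistic a scale would be derived from (assumption)."""
    return max(abs(v) for v in values)

data = [-3.0, 0.5, 2.0]
relu_input = conv_like(data)
stat_before = max_abs(relu_input)      # computed on the true ReLU input
relu_output = relu_inplace(relu_input)
stat_after = max_abs(relu_input)       # the "input" now holds the output
```

Here `stat_before` is 4.0 while `stat_after` is 1.0, so a scale calculated after the forward pass describes the output rather than the input.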

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.

* fix: the blas wrapper has no scales (intel#2778)

* fix softmax (intel#2777)

* fix: performance regression on resnet50 (intel#2774)

The u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel#2671)

* flip to 0.9.0 (intel#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add the LARS optimizer (layer-wise scaled), along with utility functions to build a set of LARS optim methods for a container.

Bug fix: The gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}, but the combination {id}{pidTo} is ambiguous: for example, "112" can be {1}{12} or {11}{2}. A "_" is now added to separate id from pidTo.
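
The ambiguity can be shown with a short sketch that follows the id format given in the message. The function names `block_id_old` and `block_id_new` are hypothetical and only illustrate the string layout; they are not the AllReduceParameter code.

```python
def block_id_old(param_id, pid_to, pid_from):
    """Original layout: {id}{pidTo}gradientBytes{pidFrom}."""
    return f"{param_id}{pid_to}gradientBytes{pid_from}"

def block_id_new(param_id, pid_to, pid_from):
    """Fixed layout: a "_" separates id from pidTo."""
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"

# Two different (id, pidTo) pairs that both start with "112".
old_a = block_id_old(1, 12, 0)
old_b = block_id_old(11, 2, 0)
new_a = block_id_new(1, 12, 0)
new_b = block_id_new(11, 2, 0)

collides_before_fix = old_a == old_b
collides_after_fix = new_a == new_b
```

With the old layout both pairs produce the same block id, so one gradient block could overwrite another; with the separator the ids are distinct.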

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel#2808)

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel#2821)

* Optimize backward graph generation and CAddTable (intel#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel#2815)

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel#2822)

* Use one AllReduceParameter for multi-optim method  training (intel#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel#2824)

* fix: fusion for multi-group of convolution (intel#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel#2843)

* fix acc bug & init dnn thread (intel#2841)

* support tnc and ntc conversion (intel#2844)

* support ntc in dnn layer (intel#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel#2853)

* fix: wrong affinity settings (intel#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel#2858)

* Add beam search in transformer (intel#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel#2854)

* feat: add axis to softmax (intel#2859)

* add release doc for 0.9 (intel#2862)

* fix: update core ref to master (intel#2865)

* flip version to 0.10.0 (intel#2869)

* [Bug Fix] - Fix module version comparison  (intel#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel#2872)

* tutorial fix (intel#2879)

* feat: RoiAlign Forward (intel#2874)

* Add set input output format API in Python (intel#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel#2870)

* fix memory leak for ir graph training (intel#2895)

* add gemm layer (#2882)

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add gemm layer

* add gemm layer

* add Shape layer (intel#2885)

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add Gather layer (intel#2897)

* add gather layer

* [New feature] Add maskhead (intel#2892)

* support for maskhead

* fix unit tests (intel#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* add maskutils (intel#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel#2930)

* Onnx support: add pos parameter to softmax (intel#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel#2940)

* revert back api (intel#2943)

* fix: softmax and bn+scale fusion (intel#2937)

* feat: multi models support with MKL-DNN backend (intel#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel#2959)

* fix: the squeeze should not be included in IRElement (intel#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel#2973)

* fix: nms stability when using treeset. (intel#2972)

* flip version to 0.11 (intel#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel#2971)

* fix: enable integration accuracy tests (intel#2976)

* fix: softmax dnn backend wrong order of primitive (intel#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel#3002)

* Remove final for AbstractModule (intel#3001)

* DistriOptimizerV2 argument (intel#3003)

* call DistriOptimizerV2

* fix inception (intel#3010)

* fix top1 and treenn (intel#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel#3020)

* test examples by distrioptimizerv2 (intel#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel#3021)

* fix loss

* fix ut

* fix style check (intel#3022)

* specify pyspark version (intel#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel#3054)

* spark 3.0

* add spark3.0 deployment (intel#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel#3094)

* back port master (intel#3096)

* set seed to avoid random error in PredictionServiceUT (intel#3097)

* Jdk11 support (intel#3098)

* update for jdk 11 support and doc

* add serializeUid (intel#3099)

* update doc (intel#3104)

* add doc for running in ide (intel#3106)

* fix callBigDLFunc returning an Int while the true return value from Java is a byte array. (intel#3111)

* add list of df support (intel#3113)

* Update readme (intel#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel#3133)

* DistriOptimizerV2 logger (intel#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel#3137)

* upgrade spark version (intel#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel#3141)

* flip0.14 (intel#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <[email protected]>
Co-authored-by: Jerry Wu <[email protected]>
Co-authored-by: Xin Qiu <[email protected]>
Co-authored-by: Yanzhang Wang <[email protected]>
Co-authored-by: GenBrg <[email protected]>
Co-authored-by: LeicongLi <[email protected]>
Co-authored-by: Emiliano Martinez <[email protected]>
Co-authored-by: abdolence <[email protected]>
Co-authored-by: Enrique Garcia <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: yaochi <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: majing921201 <[email protected]>
Co-authored-by: jenniew <[email protected]>
Co-authored-by: Xiao <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: Hui Li <[email protected]>
Co-authored-by: Jason Dai <[email protected]>
Co-authored-by: dding3 <[email protected]>
Co-authored-by: Yang Wang <[email protected]>
Co-authored-by: Yina Chen <[email protected]>
Co-authored-by: Hangrui Cao <[email protected]>
Co-authored-by: pinggao18 <[email protected]>
dding3 added a commit to dding3/analytics-zoo that referenced this pull request Sep 8, 2021
* convert static graph to IR graph and build (intel#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel#2682)

* add spark 2.4 support (intel#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread holds some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app segfaults.
The parent thread is the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
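
The idea of routing all native work through one dedicated, long-lived thread can be sketched in Python. This is only an illustration under assumptions: `native_compute` and `run_on_dnn_thread` are hypothetical names, and `threading.local` stands in for the native thread-local variables mentioned above; it is not BigDL's actual implementation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for per-thread native state (not BigDL's API).
_native_state = threading.local()

def native_compute(x):
    # Lazily create the thread-local state on whichever thread runs this call.
    if not hasattr(_native_state, "owner"):
        _native_state.owner = threading.get_ident()
    return x * 2, _native_state.owner

# A single dedicated worker: every task runs on the same long-lived thread,
# so the thread-local state is created once and never lost.
dnn_pool = ThreadPoolExecutor(max_workers=1)

def run_on_dnn_thread(x):
    # The caller (the "parent thread") only submits work and waits for it.
    return dnn_pool.submit(native_compute, x).result()

results = [run_on_dnn_thread(i) for i in range(3)]
values = [v for v, _ in results]
owners = {owner for _, owner in results}
```

Here `owners` ends up with a single thread id that differs from the caller's, which is the property the fix relies on.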

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn uses a padded tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2). It pads
    to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space. This patch allocates it at runtime instead.
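
The memory cost of the padding in issue 1 can be estimated with a small sketch. It assumes the channel dimension of an NCHW shape is rounded up to a multiple of the SIMD width (8 floats for avx2, matching the 4x1x28x28 to 4x8x28x28 example); the function names are hypothetical and not part of mkl-dnn.

```python
def padded_channels(channels, simd_width=8):
    """Round the channel count up to the next multiple of simd_width."""
    return -(-channels // simd_width) * simd_width

def padded_shape(shape, simd_width=8):
    """Pad only the channel dimension of an NCHW shape (an assumption)."""
    n, c, h, w = shape
    return (n, padded_channels(c, simd_width), h, w)

def num_elements(shape):
    total = 1
    for dim in shape:
        total *= dim
    return total

logical = (4, 1, 28, 28)
physical = padded_shape(logical)
# Ratio of padded to logical element count.
overhead = num_elements(physical) / num_elements(logical)
```

For a single-channel input the padded tensor holds eight times as many elements, which is why the extra memory is noticeable.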

* add computshape for some layers and add skip primitives in DnnGraph (intel#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel#2737)

* refactor predict for dnn model

* remove some unit tests (intel#2752)

* remove some conflict tests (#2753)

* Update documentation (intel#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    we add two new attributes called `mask` and `scales`.
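
How a scale carries values between FP32 and Int8 can be sketched as follows. This assumes a simple symmetric scheme with one scale per tensor (scale = 127 / max|x|); the source does not state which scheme is used, and all function names here are hypothetical.

```python
def compute_scale(values, int8_max=127.0):
    """Scale that maps the largest magnitude to int8_max (assumed scheme)."""
    peak = max(abs(v) for v in values)
    return int8_max / peak if peak > 0 else 1.0

def quantize(values, scale):
    # FP32 -> Int8: multiply by the scale, round, clamp to the signed range.
    return [max(-127, min(127, round(v * scale))) for v in values]

def dequantize(qvalues, scale):
    # Int8 -> FP32: divide by the same scale.
    return [q / scale for q in qvalues]

fp32 = [0.5, -1.0, 0.25, 2.0]
scale = compute_scale(fp32)
int8 = quantize(fp32, scale)
restored = dequantize(int8, scale)
```

The same `scale` has to be available on both sides of the conversion, which is why it needs to travel with the memory description.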

* fix conversion accuracy (intel#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel#2759)

This includes 3 steps.

1. Generate the scales of the model.
   An api like `generateScalesWithMask` is needed to generate the scales of
   the fp32 model. The model returned is also an fp32 model.
2. Quantize the model.
   The `quantize()` api will be compatible with the `bigquant`
   backend, which sets the quantize flag. When compiling,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).

* update readme for v1 training (intel#2763)

* update release doc for preparation (intel#2764)

* change some docs about mkldnn (intel#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel#2783)

* fix: inplace of input/output and weight dimension error (intel#2779)

Some layers' input and output share the same memory. We can't do the forward in
`calcScales`, because by that time the input has been changed and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

takes two steps: seq.forward(input) runs first, and when it goes into the ReLU, it
does another forward, so the input will be the output and the scales will be
wrong.
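
The in-place hazard can be reproduced with a small sketch. It assumes the scale statistic is the maximum absolute value of a buffer and uses hypothetical helper names; it is not the BigDL `calcScales` code.

```python
def relu_inplace(buf):
    """ReLU that overwrites its input buffer, as an in-place layer would."""
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf

def max_abs(buf):
    return max(abs(v) for v in buf)

conv_output = [-4.0, 1.0, 2.0]

# Correct: record the statistic of ReLU's input before ReLU runs.
scale_before = max_abs(conv_output)

relu_output = relu_inplace(conv_output)

# Wrong: after the in-place forward, the "input" buffer already holds the output.
scale_after = max_abs(conv_output)
```

The two statistics differ (4.0 versus 2.0), so reading the input after the forward pass gives a scale that belongs to the output.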

For a convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
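
The weight-shape rule can be sketched as a small helper. It assumes the 5-D layout is (group, out/group, in/group, kH, kW), which the source does not spell out, and `dnn_weight_shape` is a hypothetical name.

```python
def dnn_weight_shape(blas_shape):
    """Drop the leading group dimension when there is only one group."""
    if len(blas_shape) != 5:
        raise ValueError("expected a 5-D weight shape")
    group = blas_shape[0]
    # Keep all 5 dimensions only for a grouped convolution.
    return tuple(blas_shape[1:]) if group == 1 else tuple(blas_shape)

ungrouped = dnn_weight_shape((1, 64, 3, 7, 7))
grouped = dnn_weight_shape((2, 32, 16, 3, 3))
```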

* fix: the blas wrapper has no scales (intel#2778)

* fix softmax (intel#2777)

* fix: performance regression on resnet50 (intel#2774)

The u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel#2671)

* flip to 0.9.0 (intel#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add the LARS optimizer (layer-wise scaled), along with utility functions to build a set of LARS optims for a container.

Bug fix: The gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}, but the combination {id}{pidTo} is ambiguous, e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
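
The ambiguity and the fix can be shown with a short sketch. The old format follows the description above; the exact new format is an assumption (only the "_" between id and pidTo is stated), and both function names are hypothetical.

```python
def block_id_old(id_, pid_to, pid_from):
    # Original format: id and pidTo are concatenated with no separator.
    return f"{id_}{pid_to}gradientBytes{pid_from}"

def block_id_new(id_, pid_to, pid_from):
    # Assumed fixed format: "_" separates id from pidTo.
    return f"{id_}_{pid_to}gradientBytes{pid_from}"

# Two different (id, pidTo) pairs collide under the old format.
old_a = block_id_old(1, 12, 0)
old_b = block_id_old(11, 2, 0)

# With the separator the ids stay distinct.
new_a = block_id_new(1, 12, 0)
new_b = block_id_new(11, 2, 0)
```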

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel#2821)

* Optimize backward graph generation and CAddTable (intel#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel#2822)

* Use one AllReduceParameter for multi-optim method  training (intel#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel#2824)

* fix: fusion for multi-group of convolution (intel#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel#2843)

* fix acc bug & init dnn thread (intel#2841)

* support tnc and ntc conversion (intel#2844)

* support ntc in dnn layer (intel#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel#2853)

* fix: wrong affinity settings (intel#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel#2858)

* Add beam search in transformer (intel#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel#2854)

* feat: add axis to softmax (intel#2859)

* add release doc for 0.9 (intel#2862)

* fix: update core ref to master (intel#2865)

* flip version to 0.10.0 (intel#2869)

* [Bug Fix] - Fix module version comparison  (intel#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel#2872)

* tutorial fix (intel#2879)

* feat: RoiAlign Forward (intel#2874)

* Add set input output format API in Python (intel#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel#2870)

* fix memory leak for ir graph training (intel#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel#2885)

* add shape layer

* add Gather layer (intel#2897)

* add gather layer

* [New feature] Add maskhead (intel#2892)

* support for maskhead

* fix unit tests (intel#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel#2930)

* Onnx support: add pos parameter to softmax (intel#2933)

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel#2940)

* revert back api (intel#2943)

* fix: softmax and bn+scale fusion (intel#2937)

* feat: multi models support with MKL-DNN backend (intel#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel#2959)

* fix: the squeeze should not be included in IRElement (intel#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel#2973)

* fix: nms stability when using treeset. (intel#2972)

* flip version to 0.11 (intel#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel#2971)

* fix: enable integration accuracy tests (intel#2976)

* fix: softmax dnn backend wrong order of primitive (intel#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel#3002)

* Remove final for AbstractModule (intel#3001)

* DistriOptimizerV2 argument (intel#3003)

* call DistriOptimizerV2

* fix inception (intel#3010)

* fix top1 and treenn (intel#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel#3020)

* test examples by distrioptimizerv2 (intel#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel#3021)

* fix loss

* fix ut

* fix style check (intel#3022)

* specify pyspark version (intel#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel#3054)

* spark 3.0

* add spark3.0 deployment (intel#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel#3094)

* back port master (intel#3096)

* set seed to avoid random error in PredictionServiceUT (intel#3097)

* Jdk11 support (intel#3098)

* update for jdk 11 support and doc

* add serializeUid (intel#3099)

* update doc (intel#3104)

* add doc for running in ide (intel#3106)

* fix callBigDLFunc return a Int while the true return value from java is a byte array. (intel#3111)

* add list of df support (intel#3113)

* Update readme (intel#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel#3133)

* DistriOptimizerV2 logger (intel#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel#3137)

* upgrade spark version (intel#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel#3141)

* flip0.14 (intel#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <[email protected]>
Co-authored-by: Jerry Wu <[email protected]>
Co-authored-by: Xin Qiu <[email protected]>
Co-authored-by: Yanzhang Wang <[email protected]>
Co-authored-by: GenBrg <[email protected]>
Co-authored-by: LeicongLi <[email protected]>
Co-authored-by: Emiliano Martinez <[email protected]>
Co-authored-by: abdolence <[email protected]>
Co-authored-by: Enrique Garcia <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: yaochi <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: majing921201 <[email protected]>
Co-authored-by: jenniew <[email protected]>
Co-authored-by: Xiao <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: Hui Li <[email protected]>
Co-authored-by: Jason Dai <[email protected]>
Co-authored-by: dding3 <[email protected]>
Co-authored-by: Yang Wang <[email protected]>
Co-authored-by: Yina Chen <[email protected]>
Co-authored-by: Hangrui Cao <[email protected]>
Co-authored-by: pinggao18 <[email protected]>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 8, 2021
* convert static graph to IR graph and build (intel#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel#2682)

* add spark 2.4 support (intel#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel#2724)

If we use the parent thread directly, there are two bugs:
1. The child threads forked from the parent thread will be bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if
the parent thread exits and is recreated, as happens with threads from
Executors.newFixedThreadPool, the whole app will hit a segmentation fault.
The parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
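
A minimal sketch of the idea, written in Python rather than the JVM code this patch touches: every compute call is routed through one long-lived worker thread, so any thread-local native state stays on a thread that is never recreated. The names `compute_pool` and `run_on_compute_thread` are hypothetical and are not BigDL APIs.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One dedicated, long-lived worker: every submitted task runs on this thread.
# (Hypothetical stand-in for the single thread pool described above.)
compute_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="compute")


def run_on_compute_thread(fn, *args):
    """Run fn on the dedicated worker and block until it returns."""
    return compute_pool.submit(fn, *args).result()


def current_thread_id():
    """Return the identifier of the thread that executes this call."""
    return threading.get_ident()


# Five separate calls all land on the same worker thread.
worker_ids = {run_on_compute_thread(current_thread_id) for _ in range(5)}
```

The caller thread (main thread or a per-partition worker) only submits work and waits, so it can exit and be recreated without touching the compute thread.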

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel#2712)

There are two issues:

1. A padded tensor is required. mkl-dnn pads the tensor to a multiple of
    the SIMD width, which uses more memory, e.g. 4x1x28x28 becomes
    4x8x28x28 (avx2).
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space. This patch allocates it at runtime instead.
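
The padding in item 1 can be sketched as follows. This is only an illustration of the arithmetic: it assumes the padded dimension is the channel dimension of an NCHW shape and a width of 8, both inferred from the 4x1x28x28 to 4x8x28x28 example. `pad_channels` and `element_count` are hypothetical helpers, not mkl-dnn or BigDL functions.

```python
import math


def pad_channels(shape, simd_width=8):
    """Round the channel dimension of an NCHW shape up to a multiple of simd_width."""
    n, c, h, w = shape
    padded_c = math.ceil(c / simd_width) * simd_width
    return (n, padded_c, h, w)


def element_count(shape):
    """Number of elements a tensor of this shape holds."""
    return math.prod(shape)


original = (4, 1, 28, 28)
padded = pad_channels(original)
# Ratio of padded to original element count, i.e. the extra memory factor.
growth = element_count(padded) / element_count(original)
```

With a single input channel the padded tensor holds eight times as many elements, which is why the extra memory is noticeable for small channel counts.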

* add computshape for some layers and add skip primitives in DnnGraph (intel#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel#2737)

* refactor predict for dnn model

* remove some unit tests (intel#2752)

* remove some conflict tests (#2753)

* Update documentation (intel#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we should add a new attribute called dataType
    to the `MemoryData`.
2. Because we should transfer the scales between FP32->Int8 and Int8->FP32,
    we should add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want fp32
output, you should add a reorder.

* feature: mkldnn int8 layer wise supports (intel#2759)

This includes 3 steps.

1. Generate the scales of the model.
   This needs an API like `generateScalesWithMask` to generate the scales of the
   fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` API will be compatible with the `bigquant`
   backend, which will set the quantize flag. When compiling,
   the quantized weight, output, and input will be generated by mkldnn at
   runtime.
3. Do the inference (forward).
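
The role of the scales in these steps can be sketched in plain Python. This is an assumption-laden illustration, not the mkldnn or BigDL implementation: it assumes symmetric max-abs scaling with a single scale per tensor, and `max_abs_scale`, `quantize`, and `dequantize` are hypothetical names.

```python
QMAX = 127  # largest magnitude used for signed int8 in this symmetric sketch


def max_abs_scale(values):
    """Step 1 (assumed scheme): derive one scale from the largest magnitude."""
    peak = max(abs(v) for v in values)
    return QMAX / peak if peak > 0 else 1.0


def quantize(values, scale):
    """Step 2: map fp32 values to integers in [-QMAX, QMAX]."""
    return [max(-QMAX, min(QMAX, round(v * scale))) for v in values]


def dequantize(quantized, scale):
    """Map the integers back to approximate fp32 values."""
    return [q / scale for q in quantized]


fp32_values = [0.8, -2.0, 0.1, 1.5]
scale = max_abs_scale(fp32_values)
int8_values = quantize(fp32_values, scale)
restored = dequantize(int8_values, scale)
```

Each restored value differs from the original by at most half a quantization step, which is why the scales have to be generated from representative fp32 data before quantizing.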

* update readme for v1 training (intel#2763)

* update release doc for preparation (intel#2764)

* change some docs about mkldnn (intel#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel#2761)

This is an example of how to use mkldnn int8. There are two steps: use
GenInt8Scales to generate the scales first and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel#2783)

* fix: inplace of input/output and weight dimension error (intel#2779)

Some layers' input and output use the same memory, so we can't do forward in
`calcScales`: by that time the input has been changed and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

will take two steps: seq.forward(input) first, and then, on entering the ReLU, it
will do another forward, so the input will be the output and the scales will be
wrong.
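
The in-place problem can be reproduced with a small sketch. It is only an illustration in plain Python: a list stands in for the shared buffer, max-abs stands in for the scale statistic, and `relu_inplace` and `max_abs` are hypothetical helpers rather than BigDL code.

```python
def max_abs(values):
    """Statistic a scale would be derived from (assumed: largest magnitude)."""
    return max(abs(v) for v in values)


def relu_inplace(buffer):
    """ReLU that overwrites its input buffer instead of allocating an output."""
    for i, v in enumerate(buffer):
        if v < 0:
            buffer[i] = 0.0
    return buffer


conv_output = [-3.0, 1.0, 2.0]             # Conv output, which is also the ReLU input
stat_before_forward = max_abs(conv_output)  # statistic of the true ReLU input
relu_output = relu_inplace(conv_output)     # same list object, now overwritten
stat_after_forward = max_abs(conv_output)   # statistic of what is left in the buffer
```

After the forward pass the buffer no longer holds the ReLU input, so a statistic read at that point describes the output instead.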

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.

* fix: the blas wrapper has no scales (intel#2778)

* fix softmax (intel#2777)

* fix: performance regression on resnet50 (intel#2774)

the u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel#2671)

* flip to 0.9.0 (intel#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: Layer-wise scaled. Also with utility functions to build a set of LARS optim for a container.
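
For orientation, a minimal sketch of a layer-wise learning rate in the usual LARS formulation. The formula, the `trust` and `weight_decay` parameters, and the function names are assumptions for illustration; the commit message does not show the exact computation used by the LARS optimizer added here.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in values))


def lars_local_lr(weights, grads, global_lr, trust=0.001, weight_decay=0.0):
    """Scale the global learning rate by trust * ||w|| / (||g|| + wd * ||w||)."""
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    denom = g_norm + weight_decay * w_norm
    if w_norm == 0.0 or denom == 0.0:
        # Fall back to the global rate when the ratio is undefined.
        return global_lr
    return global_lr * trust * w_norm / denom


# Each layer gets its own rate, computed from its own weights and gradients.
layer_lr = lars_local_lr(weights=[6.0, 8.0], grads=[3.0, 4.0], global_lr=0.1)
```

Because the rate depends on per-layer norms, one such computation is needed per layer, which is what a set of optim methods for a container provides.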

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo
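
The ambiguity and the fix can be shown with a short sketch. The two helper functions are hypothetical and only mirror the id layout described above; they are not the actual AllReduceParameter code.

```python
def legacy_block_id(block_id, pid_to, pid_from):
    """Old layout: {id}{pidTo}gradientBytes{pidFrom}, with no separator."""
    return f"{block_id}{pid_to}gradientBytes{pid_from}"


def separated_block_id(block_id, pid_to, pid_from):
    """New layout: a "_" separates id from pidTo."""
    return f"{block_id}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs that both concatenate to "112".
first = (1, 12, 0)
second = (11, 2, 0)

legacy_collides = legacy_block_id(*first) == legacy_block_id(*second)
separated_collides = separated_block_id(*first) == separated_block_id(*second)
```

With the separator the two ids become "1_12gradientBytes0" and "11_2gradientBytes0", so they no longer map to the same block.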

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel#2808)

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel#2821)

* Optimize backward graph generation and CAddTable (intel#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel#2815)

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel#2822)

* Use one AllReduceParameter for multi-optim method  training (intel#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel#2824)

* fix: fusion for multi-group of convolution (intel#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel#2843)

* fix acc bug & init dnn thread (intel#2841)

* support tnc and ntc conversion (intel#2844)

* support ntc in dnn layer (intel#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel#2853)

* fix: wrong affinity settings (intel#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel#2858)

* Add beam search in transformer (intel#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel#2854)

* feat: add axis to softmax (intel#2859)

* add release doc for 0.9 (intel#2862)

* fix: update core ref to master (intel#2865)

* flip version to 0.10.0 (intel#2869)

* [Bug Fix] - Fix module version comparison  (intel#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel#2872)

* tutorial fix (intel#2879)

* feat: RoiAlign Forward (intel#2874)

* Add set input output format API in Python (intel#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel#2870)

* fix memory leak for ir graph training (intel#2895)

* add gemm layer (#2882)

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add gemm layer

* add gemm layer

* add Shape layer (intel#2885)

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add Gather layer (intel#2897)

* add gather layer

* [New feature] Add maskhead (intel#2892)

* support for maskhead

* fix unit tests (intel#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* add maskutils (intel#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel#2930)

* Onnx support: add pos parameter to softmax (intel#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel#2940)

* revert back api (intel#2943)

* fix: softmax and bn+scale fusion (intel#2937)

* feat: multi models support with MKL-DNN backend (intel#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel#2959)

* fix: the squeeze should not be included in IRElement (intel#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel#2973)

* fix: nms stability when using treeset. (intel#2972)

* flip version to 0.11 (intel#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel#2971)

* fix: enable integration accuracy tests (intel#2976)

* fix: softmax dnn backend wrong order of primitive (intel#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel#3002)

* Remove final for AbstractModule (intel#3001)

* DistriOptimizerV2 argument (intel#3003)

* call DistriOptimizerV2

* fix inception (intel#3010)

* fix top1 and treenn (intel#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel#3020)

* test examples by distrioptimizerv2 (intel#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel#3021)

* fix loss

* fix ut

* fix style check (intel#3022)

* specify pyspark version (intel#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel#3054)

* spark 3.0

* add spark3.0 deployment (intel#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel#3094)

* back port master (intel#3096)

* set seed to avoid random error in PredictionServiceUT (intel#3097)

* Jdk11 support (intel#3098)

* update for jdk 11 support and doc

* add serializeUid (intel#3099)

* update doc (intel#3104)

* add doc for running in ide (intel#3106)

* fix callBigDLFunc returning an Int while the true return value from java is a byte array. (intel#3111)

* add list of df support (intel#3113)

* Update readme (intel#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel#3133)

* DistriOptimizerV2 logger (intel#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel#3137)

* upgrade spark version (intel#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel#3141)

* flip0.14 (intel#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <[email protected]>
Co-authored-by: Jerry Wu <[email protected]>
Co-authored-by: Xin Qiu <[email protected]>
Co-authored-by: Yanzhang Wang <[email protected]>
Co-authored-by: GenBrg <[email protected]>
Co-authored-by: LeicongLi <[email protected]>
Co-authored-by: Emiliano Martinez <[email protected]>
Co-authored-by: abdolence <[email protected]>
Co-authored-by: Enrique Garcia <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: yaochi <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: majing921201 <[email protected]>
Co-authored-by: jenniew <[email protected]>
Co-authored-by: Xiao <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: Hui Li <[email protected]>
Co-authored-by: Jason Dai <[email protected]>
Co-authored-by: dding3 <[email protected]>
Co-authored-by: Yang Wang <[email protected]>
Co-authored-by: Yina Chen <[email protected]>
Co-authored-by: Hangrui Cao <[email protected]>
Co-authored-by: pinggao18 <[email protected]>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 10, 2021
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 10, 2021
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 10, 2021
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 10, 2021
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 10, 2021
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 10, 2021
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 10, 2021
* convert static graph to IR graph and build (intel#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel#2682)

* add spark 2.4 support (intel#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app crashes with a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
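
As a rough illustration of the idea in this commit message (not the actual BigDL code, which runs on the JVM), the sketch below routes all compute calls through one long-lived worker thread, so thread-local state always belongs to that same thread rather than to whichever caller happens to invoke it. The names `compute` and `run_on_dedicated_thread` are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# A single long-lived worker thread, created once and reused for every call.
_POOL = ThreadPoolExecutor(max_workers=1)


def compute(x):
    # Hypothetical stand-in for a native call that depends on thread-local state.
    # It also reports which thread executed it.
    return x * x, threading.get_ident()


def run_on_dedicated_thread(inputs):
    # Submit every task to the single worker instead of running it on the caller.
    return list(_POOL.map(compute, inputs))


first = run_on_dedicated_thread([1, 2, 3])
second = run_on_dedicated_thread([4])
values = [v for v, _ in first + second]
worker_ids = {tid for _, tid in first + second}
```

All tasks, across both calls, report the same worker thread, and that thread is not the calling thread.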

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn uses a padded tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space. This patch allocates it at runtime instead.
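
The padding arithmetic in item 1 can be sketched as follows. This is a minimal illustration that assumes only the channel dimension of an NCHW shape is rounded up and that the SIMD width is 8 floats for avx2, as in the 4x1x28x28 example; `padded_shape` is a hypothetical helper, not an mkl-dnn or BigDL function.

```python
def round_up(value, multiple):
    # Smallest multiple of `multiple` that is >= value.
    return -(-value // multiple) * multiple


def padded_shape(shape, simd_width=8):
    # Assumption: only the channel dimension (index 1 of NCHW) is padded.
    n, c, h, w = shape
    return (n, round_up(c, simd_width), h, w)


def num_elements(shape):
    total = 1
    for dim in shape:
        total *= dim
    return total


original = (4, 1, 28, 28)
padded = padded_shape(original)
growth = num_elements(padded) // num_elements(original)
```

For this shape the padded tensor holds 8 times as many elements as the original, which is why the extra memory matters.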

* add computshape for some layers and add skip primitives in DnnGraph (intel#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel#2737)

* refactor predict for dnn model

* remove some unit tests (intel#2752)

* remove some conflict tests (#2753)

* Update documentation (intel#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called `dataType`
    to `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    we add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel#2759)

This includes 3 steps.

1. Generate the scales of the model.
   This needs an API like `generateScalesWithMask` to generate the scales of
   the fp32 model. The model returned is also an fp32 model.
2. Quantize the model.
   The `quantize()` API will be compatible with the `bigquant`
   backend and will set the quantize flag. When compiling,
   the quantized weight, output, and input will be generated by mkldnn at
   runtime.
3. Do the inference (forward).
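
The three steps can be sketched generically as below. This is not the BigDL or mkldnn implementation: it assumes simple symmetric max-abs scaling to the int8 range, and `generate_scale`, `quantize_values` and `dequantize_values` are hypothetical names chosen for illustration.

```python
INT8_MAX = 127


def generate_scale(values):
    # Step 1 (assumed scheme): one scale derived from the largest magnitude.
    max_abs = max(abs(v) for v in values)
    return INT8_MAX / max_abs if max_abs else 1.0


def quantize_values(values, scale):
    # Step 2: map fp32 values to integers clamped to [-127, 127].
    return [max(-INT8_MAX, min(INT8_MAX, round(v * scale))) for v in values]


def dequantize_values(quantized, scale):
    # Step 3 (after int8 compute): map back to fp32 for comparison.
    return [q / scale for q in quantized]


weights = [0.8, -2.0, 0.3, 1.1]
scale = generate_scale(weights)
quantized = quantize_values(weights, scale)
restored = dequantize_values(quantized, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

With this scheme the round-trip error per value is bounded by half a quantization step, i.e. `0.5 / scale`.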

* update readme for v1 training (intel#2763)

* update release doc for preparation (intel#2764)

* change some docs about mkldnn (intel#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel#2783)

* fix: inplace of input/output and weight dimension error (intel#2779)

Some layers' input and output use the same memory. We can't do forward in
`calcScales`, because by that time the input has been changed and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

It takes two steps: seq.forward(input) runs first, and then, on entering the ReLU, it
does another forward, so the input will be the output and the scales will be
wrong.

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
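
The in-place problem can be reproduced with a toy example. The sketch below uses plain Python lists and assumes the scale is derived from the maximum absolute value of a buffer; `relu_inplace` and `max_abs` are hypothetical helpers, not BigDL functions.

```python
def relu_inplace(buf):
    # Overwrites the caller's buffer, as an in-place layer would.
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf


def max_abs(buf):
    return max(abs(v) for v in buf)


conv_output = [-4.0, 1.0, 2.0]

# Measured before the in-place layer runs: reflects the real ReLU input.
stat_before = max_abs(conv_output)

relu_output = relu_inplace(conv_output)

# Measured afterwards: the "input" buffer now holds the ReLU output.
stat_after = max_abs(conv_output)
shares_memory = relu_output is conv_output
```

Reading the statistic after the forward pass gives 2.0 instead of 4.0, so a scale computed from it would not match the true input range.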

* fix: the blas wrapper has no scales (intel#2778)

* fix softmax (intel#2777)

* fix: performance regression on resnet50 (intel#2774)

the u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel#2671)

* flip to 0.9.0 (intel#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: Layer-wise scaled. Also with utility functions to build a set of LARS optim for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo
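
The ambiguity described in this bug fix can be shown with a short sketch. Only the `{id}{pidTo}gradientBytes{pidFrom}` layout and the added "_" come from the commit message; the function names are hypothetical and the real BigDL code is Scala.

```python
def block_id_old(param_id, pid_to, pid_from):
    # Original layout: id and pidTo are concatenated with no separator.
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def block_id_new(param_id, pid_to, pid_from):
    # Fixed layout: "_" separates id from pidTo.
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


old_a = block_id_old(1, 12, 0)
old_b = block_id_old(11, 2, 0)
new_a = block_id_new(1, 12, 0)
new_b = block_id_new(11, 2, 0)
```

Under the old layout both (1, 12) and (11, 2) produce the same key, while the separator keeps them distinct.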

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel#2808)

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel#2821)

* Optimize backward graph generation and CAddTable (intel#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel#2815)

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel#2822)

* Use one AllReduceParameter for multi-optim method  training (intel#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (intel#2824)

* fix: fusion for multi-group of convolution (intel#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel#2843)

* fix acc bug & init dnn thread (intel#2841)

* support tnc and ntc conversion (intel#2844)

* support ntc in dnn layer (intel#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel#2853)

* fix: wrong affinity settings (intel#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel#2858)

* Add beam search in transformer (intel#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel#2854)

* feat: add axis to softmax (intel#2859)

* add release doc for 0.9 (intel#2862)

* fix: update core ref to master (intel#2865)

* flip version to 0.10.0 (intel#2869)

* [Bug Fix] - Fix module version comparison  (intel#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel#2872)

* tutorial fix (intel#2879)

* feat: RoiAlign Forward (intel#2874)

* Add set input output format API in Python (intel#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel#2870)

* fix memory leak for ir graph training (intel#2895)

* add gemm layer (#2882)

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add gemm layer

* add gemm layer

* add Shape layer (intel#2885)

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add Gather layer (intel#2897)

* add gather layer

* [New feature] Add maskhead (intel#2892)

* support for maskhead

* fix unit tests (intel#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* add maskutils (intel#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel#2930)

* Onnx support: add pos parameter to softmax (intel#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel#2940)

* revert back api (intel#2943)

* fix: softmax and bn+scale fusion (intel#2937)

* feat: multi models support with MKL-DNN backend (intel#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel#2959)

* fix: the squeeze should not be included in IRElement (intel#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel#2973)

* fix: nms stability when using treeset. (intel#2972)

* flip version to 0.11 (intel#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel#2971)

* fix: enable integration accuracy tests (intel#2976)

* fix: softmax dnn backend wrong order of primitive (intel#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel#3002)

* Remove final for AbstractModule (intel#3001)

* DistriOptimizerV2 argument (intel#3003)

* call DistriOptimizerV2

* fix inception (intel#3010)

* fix top1 and treenn (intel#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel#3020)

* test examples by distrioptimizerv2 (intel#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel#3021)

* fix loss

* fix ut

* fix style check (intel#3022)

* specify pyspark version (intel#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel#3054)

* spark 3.0

* add spark3.0 deployment (intel#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel#3094)

* back port master (intel#3096)

* set seed to avoid random error in PredictionServiceUT (intel#3097)

* Jdk11 support (intel#3098)

* update for jdk 11 support and doc

* add serializeUid (intel#3099)

* update doc (intel#3104)

* add doc for running in ide (intel#3106)

* fix callBigDLFunc returning an Int while the true return value from Java is a byte array. (intel#3111)

* add list of df support (intel#3113)

* Update readme (intel#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel#3133)

* DistriOptimizerV2 logger (intel#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel#3137)

* upgrade spark version (intel#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel#3141)

* flip0.14 (intel#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <[email protected]>
Co-authored-by: Jerry Wu <[email protected]>
Co-authored-by: Xin Qiu <[email protected]>
Co-authored-by: Yanzhang Wang <[email protected]>
Co-authored-by: GenBrg <[email protected]>
Co-authored-by: LeicongLi <[email protected]>
Co-authored-by: Emiliano Martinez <[email protected]>
Co-authored-by: abdolence <[email protected]>
Co-authored-by: Enrique Garcia <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: yaochi <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: majing921201 <[email protected]>
Co-authored-by: jenniew <[email protected]>
Co-authored-by: Xiao <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: Hui Li <[email protected]>
Co-authored-by: Jason Dai <[email protected]>
Co-authored-by: dding3 <[email protected]>
Co-authored-by: Yang Wang <[email protected]>
Co-authored-by: Yina Chen <[email protected]>
Co-authored-by: Hangrui Cao <[email protected]>
Co-authored-by: pinggao18 <[email protected]>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 14, 2021
* convert static graph to IR graph and build (intel#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel#2682)

* add spark 2.4 support (intel#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel#2724)

If we use the parent thread directly, there are two bugs:
1. The child threads forked from the parent thread will be bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables. So if
the parent thread exits and is recreated, such as a thread from
Executors.newFixedThreadPool, the whole app will hit a segmentation fault.
The parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
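A minimal sketch of the idea described above, in plain Python rather than the project's own code: all compute calls are routed through one long-lived worker thread instead of running on whichever parent thread happens to call them. The names `compute` and `run_on_compute_thread` are hypothetical and are not BigDL APIs.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def compute(x):
    # Hypothetical stand-in for a native compute call; not a BigDL API.
    # Returns the result together with the name of the thread that ran it.
    return x * 2, threading.current_thread().name


# One long-lived worker thread. It is never the caller's thread, and it is
# not torn down and recreated between calls.
_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="compute")


def run_on_compute_thread(x):
    # Submit the work and block until the dedicated thread finishes it.
    return _pool.submit(compute, x).result()


results = [run_on_compute_thread(i) for i in range(3)]
values = [value for value, _ in results]
thread_names = {name for _, name in results}
```

Every call lands on the same worker thread, so any thread-local state held by native code would stay attached to a thread that outlives the individual callers.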

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn will use a padded tensor, which
    uses more memory, such as 4x1x28x28 padded to 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous impl
    allocated the DnnTensor when the model was created, which cost too much
    space. So this patch allocates it at runtime.
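The padding arithmetic in the first issue can be sketched as follows. This is an illustration only: the width of 8 is taken from the avx2 example in the message, the helper names are hypothetical, and the real blocking used by mkl-dnn depends on the chosen memory format.

```python
def padded_channels(channels, simd_width):
    # Round the channel count up to the next multiple of simd_width.
    return -(-channels // simd_width) * simd_width


def padded_shape(shape, simd_width=8):
    # shape is (batch, channels, height, width); only channels are padded.
    n, c, h, w = shape
    return (n, padded_channels(c, simd_width), h, w)


def num_elements(shape):
    total = 1
    for dim in shape:
        total *= dim
    return total


original = (4, 1, 28, 28)
padded = padded_shape(original)
memory_ratio = num_elements(padded) / num_elements(original)
```

For the shape quoted in the message, the padded tensor holds eight times as many elements as the original, which is where the extra memory goes.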

* add computshape for some layers and add skip primitives in DnnGraph (intel#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel#2745)

* Modify documentation

* Modify documentation 2

* Revise the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel#2737)

* refactor predict for dnn model

* remove some unit tests (intel#2752)

* remove some conflict tests (#2753)

* Update documentation (intel#2749)

* Modify documentation

* Modify documentation 2

* Revise the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we should add a new attribute called dataType
    to the `MemoryData`.
2. Because we should transfer the scales between FP32->Int8 and Int8->FP32,
    we should add two new attributes called `mask` and `scales`.
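As a rough illustration of how a `mask` and a `scales` list can relate, the sketch below assumes the MKL-DNN convention in which bit i of the mask means a separate scale is kept for each index along dimension i, and a mask of 0 means one scale for the whole tensor. The `MemoryDesc` class is hypothetical and is not BigDL's `MemoryData`.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MemoryDesc:
    # Hypothetical descriptor carrying the three attributes named above.
    shape: Tuple[int, ...]
    data_type: str = "fp32"
    mask: int = 0
    scales: List[float] = field(default_factory=list)


def expected_num_scales(shape, mask):
    # Multiply the sizes of every dimension whose bit is set in the mask.
    count = 1
    for dim, size in enumerate(shape):
        if mask & (1 << dim):
            count *= size
    return count


shape = (4, 8, 28, 28)
per_tensor = expected_num_scales(shape, mask=0)    # one scale overall
per_channel = expected_num_scales(shape, mask=2)   # bit 1: one per channel
desc = MemoryDesc(shape=shape, data_type="int8", mask=2,
                  scales=[1.0] * per_channel)
```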

* fix conversion accuracy (intel#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel#2762)

Enable the int8 data type in layers, especially for convolutions.
So a specific layer can accept an int8 input. If you want an fp32
output, you should add a reorder.

* feature: mkldnn int8 layer wise supports (intel#2759)

This includes 3 steps.

1. Generate the scales of the model.
   This needs an API like `generateScalesWithMask` to generate the scales of
   the fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` API will be compatible with the `bigquant`
   backend, which will set the quantize flag. When compiling,
   the quantized weight, output, and input will be generated by mkldnn at
   runtime.
3. Do the inference (forward).
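The three steps above can be sketched in plain Python. The message does not state which formula is used for the scales, so symmetric max-abs scaling is an assumption here, and the function names are hypothetical rather than the BigDL API.

```python
def generate_scale(values, qmax=127):
    # Step 1 (assumed formula): map the largest magnitude onto qmax.
    max_abs = max(abs(v) for v in values)
    return qmax / max_abs if max_abs > 0 else 1.0


def quantize(values, scale, qmax=127):
    # Step 2: scale, round, and clamp into the signed 8-bit range.
    return [max(-qmax, min(qmax, round(v * scale))) for v in values]


def dequantize(qvalues, scale):
    # Convert int8 values back to floating point.
    return [q / scale for q in qvalues]


weights = [0.5, -1.0, 0.25]
scale = generate_scale(weights)
q_weights = quantize(weights, scale)
# Step 3 would run forward on the quantized values; here we only check
# how close the round trip stays to the original numbers.
restored = dequantize(q_weights, scale)
```

The round trip is lossy: each restored value differs from the original by at most one quantization step, which is 1 / scale.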

* update readme for v1 training (intel#2763)

* update release doc for preparation (intel#2764)

* change some docs about mkldnn (intel#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel#2783)

* fix: inplace of input/output and weight dimension error (intel#2779)

Some layers' input and output use the same memory. We can't do forward in
`calcScales`, because by that time the input has been changed and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

takes two steps: seq.forward(input) runs first, and when it goes into the ReLU it
does another forward, so the input will be the output and the scales will be
wrong.

For a convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there is no group, the weight's dimension
should be 4.
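The in-place problem described above can be reproduced with a small sketch. It uses plain Python lists and a max-abs statistic as a stand-in for the real scale computation; both are assumptions made for illustration, and the activation values are invented.

```python
def max_abs(values):
    return max(abs(v) for v in values)


def relu_inplace(buf):
    # Overwrites the input buffer, as an in-place layer would.
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf


conv_output = [-4.0, 1.0, 2.0]  # hypothetical activation values

# Statistic of ReLU's input, taken before ReLU runs.
scale_before = max_abs(conv_output)

relu_output = relu_inplace(conv_output)

# Taken after forward: the "input" buffer already holds ReLU's output,
# so the statistic no longer describes the real input.
scale_after = max_abs(conv_output)
```

Because `relu_output` and `conv_output` are the same buffer, the second measurement silently describes the output rather than the input.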

* fix: the blas wrapper has no scales (intel#2778)

* fix softmax (intel#2777)

* fix: performance regression on resnet50 (intel#2774)

the u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel#2671)

* flip to 0.9.0 (intel#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Revise the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: layer-wise scaled. Also adds utility functions to build a set of LARS optims for a container.

Bug fix: The gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} causes ambiguity, e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
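The ambiguity in the block id can be shown with a short sketch. The two helper functions are hypothetical and only mirror the string layouts quoted in the message; they are not the actual AllReduceParameter code.

```python
def old_block_id(id_, pid_to, pid_from):
    # Original layout: id and pidTo are concatenated with no separator.
    return f"{id_}{pid_to}gradientBytes{pid_from}"


def new_block_id(id_, pid_to, pid_from):
    # Fixed layout: "_" separates id from pidTo.
    return f"{id_}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs that both start with "112".
collision = old_block_id(1, 12, 0) == old_block_id(11, 2, 0)
fixed = new_block_id(1, 12, 0) != new_block_id(11, 2, 0)
```

With the separator, the pair (1, 12) yields a prefix of "1_12" and the pair (11, 2) yields "11_2", so the two blocks no longer share an id.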

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel#2808)

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel#2821)

* Optimize backward graph generation and CAddTable (intel#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel#2815)

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel#2822)

* Use one AllReduceParameter for multi-optim method  training (intel#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel#2824)

* fix: fusion for multi-group of convolution (intel#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel#2843)

* fix acc bug & init dnn thread (intel#2841)

* support tnc and ntc conversion (intel#2844)

* support ntc in dnn layer (intel#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel#2853)

* fix: wrong affinity settings (intel#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel#2858)

* Add beam search in transformer (intel#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel#2854)

* feat: add axis to softmax (intel#2859)

* add release doc for 0.9 (intel#2862)

* fix: update core ref to master (intel#2865)

* flip version to 0.10.0 (intel#2869)

* [Bug Fix] - Fix module version comparison  (intel#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel#2872)

* tutorial fix (intel#2879)

* feat: RoiAlign Forward (intel#2874)

* Add set input output format API in Python (intel#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel#2870)

* fix memory leak for ir graph training (intel#2895)

* add gemm layer (#2882)

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add gemm layer

* add gemm layer

* add Shape layer (intel#2885)

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add Gather layer (intel#2897)

* add gather layer

* [New feature] Add maskhead (intel#2892)

* support for maskhead

* fix unit tests (intel#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* add maskutils (intel#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel#2930)

* Onnx support: add pos parameter to softmax (intel#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel#2940)

* revert back api (intel#2943)

* fix: softmax and bn+scale fusion (intel#2937)

* feat: multi models support with MKL-DNN backend (intel#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel#2959)

* fix: the squeeze should not be included in IRElement (intel#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel#2973)

* fix: nms stability when using treeset. (intel#2972)

* flip version to 0.11 (intel#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel#2971)

* fix: enable integration accuracy tests (intel#2976)

* fix: softmax dnn backend wrong order of primitive (intel#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel#3002)

* Remove final for AbstractModule (intel#3001)

* DistriOptimizerV2 argument (intel#3003)

* call DistriOptimizerV2

* fix inception (intel#3010)

* fix top1 and treenn (intel#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel#3020)

* test examples by distrioptimizerv2 (intel#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel#3021)

* fix loss

* fix ut

* fix style check (intel#3022)

* specify pyspark version (intel#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel#3054)

* spark 3.0

* add spark3.0 deployment (intel#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel#3094)

* back port master (intel#3096)

* set seed to avoid random error in PredictionServiceUT (intel#3097)

* Jdk11 support (intel#3098)

* update for jdk 11 support and doc

* add serializeUid (intel#3099)

* update doc (intel#3104)

* add doc for running in ide (intel#3106)

* fix callBigDLFunc return a Int while the true return value from java is a byte array. (intel#3111)

* add list of df support (intel#3113)

* Update readme (intel#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel#3133)

* DistriOptimizerV2 logger (intel#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel#3137)

* upgrade spark version (intel#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel#3141)

* flip0.14 (intel#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73
Co-authored-by: Jerry Wu
Co-authored-by: Xin Qiu
Co-authored-by: Yanzhang Wang
Co-authored-by: GenBrg
Co-authored-by: LeicongLi
Co-authored-by: Emiliano Martinez
Co-authored-by: abdolence
Co-authored-by: Enrique Garcia
Co-authored-by: Louie Tsai
Co-authored-by: yaochi
Co-authored-by: Menooker
Co-authored-by: Menooker
Co-authored-by: Firecrackerxox
Co-authored-by: majing921201
Co-authored-by: jenniew
Co-authored-by: Xiao
Co-authored-by: Firecrackerxox
Co-authored-by: Hui Li
Co-authored-by: Jason Dai
Co-authored-by: dding3
Co-authored-by: Yang Wang
Co-authored-by: Yina Chen
Co-authored-by: Hangrui Cao
Co-authored-by: pinggao18
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 14, 2021
* convert static graph to IR graph and build (intel#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel#2682)

* add spark 2.4 support (intel#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread holds some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app hits a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
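
A minimal sketch of the idea, in plain Python rather than the project's Scala code: all compute calls are submitted to one dedicated, long-lived worker thread instead of running on the caller's thread. `native_compute` and `run_on_compute_thread` are hypothetical names used only for illustration; they are not BigDL APIs.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def native_compute(x):
    """Hypothetical stand-in for a native call; also reports which thread ran it."""
    return x * 2, threading.current_thread().name


# One long-lived worker thread, so any thread-local state lives on a thread
# that is never torn down and recreated by the caller.
_compute_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dnn-compute")


def run_on_compute_thread(x):
    """Submit the work to the dedicated thread and wait for its result."""
    return _compute_pool.submit(native_compute, x).result()


results = [run_on_compute_thread(i) for i in range(3)]
values = [value for value, _ in results]
thread_names = {name for _, name in results}
_compute_pool.shutdown(wait=True)
```

Every call lands on the same worker thread, whichever caller thread submitted it.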

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn pads the tensor to a multiple of
    the SIMD width, which uses more memory, for example 4x1x28x28 becomes
    4x8x28x28 (avx2).
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.
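
The padding arithmetic in item 1 can be sketched as follows. This is an illustration only: the width of 8 is taken from the avx2 example above, and `padded_shape` is a hypothetical helper, not how mkl-dnn itself selects its blocking.

```python
def padded_shape(shape, simd_width=8, channel_dim=1):
    """Round the channel dimension up to the next multiple of simd_width."""
    padded = list(shape)
    channels = padded[channel_dim]
    padded[channel_dim] = -(-channels // simd_width) * simd_width  # ceiling division
    return tuple(padded)


def num_elements(shape):
    """Total number of elements in a tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


original = (4, 1, 28, 28)
padded = padded_shape(original)
growth = num_elements(padded) // num_elements(original)
```

For the shape in the commit message, a single channel is padded to 8, so the buffer holds 8 times as many elements.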

* add computshape for some layers and add skip primitives in DnnGraph (intel#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel#2737)

* refactor predict for dnn model

* remove some unit tests (intel#2752)

* remove some conflict tests (#2753)

* Update documentation (intel#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, a new attribute called `dataType` is added
    to `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    two new attributes called `mask` and `scales` are added.
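
As a rough illustration of how a `mask` and a list of `scales` can relate, the sketch below uses symmetric int8 quantization with `scale = 127 / max(|x|)`. Both that formula and the reading of the mask (0 for one scale over the whole tensor, 1 for one scale per slice along the first dimension) are assumptions made for the example, not details confirmed by this commit.

```python
def compute_scales(weights, mask):
    """weights is a 2-D list; returns one scale (mask 0) or one per row (mask 1)."""
    if mask == 0:
        max_abs = max(abs(v) for row in weights for v in row)
        return [127.0 / max_abs]
    if mask == 1:
        return [127.0 / max(abs(v) for v in row) for row in weights]
    raise ValueError("this sketch only handles mask 0 or 1")


def quantize(weights, scales):
    """FP32 -> int8: multiply by the scale, round, and clamp to [-127, 127]."""
    quantized = []
    for i, row in enumerate(weights):
        scale = scales[0] if len(scales) == 1 else scales[i]
        quantized.append([max(-127, min(127, round(v * scale))) for v in row])
    return quantized


weights = [[0.5, -1.0], [0.125, 0.25]]
per_tensor = compute_scales(weights, mask=0)   # a single shared scale
per_channel = compute_scales(weights, mask=1)  # one scale per row
```

With a per-row scale, the second row (whose values are small) uses the full int8 range instead of being squeezed by the first row's larger maximum.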

* fix conversion accuracy (intel#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want an fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel#2759)

This includes 3 steps:

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of the
   fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` API will be compatible with the `bigquant`
   backend, which sets the quantize flag. When compiling,
   the quantized weight, output and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).

* update readme for v1 training (intel#2763)

* update release doc for preparation (intel#2764)

* change some docs about mkldnn (intel#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel#2761)

This is an example of how to use mkldnn int8. There are two steps: use
GenInt8Scales to generate the scales first and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel#2783)

* fix: inplace of input/output and weight dimension error (intel#2779)

Some layers' input and output use the same memory, so we can't do a forward in
`calcScales`: by that time the input has been changed and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

takes two steps: seq.forward(input) runs first, and on entering the ReLU it
does another forward, so the input is already the output and the scales are
wrong.

For a convolution's weight, the dimension is always 5, even when the group number
is 1. But for a dnn convolution with no group, the weight's dimension
should be 4.
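
The in-place problem can be reproduced with a small plain-Python sketch. `relu_inplace` and `max_abs` are hypothetical helpers, and taking the maximum absolute value is only a stand-in for whatever statistic the real scale calculation uses.

```python
def relu_inplace(buffer):
    """ReLU that overwrites its input, so input and output share memory."""
    for i, value in enumerate(buffer):
        if value < 0:
            buffer[i] = 0.0
    return buffer


def max_abs(buffer):
    """Stand-in for the statistic a scale calculation would read."""
    return max(abs(value) for value in buffer)


conv_output = [-4.0, 1.0, 2.0]            # what the ReLU receives as input
scale_before = max_abs(conv_output)       # statistic of the true ReLU input
relu_output = relu_inplace(conv_output)   # overwrites conv_output in place
scale_after = max_abs(conv_output)        # now describes the output instead
```

Reading the input statistic after the in-place forward gives 2.0 instead of 4.0, which is the kind of wrong scale the fix avoids.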

* fix: the blas wrapper has no scales (intel#2778)

* fix softmax (intel#2777)

* fix: performance regression on resnet50 (intel#2774)

In this case, converting u8 to s8 or s8 to u8 needs no reorder.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel#2671)

* flip to 0.9.0 (intel#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: layer-wise scaled, with utility functions to build a set of LARS optims for a container.

Bug fix: the gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}, but the combination {id}{pidTo} is ambiguous, e.g. "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
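
The ambiguity is easy to reproduce. The sketch below is plain Python rather than the project's Scala code. The two helper functions are hypothetical, and the exact layout of the new id beyond the added "_" is an assumption.

```python
def old_block_id(param_id, pid_to, pid_from):
    """Original scheme: id and pidTo are concatenated with no separator."""
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def new_block_id(param_id, pid_to, pid_from):
    """Fixed scheme: a "_" separates id from pidTo."""
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs that both start with "112" in the old scheme.
first = (1, 12, 0)
second = (11, 2, 0)

old_collides = old_block_id(*first) == old_block_id(*second)
new_collides = new_block_id(*first) == new_block_id(*second)
```

With the separator, the two pairs map to "1_12..." and "11_2...", so the ids no longer collide.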

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel#2808)

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel#2821)

* Optimize backward graph generation and CAddTable (intel#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel#2815)

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel#2822)

* Use one AllReduceParameter for multi-optim method  training (intel#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel#2824)

* fix: fusion for multi-group of convolution (intel#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel#2843)

* fix acc bug & init dnn thread (intel#2841)

* support tnc and ntc conversion (intel#2844)

* support ntc in dnn layer (intel#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel#2853)

* fix: wrong affinity settings (intel#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel#2858)

* Add beam search in transformer (intel#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel#2854)

* feat: add axis to softmax (intel#2859)

* add release doc for 0.9 (intel#2862)

* fix: update core ref to master (intel#2865)

* flip version to 0.10.0 (intel#2869)

* [Bug Fix] - Fix module version comparison  (intel#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel#2872)

* tutorial fix (intel#2879)

* feat: RoiAlign Forward (intel#2874)

* Add set input output format API in Python (intel#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel#2870)

* fix memory leak for ir graph training (intel#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add gemm layer

* add gemm layer

* add Shape layer (intel#2885)

* add shape layer

* add Gather layer (intel#2897)

* add gather layer

* [New feature] Add maskhead (intel#2892)

* support for maskhead

* fix unit tests (intel#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel#2930)

* Onnx support: add pos parameter to softmax (intel#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel#2940)

* revert back api (intel#2943)

* fix: softmax and bn+scale fusion (intel#2937)

* feat: multi models support with MKL-DNN backend (intel#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel#2959)

* fix: the squeeze should not be included in IRElement (intel#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel#2973)

* fix: nms stability when using treeset. (intel#2972)

* flip version to 0.11 (intel#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel#2971)

* fix: enable integration accuracy tests (intel#2976)

* fix: softmax dnn backend wrong order of primitive (intel#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel#3002)

* Remove final for AbstractModule (intel#3001)

* DistriOptimizerV2 argument (intel#3003)

* call DistriOptimizerV2

* fix inception (intel#3010)

* fix top1 and treenn (intel#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel#3020)

* test examples by distrioptimizerv2 (intel#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel#3021)

* fix loss

* fix ut

* fix style check (intel#3022)

* specify pyspark version (intel#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel#3054)

* spark 3.0

* add spark3.0 deployment (intel#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel#3094)

* back port master (intel#3096)

* set seed to avoid random error in PredictionServiceUT (intel#3097)

* Jdk11 support (intel#3098)

* update for jdk 11 support and doc

* add serializeUid (intel#3099)

* update doc (intel#3104)

* add doc for running in ide (intel#3106)

* fix callBigDLFunc returning an Int while the true return value from Java is a byte array. (intel#3111)

* add list of df support (intel#3113)

* Update readme (intel#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel#3133)

* DistriOptimizerV2 logger (intel#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel#3137)

* upgrade spark version (intel#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel#3141)

* flip0.14 (intel#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <[email protected]>
Co-authored-by: Jerry Wu <[email protected]>
Co-authored-by: Xin Qiu <[email protected]>
Co-authored-by: Yanzhang Wang <[email protected]>
Co-authored-by: GenBrg <[email protected]>
Co-authored-by: LeicongLi <[email protected]>
Co-authored-by: Emiliano Martinez <[email protected]>
Co-authored-by: abdolence <[email protected]>
Co-authored-by: Enrique Garcia <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: yaochi <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: majing921201 <[email protected]>
Co-authored-by: jenniew <[email protected]>
Co-authored-by: Xiao <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: Hui Li <[email protected]>
Co-authored-by: Jason Dai <[email protected]>
Co-authored-by: dding3 <[email protected]>
Co-authored-by: Yang Wang <[email protected]>
Co-authored-by: Yina Chen <[email protected]>
Co-authored-by: Hangrui Cao <[email protected]>
Co-authored-by: pinggao18 <[email protected]>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 14, 2021
* convert static graph to IR graph and build (intel#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel#2682)

* add spark 2.4 support (intel#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if
the parent thread exits and is recreated (such as a thread from
Executors.newFixedThreadPool), the whole app hits a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
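
A minimal sketch of the single-thread-pool idea, written in Python rather than the project's Scala and not taken from the patch itself. All compute calls go through one long-lived worker, so native code always sees the same thread no matter which caller submits the work. The names `compute_pool` and `run_on_compute_thread` are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One long-lived worker thread dedicated to compute calls (hypothetical name).
compute_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="compute")


def run_on_compute_thread(fn, *args):
    """Run fn on the dedicated worker and block until it finishes."""
    return compute_pool.submit(fn, *args).result()


def current_thread_id():
    """Return the identifier of the thread executing this call."""
    return threading.get_ident()


# Both calls execute on the same worker thread, never on the caller's thread.
first = run_on_compute_thread(current_thread_id)
second = run_on_compute_thread(current_thread_id)
```

Because the worker outlives any individual caller, thread-local state held by native code is not lost when a caller thread exits and is recreated.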

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel#2712)

There are two issues:

1. Padded tensors are required. mkl-dnn pads a tensor to a multiple of the
    SIMD width, which uses more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2).
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.
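
The padding arithmetic in item 1 can be sketched as follows. This is an illustration only: it assumes the padding applies to the channel axis (axis 1) and that the SIMD width is 8 floats, which matches the 4x1x28x28 to 4x8x28x28 example for avx2. The helper names are hypothetical.

```python
def round_up(value, multiple):
    """Round value up to the nearest multiple."""
    return ((value + multiple - 1) // multiple) * multiple


def padded_shape(shape, simd_width=8, channel_axis=1):
    """Return shape with the channel axis padded to a multiple of simd_width."""
    padded = list(shape)
    padded[channel_axis] = round_up(padded[channel_axis], simd_width)
    return tuple(padded)


def num_elements(shape):
    """Total number of elements in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


original = (4, 1, 28, 28)
padded = padded_shape(original)  # (4, 8, 28, 28)
overhead = num_elements(padded) / num_elements(original)  # 8.0
```

For a single-channel input the padded tensor holds eight times as many elements, which is why the extra memory matters.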

* add computshape for some layers and add skip primitives in DnnGraph (intel#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel#2737)

* refactor predict for dnn model

* remove some unit tests (intel#2752)

* remove some conflict tests (#2753)

* Update documentation (intel#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we add a new attribute called dataType
    to `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    we add two new attributes called `mask` and `scales`.
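
A rough sketch of how a `mask` and `scales` pair can describe an FP32/int8 conversion. It assumes the MKL-DNN convention that bit i of the mask selects one scale per index along dimension i (mask 0 means a single scale for the whole tensor), and it assumes symmetric quantization with scale = 127 / max|x|. The function names are hypothetical and are not the patch's API.

```python
def num_scales(shape, mask):
    """Number of scale factors implied by mask: one per index of each masked dim."""
    count = 1
    for axis, dim in enumerate(shape):
        if mask & (1 << axis):
            count *= dim
    return count


def quantize(values, scale):
    """FP32 -> int8: multiply by scale, round, clamp to the int8 range."""
    return [max(-128, min(127, round(v * scale))) for v in values]


def dequantize(values, scale):
    """int8 -> FP32: divide by the same scale."""
    return [v / scale for v in values]


fp32 = [-2.0, 0.5, 1.0]
scale = 127.0 / max(abs(v) for v in fp32)  # 63.5 (assumed symmetric scheme)
int8 = quantize(fp32, scale)
restored = dequantize(int8, scale)
```

The round trip is lossy: each restored value differs from the original by at most half a quantization step, which is why the scales have to travel with the data in both directions.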

* fix conversion accuracy (intel#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel#2759)

This includes 3 steps.

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of the
   fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` API will be compatible with the `bigquant`
   backend, which sets the quantize flag. When compiling,
   the quantized weight, output, and input will be generated by mkldnn at
   runtime.
3. Do the inference (forward).

* update readme for v1 training (intel#2763)

* update release doc for preparation (intel#2764)

* change some docs about mkldnn (intel#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel#2783)

* fix: inplace of input/output and weight dimension error (intel#2779)

Some layers' input and output use the same memory, so we can't do forward in
`calcScales`: by that time the input has been changed, and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

takes two steps: seq.forward(input) runs first, and when it reaches the ReLU it
does another forward, so the input will already be the output, and the scales
will be wrong.

For a convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
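
The in-place problem can be illustrated with a small Python sketch. It is not the patch's code: the scale is assumed to be derived from the maximum absolute value of a buffer, and `relu_inplace` and `max_abs` are hypothetical helpers.

```python
def relu_inplace(buffer):
    """ReLU that overwrites its input buffer, as an in-place layer would."""
    for i, v in enumerate(buffer):
        buffer[i] = max(v, 0.0)
    return buffer


def max_abs(buffer):
    """Largest absolute value in the buffer, used here as the scale statistic."""
    return max(abs(v) for v in buffer)


conv_output = [-4.0, 1.0, 2.0]       # also serves as the ReLU input buffer
scale_before = max_abs(conv_output)  # range of the true ReLU input
relu_inplace(conv_output)
scale_after = max_abs(conv_output)   # the input has already been overwritten
```

Measuring after the forward pass sees the ReLU output instead of its input, so the recorded range shrinks from 4.0 to 2.0 in this example.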

* fix: the blas wrapper has no scales (intel#2778)

* fix softmax (intel#2777)

* fix: performance regression on resnet50 (intel#2774)

The u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel#2671)

* flip to 0.9.0 (intel#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: layer-wise scaled, with utility functions to build a set of LARS optims for a container.

Bug fix: The gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination {id}{pidTo} is ambiguous, e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
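
The ambiguity is easy to reproduce. The sketch below is in Python and only mirrors the string layout described above. The function names are hypothetical, and the exact format used after the fix is assumed to differ only by the "_" between id and pidTo.

```python
def block_id_without_separator(param_id, pid_to, pid_from):
    """Original layout: {id}{pidTo}gradientBytes{pidFrom}."""
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def block_id_with_separator(param_id, pid_to, pid_from):
    """Layout with "_" between id and pidTo (assumed shape of the fix)."""
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


# id=1, pidTo=12 and id=11, pidTo=2 collide without the separator.
old_a = block_id_without_separator(1, 12, 0)
old_b = block_id_without_separator(11, 2, 0)

# With the separator the two ids are distinct.
new_a = block_id_with_separator(1, 12, 0)
new_b = block_id_with_separator(11, 2, 0)
```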

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel#2808)

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel#2821)

* Optimize backward graph generation and CAddTable (intel#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel#2815)

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel#2822)

* Use one AllReduceParameter for multi-optim method  training (intel#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (intel#2824)

* fix: fusion for multi-group of convolution (intel#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel#2843)

* fix acc bug & init dnn thread (intel#2841)

* support tnc and ntc conversion (intel#2844)

* support ntc in dnn layer (intel#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel#2853)

* fix: wrong affinity settings (intel#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel#2858)

* Add beam search in transformer (intel#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel#2854)

* feat: add axis to softmax (intel#2859)

* add release doc for 0.9 (intel#2862)

* fix: update core ref to master (intel#2865)

* flip version to 0.10.0 (intel#2869)

* [Bug Fix] - Fix module version comparison  (intel#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel#2872)

* tutorial fix (intel#2879)

* feat: RoiAlign Forward (intel#2874)

* Add set input output format API in Python (intel#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel#2870)

* fix memory leak for ir graph training (intel#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel#2885)

* add shape layer

* add Gather layer (intel#2897)

* add gather layer

* [New feature] Add maskhead (intel#2892)

* support for maskhead

* fix unit tests (intel#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel#2930)

* Onnx support: add pos parameter to softmax (intel#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel#2940)

* revert back api (intel#2943)

* fix: softmax and bn+scale fusion (intel#2937)

* feat: multi models support with MKL-DNN backend (intel#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel#2959)

* fix: the squeeze should not be included in IRElement (intel#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel#2973)

* fix: nms stability when using treeset. (intel#2972)

* flip version to 0.11 (intel#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch-related unit tests to integration tests. (intel#2971)

* fix: enable integration accuracy tests (intel#2976)

* fix: softmax dnn backend wrong order of primitive (intel#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel#3002)

* Remove final for AbstractModule (intel#3001)

* DistriOptimizerV2 argument (intel#3003)

* call DistriOptimizerV2

* fix inception (intel#3010)

* fix top1 and treenn (intel#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel#3020)

* test examples by distrioptimizerv2 (intel#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel#3021)

* fix loss

* fix ut

* fix style check (intel#3022)

* specify pyspark version (intel#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel#3054)

* spark 3.0

* add spark3.0 deployment (intel#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel#3094)

* back port master (intel#3096)

* set seed to avoid random error in PredictionServiceUT (intel#3097)

* Jdk11 support (intel#3098)

* update for jdk 11 support and doc

* add serializeUid (intel#3099)

* update doc (intel#3104)

* add doc for running in ide (intel#3106)

* fix callBigDLFunc return a Int while the true return value from java is a byte array. (intel#3111)

* add list of df support (intel#3113)

* Update readme (intel#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel#3133)

* DistriOptimizerV2 logger (intel#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel#3137)

* upgrade spark version (intel#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel#3141)

* flip0.14 (intel#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <[email protected]>
Co-authored-by: Jerry Wu <[email protected]>
Co-authored-by: Xin Qiu <[email protected]>
Co-authored-by: Yanzhang Wang <[email protected]>
Co-authored-by: GenBrg <[email protected]>
Co-authored-by: LeicongLi <[email protected]>
Co-authored-by: Emiliano Martinez <[email protected]>
Co-authored-by: abdolence <[email protected]>
Co-authored-by: Enrique Garcia <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: yaochi <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: majing921201 <[email protected]>
Co-authored-by: jenniew <[email protected]>
Co-authored-by: Xiao <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: Hui Li <[email protected]>
Co-authored-by: Jason Dai <[email protected]>
Co-authored-by: dding3 <[email protected]>
Co-authored-by: Yang Wang <[email protected]>
Co-authored-by: Yina Chen <[email protected]>
Co-authored-by: Hangrui Cao <[email protected]>
Co-authored-by: pinggao18 <[email protected]>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 14, 2021
* convert static graph to IR graph and build (intel#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel#2682)

* add spark 2.4 support (intel#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread holds some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app hits a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
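
The idea behind this fix can be illustrated outside BigDL. The following Python sketch is only an analogy, not the actual Scala implementation: it routes every compute call through one dedicated single-worker pool, so any thread-local state always lives on the same long-lived thread regardless of which caller thread submits the work. The names `compute_pool` and `run_on_compute_thread` are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One long-lived worker thread that owns all "native" computation.
# (Hypothetical name; stands in for a dedicated compute thread pool.)
compute_pool = ThreadPoolExecutor(max_workers=1)


def run_on_compute_thread(fn, *args):
    """Submit fn to the dedicated thread and wait for its result."""
    return compute_pool.submit(fn, *args).result()


def current_thread_id():
    return threading.get_ident()


# Caller threads come and go, but the work always lands on the same thread.
ids = set()
for _ in range(3):
    caller = threading.Thread(
        target=lambda: ids.add(run_on_compute_thread(current_thread_id))
    )
    caller.start()
    caller.join()
```

After the loop, `ids` holds a single thread id even though three different caller threads were created and destroyed.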

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn uses a padded tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space. This patch allocates it at runtime instead.
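
The memory cost of the padding in item 1 can be estimated with a small sketch. It assumes, as the 4x1x28x28 example suggests, that the channel dimension (index 1 in NCHW) is the one rounded up, and it uses a SIMD width of 8 to match the avx2 case; `padded_shape` and `num_elements` are hypothetical helpers, not mkl-dnn functions.

```python
def padded_shape(shape, simd_width=8):
    """Round the channel dimension (index 1, NCHW) up to a multiple of simd_width."""
    n, c, h, w = shape
    padded_c = -(-c // simd_width) * simd_width  # ceiling division
    return (n, padded_c, h, w)


def num_elements(shape):
    """Total number of elements in a tensor of the given shape."""
    total = 1
    for d in shape:
        total *= d
    return total


original = (4, 1, 28, 28)
padded = padded_shape(original)  # channel 1 is padded up to 8
overhead = num_elements(padded) / num_elements(original)
```

For a single-channel input the padded tensor is eight times larger, which is why the extra allocation is noticeable.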

* add computshape for some layers and add skip primitives in DnnGraph (intel#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel#2737)

* refactor predict for dnn model

* remove some unit tests (intel#2752)

* remove some conflict tests (#2753)

* Update documentation (intel#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistirbuted in optimizer (#2755)

* change asInstanceOf to toDistirbuted

* change asInstanceOf to toDistirbuted

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we should add a new attribute called `dataType`
    to the `MemoryData`.
2. Because we need to transfer the scales between FP32->Int8 and Int8->FP32,
    we should add two new attributes called `mask` and `scales`.
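
To show why scales have to travel with the data, here is a minimal sketch of an FP32 to int8 round trip. It assumes symmetric linear quantization with a single per-tensor scale of 127 / max|x|; the actual mkldnn reorder may differ, and the `mask` attribute (which selects the dimensions that carry separate scales) is not modeled. All function names are hypothetical.

```python
def compute_scale(values, qmax=127):
    """Scale that maps the largest magnitude in values onto qmax."""
    max_abs = max(abs(v) for v in values)
    return qmax / max_abs if max_abs > 0 else 1.0


def quantize(values, scale, qmax=127):
    """FP32 -> int8: multiply by the scale, round, and clamp."""
    return [max(-qmax, min(qmax, round(v * scale))) for v in values]


def dequantize(qvalues, scale):
    """int8 -> FP32: divide by the same scale."""
    return [q / scale for q in qvalues]


fp32 = [0.5, -1.0, 0.25, 2.0]
scale = compute_scale(fp32)  # 127 / 2.0 = 63.5
int8 = quantize(fp32, scale)
restored = dequantize(int8, scale)
```

Without the same `scale` on both sides, the int8 values cannot be mapped back to FP32, which is the motivation for storing it alongside the memory description.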

* fix conversion accuracy (intel#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can therefore accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel#2759)

This includes 3 steps.

1. Generate the scales of the model.
   This needs an api like `generateScalesWithMask` to generate the scales of
   the fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` api will be compatible with the `bigquant`
   backend, which will set the quantize flag. When compiling,
   the quantized weight, output, and input will be generated by mkldnn at
   runtime.
3. Do the inference (forward).

* update readme for v1 training (intel#2763)

* update release doc for preparation (intel#2764)

* change some docs about mkldnn (intel#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model, and then
use the quantized model as usual.

* enable fusion by default (intel#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel#2783)

* fix: inplace of input/output and weight dimension error (intel#2779)

Some layers' input and output use the same memory, so we can't do forward in
`calcScales`: by that time the input has already been changed, and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

takes two steps. seq.forward(input) runs first; then, on reaching the ReLU, it
does another forward, so the input will be the output, and the scales will be
wrong.

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
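
The in-place problem described above can be reproduced with plain lists. This is an illustrative sketch only: `relu_inplace` and `max_abs` are hypothetical helpers, and the maximum absolute value stands in for whatever statistic the scale calculation reads from the input.

```python
def relu_inplace(buf):
    """ReLU that overwrites its input buffer, as an in-place layer would."""
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf


def max_abs(buf):
    """Statistic used here as a stand-in for the layer's input scale."""
    return max(abs(v) for v in buf)


# Output of a hypothetical Conv layer; it is also the input of the ReLU.
conv_out = [-3.0, 1.0, 2.0]

# Correct: record the ReLU's input statistic before the in-place forward.
scale_before = max_abs(conv_out)

relu_inplace(conv_out)  # conv_out now holds the ReLU output

# Wrong: reading the "input" afterwards actually reads the output.
scale_after = max_abs(conv_out)
```

Because the buffer is shared, the statistic read after the forward pass reflects the ReLU output rather than its input, so the two values disagree.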

* fix: the blas wrapper has no scales (intel#2778)

* fix softmax (intel#2777)

* fix: performance regression on resnet50 (intel#2774)

u8 to s8 or s8 to u8 needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel#2671)

* flip to 0.9.0 (intel#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add the LARS optimizer (layer-wise scaled), along with utility functions to build a set of LARS optim methods for a container.

Bug fix: The gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}, but the combination {id}{pidTo} is ambiguous; e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes
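
The block-id ambiguity fixed in this change is easy to reproduce. The sketch below rebuilds the two id schemes from the description in the commit message; the function names are hypothetical and the exact string layout inside BigDL is an assumption beyond what the message states.

```python
def old_block_id(id_, pid_to, pid_from):
    """Original scheme: id and pidTo are concatenated with no separator."""
    return f"{id_}{pid_to}gradientBytes{pid_from}"


def new_block_id(id_, pid_to, pid_from):
    """Fixed scheme: a "_" separates id from pidTo."""
    return f"{id_}_{pid_to}gradientBytes{pid_from}"


# (id=1, pidTo=12) and (id=11, pidTo=2) collide under the old scheme...
collision = old_block_id(1, 12, 0) == old_block_id(11, 2, 0)

# ...but stay distinct once the separator is added.
distinct = new_block_id(1, 12, 0) != new_block_id(11, 2, 0)
```

Any separator that cannot appear inside a numeric id would work equally well; the commit chose "_".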

* bugfix - set mask for container (intel#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel#2821)

* Optimize backward graph generation and CAddTable (intel#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel#2822)

* Use one AllReduceParameter for multi-optim method  training (intel#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel#2824)

* fix: fusion for multi-group of convolution (intel#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel#2843)

* fix acc bug & init dnn thread (intel#2841)

* support tnc and ntc conversion (intel#2844)

* support ntc in dnn layer (intel#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel#2853)

* fix: wrong affinity settings (intel#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel#2858)

* Add beam search in transformer (intel#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel#2854)

* feat: add axis to softmax (intel#2859)

* add release doc for 0.9 (intel#2862)

* fix: update core ref to master (intel#2865)

* flip version to 0.10.0 (intel#2869)

* [Bug Fix] - Fix module version comparison  (intel#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel#2872)

* tutorial fix (intel#2879)

* feat: RoiAlign Forward (intel#2874)

* Add set input output format API in Python (intel#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel#2870)

* fix memory leak for ir graph training (intel#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel#2885)

* add shape layer

* add Gather layer (intel#2897)

* add gather layer

* [New feature] Add maskhead (intel#2892)

* support for maskhead

* fix unit tests (intel#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel#2930)

* Onnx support: add pos parameter to softmax (intel#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel#2940)

* revert back api (intel#2943)

* fix: softmax and bn+scale fusion (intel#2937)

* feat: multi models support with MKL-DNN backend (intel#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel#2959)

* fix: the squeeze should not be included in IRElement (intel#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel#2973)

* fix: nms stability when using treeset. (intel#2972)

* flip version to 0.11 (intel#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel#2971)

* fix: enable integration accuracy tests (intel#2976)

* fix: softmax dnn backend wrong order of primitive (intel#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel#3002)

* Remove final for AbstractModule (intel#3001)

* DistriOptimizerV2 argument (intel#3003)

* call DistriOptimizerV2

* fix inception (intel#3010)

* fix top1 and treenn (intel#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel#3020)

* test examples by distrioptimizerv2 (intel#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel#3021)

* fix loss

* fix ut

* fix style check (intel#3022)

* specify pyspark version (intel#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel#3054)

* spark 3.0

* add spark3.0 deployment (intel#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor (intel#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel#3094)

* back port master (intel#3096)

* set seed to avoid random error in PredictionServiceUT (intel#3097)

* Jdk11 support (intel#3098)

* update for jdk 11 support and doc

* add serializeUid (intel#3099)

* update doc (intel#3104)

* add doc for running in ide (intel#3106)

* fix callBigDLFunc return a Int while the true return value from java is a byte array. (intel#3111)

* add list of df support (intel#3113)

* Update readme (intel#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel#3133)

* DistriOptimizerV2 logger (intel#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel#3137)

* upgrade spark version (intel#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel#3141)

* flip0.14 (intel#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <[email protected]>
Co-authored-by: Jerry Wu <[email protected]>
Co-authored-by: Xin Qiu <[email protected]>
Co-authored-by: Yanzhang Wang <[email protected]>
Co-authored-by: GenBrg <[email protected]>
Co-authored-by: LeicongLi <[email protected]>
Co-authored-by: Emiliano Martinez <[email protected]>
Co-authored-by: abdolence <[email protected]>
Co-authored-by: Enrique Garcia <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: yaochi <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: majing921201 <[email protected]>
Co-authored-by: jenniew <[email protected]>
Co-authored-by: Xiao <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: Hui Li <[email protected]>
Co-authored-by: Jason Dai <[email protected]>
Co-authored-by: dding3 <[email protected]>
Co-authored-by: Yang Wang <[email protected]>
Co-authored-by: Yina Chen <[email protected]>
Co-authored-by: Hangrui Cao <[email protected]>
Co-authored-by: pinggao18 <[email protected]>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 14, 2021
* convert static graph to IR graph and build (intel#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue caused by docker (intel#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel#2682)

* add spark 2.4 support (intel#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel#2724)

If we use the parent thread directly, there will be two bugs:
1. The child threads forked from the parent thread will be bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables. So if
the parent thread exits and is recreated (such as a thread from
Executors.newFixedThreadPool), the whole app will hit a segmentation fault.
The parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
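
The idea of pinning all compute to one long-lived thread can be sketched as follows. This is only an illustration in Python, not the BigDL implementation: `compute`, `run_on_compute_thread` and the thread-local `_native_state` are hypothetical stand-ins for the native mkldnn calls and their per-thread state.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for native per-thread state (not a BigDL API).
_native_state = threading.local()


def compute(x):
    # Initialise the "native" state the first time this thread is used.
    if not hasattr(_native_state, "owner"):
        _native_state.owner = threading.get_ident()
    return x * 2, _native_state.owner


# One long-lived worker: every compute call lands on the same thread,
# so its thread-local state is never torn down and recreated.
compute_pool = ThreadPoolExecutor(max_workers=1)


def run_on_compute_thread(x):
    # Callers (main thread or worker threads) only submit and wait.
    return compute_pool.submit(compute, x).result()


results = [run_on_compute_thread(i) for i in range(3)]
values = [value for value, _ in results]
owners = {owner for _, owner in results}
```

Here `owners` ends up holding a single thread id that differs from the caller's, which is the property the fix relies on: the calling thread can come and go without touching the thread that owns the native state.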

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn will use a padded tensor, which
    uses more memory, such as 4x1x28x28 padded to 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous impl
    allocated the DnnTensor when the model was created, which cost too much
    space. So this patch allocates it at runtime.
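
The padding in item 1 can be illustrated with a small sketch. It assumes, as in the 4x1x28x28 example, that only the channel dimension of an NCHW shape is rounded up and that the SIMD width is 8; `padded_shape` is a hypothetical helper, and the real blocked layout is chosen by mkl-dnn.

```python
def padded_shape(shape, simd_width=8):
    """Round the channel dimension of an NCHW shape up to a multiple of simd_width."""
    n, c, h, w = shape
    padded_c = -(-c // simd_width) * simd_width  # ceiling division, then scale back
    return (n, padded_c, h, w)


def element_count(shape):
    total = 1
    for dim in shape:
        total *= dim
    return total


original = (4, 1, 28, 28)
padded = padded_shape(original)
memory_ratio = element_count(padded) / element_count(original)
```

For this shape the padded tensor holds eight times as many elements as the original, which is why the extra memory matters for layers with few channels.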

* add computshape for some layers and add skip primitives in DnnGraph (intel#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel#2737)

* refactor predict for dnn model

* remove some unit tests (intel#2752)

* remove some conflict tests (#2753)

* Update documentation (intel#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we should add a new attribute called dataType
    to the `MemoryData`.
2. Because we should transfer the scales between FP32->Int8 and Int8->FP32,
    we should add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel#2760)

* fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel#2762)

Enable the int8 data type in layers, especially for convolutions.
So a specific layer can accept an int8 input. If you want the fp32
output, you should add a reorder.

* feature: mkldnn int8 layer wise supports (intel#2759)

This includes 3 steps.

1. Generate the scales of the model.
   This needs an API like `generateScalesWithMask` to generate the scales of
   the fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` API will be compatible with the `bigquant`
   backend, which will set the quantize flag. When compiling,
   the quantized weight, output, and input will be generated by mkldnn at
   runtime.
3. Do the inference (forward).
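
The three steps can be sketched with generic symmetric int8 quantization. This is an assumption-laden illustration, not the BigDL or mkldnn implementation: `generate_scale`, `quantize` and `dequantize` are hypothetical names, and the real scales also depend on the mask, which is not modelled here.

```python
def generate_scale(values):
    """Step 1: a scale that maps the largest magnitude in `values` onto 127."""
    max_abs = max(abs(v) for v in values)
    return 127.0 / max_abs if max_abs > 0 else 1.0


def quantize(values, scale):
    """Step 2: multiply by the scale, round, and clamp into the int8 range."""
    return [max(-127, min(127, round(v * scale))) for v in values]


def dequantize(quantized, scale):
    """Map int8 values back to fp32, as a reorder to fp32 would."""
    return [q / scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
scale = generate_scale(weights)           # computed from the fp32 model
q_weights = quantize(weights, scale)      # int8 representation
restored = dequantize(q_weights, scale)   # what inference effectively sees
```

Each restored value differs from the original by at most half a quantization step (`0.5 / scale`), which is the accuracy cost that the scales are meant to keep small.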

* update readme for v1 training (intel#2763)

* update release doc for preparation (intel#2764)

* change some docs about mkldnn (intel#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel#2783)

* fix: inplace of input/output and weight dimension error (intel#2779)

Some layers' input and output use the same memory, so we can't do forward in
`calcScales`: by that time the input has been changed, and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

It takes two steps: seq.forward(input) runs first, and when going into the
ReLU it does another forward, so the input will be the output, and the scales
will be wrong.
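
A minimal sketch of why an in-place layer invalidates a scale computed afterwards. It assumes the scale is derived from the maximum absolute value of a buffer; `relu_inplace` and `max_abs` are hypothetical helpers, not BigDL functions.

```python
def relu_inplace(buf):
    """ReLU that overwrites its input buffer, as an in-place layer would."""
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf  # the same list object, now holding the output


def max_abs(buf):
    return max(abs(v) for v in buf)


conv_output = [-4.0, 1.0, 2.0]           # also the ReLU input
scale_before = max_abs(conv_output)      # measured before ReLU runs
relu_output = relu_inplace(conv_output)  # overwrites conv_output
scale_after = max_abs(conv_output)       # measured too late: sees the ReLU output
```

Measuring after the second forward reports the output's range as if it were the input's, so the input scale has to be captured before the in-place layer runs.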

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.

* fix: the blas wrapper has no scales (intel#2778)

* fix softmax (intel#2777)

* fix: performance regression on resnet50 (intel#2774)

The u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel#2671)

* flip to 0.9.0 (intel#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: layer-wise scaled. Also adds utility functions to build a set of LARS optims for a container.

Bug fix: The gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} causes ambiguity, e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
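
The ambiguity can be reproduced with plain string formatting. The helper names below are hypothetical, and the new format simply follows the description above (a "_" between id and pidTo); it is not copied from the BigDL source.

```python
def block_id_old(id_, pid_to, pid_from):
    # Original scheme: id and pidTo are concatenated with no separator.
    return f"{id_}{pid_to}gradientBytes{pid_from}"


def block_id_new(id_, pid_to, pid_from):
    # Fixed scheme: "_" separates id from pidTo.
    return f"{id_}_{pid_to}gradientBytes{pid_from}"


# (id=1, pidTo=12) and (id=11, pidTo=2) both start with "112" in the old scheme.
collision_old = block_id_old(1, 12, 0) == block_id_old(11, 2, 0)
collision_new = block_id_new(1, 12, 0) == block_id_new(11, 2, 0)
```

With the separator the two ids become `1_12gradientBytes0` and `11_2gradientBytes0`, so blocks belonging to different parameters can no longer be mistaken for each other.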

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel#2821)

* Optimize backward graph generation and CAddTable (intel#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel#2822)

* Use one AllReduceParameter for multi-optim method training (intel#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore / use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel#2824)

* fix: fusion for multi-group of convolution (intel#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel#2843)

* fix acc bug & init dnn thread (intel#2841)

* support tnc and ntc conversion (intel#2844)

* support ntc in dnn layer (intel#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel#2853)

* fix: wrong affinity settings (intel#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel#2858)

* Add beam search in transformer (intel#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel#2854)

* feat: add axis to softmax (intel#2859)

* add release doc for 0.9 (intel#2862)

* fix: update core ref to master (intel#2865)

* flip version to 0.10.0 (intel#2869)

* [Bug Fix] - Fix module version comparison (intel#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel#2872)

* tutorial fix (intel#2879)

* feat: RoiAlign Forward (intel#2874)

* Add set input output format API in Python (intel#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel#2870)

* fix memory leak for ir graph training (intel#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel#2885)

* add shape layer

* add Gather layer (intel#2897)

* add gather layer

* [New feature] Add maskhead (intel#2892)

* support for maskhead

* fix unit tests (intel#2905)

* modify predict/predictClass function (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support batching samples with different sizes into one mini batch (intel#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel#2930)

* Onnx support: add pos parameter to softmax (intel#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel#2940)

* revert back api (intel#2943)

* fix: softmax and bn+scale fusion (intel#2937)

* feat: multi models support with MKL-DNN backend (intel#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel#2959)

* fix: the squeeze should not be included in IRElement (intel#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel#2973)

* fix: nms stability when using treeset. (intel#2972)

* flip version to 0.11 (intel#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel#2971)

* fix: enable integration accuracy tests (intel#2976)

* fix: softmax dnn backend wrong order of primitive (intel#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel#3002)

* Remove final for AbstractModule (intel#3001)

* DistriOptimizerV2 argument (intel#3003)

* call DistriOptimizerV2

* fix inception (intel#3010)

* fix top1 and treenn (intel#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel#3020)

* test examples by distrioptimizerv2 (intel#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel#3021)

* fix loss

* fix ut

* fix style check (intel#3022)

* specify pyspark version (intel#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel#3054)

* spark 3.0

* add spark3.0 deployment (intel#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel#3094)

* back port master (intel#3096)

* set seed to avoid random error in PredictionServiceUT (intel#3097)

* Jdk11 support (intel#3098)

* update for jdk 11 support and doc

* add serializeUid (intel#3099)

* update doc (intel#3104)

* add doc for running in ide (intel#3106)

* fix callBigDLFunc returning an Int while the true return value from Java is a byte array. (intel#3111)

* add list of df support (intel#3113)

* Update readme (intel#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel#3133)

* DistriOptimizerV2 logger (intel#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel#3137)

* upgrade spark version (intel#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel#3141)

* flip0.14 (intel#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <[email protected]>
Co-authored-by: Jerry Wu <[email protected]>
Co-authored-by: Xin Qiu <[email protected]>
Co-authored-by: Yanzhang Wang <[email protected]>
Co-authored-by: GenBrg <[email protected]>
Co-authored-by: LeicongLi <[email protected]>
Co-authored-by: Emiliano Martinez <[email protected]>
Co-authored-by: abdolence <[email protected]>
Co-authored-by: Enrique Garcia <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: yaochi <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: majing921201 <[email protected]>
Co-authored-by: jenniew <[email protected]>
Co-authored-by: Xiao <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: Hui Li <[email protected]>
Co-authored-by: Jason Dai <[email protected]>
Co-authored-by: dding3 <[email protected]>
Co-authored-by: Yang Wang <[email protected]>
Co-authored-by: Yina Chen <[email protected]>
Co-authored-by: Hangrui Cao <[email protected]>
Co-authored-by: pinggao18 <[email protected]>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 14, 2021
* convert static graph to IR graph and build (intel#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel#2682)

* add spark 2.4 support (intel#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app will segfault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
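
The single-thread-pool idea can be sketched as follows. This is a minimal Python illustration, not BigDL's Scala implementation: the names `compute_pool` and `run_on_compute_thread` are hypothetical, and the standard library's `ThreadPoolExecutor` with one worker stands in for the dedicated computing thread. All work is submitted to the same long-lived thread, so any thread-local state stays valid between calls.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One long-lived worker thread that performs all "native" computation
# (hypothetical stand-in for the dedicated mkldnn computing thread).
compute_pool = ThreadPoolExecutor(max_workers=1)


def run_on_compute_thread(fn, *args):
    """Run fn on the dedicated thread and block until it finishes."""
    return compute_pool.submit(fn, *args).result()


def current_thread_id():
    return threading.get_ident()


# Every call lands on the same worker, never on the caller's thread.
worker_ids = {run_on_compute_thread(current_thread_id) for _ in range(5)}
caller_id = threading.get_ident()
```

Because the caller thread never runs the computation itself, it can exit and be recreated without touching the worker's thread-local state.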

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel#2712)

There are two issues:

1. Padding tensors are required. mkl-dnn uses a padded tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space. This patch allocates it at runtime instead.
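
The memory cost of the padding in point 1 can be shown with a little arithmetic. This is a minimal sketch, assuming the channel dimension of an NCHW shape is rounded up to a multiple of the SIMD width (8 floats for avx2, matching the 4x1x28x28 to 4x8x28x28 example); the helpers `padded_shape` and `element_count` are hypothetical and are not part of BigDL or mkl-dnn.

```python
def padded_shape(shape, simd_width=8):
    """Round the channel dimension of an NCHW shape up to a multiple of simd_width."""
    n, c, h, w = shape
    padded_c = -(-c // simd_width) * simd_width  # ceiling division, then scale back up
    return (n, padded_c, h, w)


def element_count(shape):
    """Number of elements a tensor of this shape holds."""
    total = 1
    for dim in shape:
        total *= dim
    return total


original = (4, 1, 28, 28)
padded = padded_shape(original)
# How many times more elements the padded tensor needs.
growth = element_count(padded) // element_count(original)
```

For a single-channel input the padded tensor holds eight times as many elements, which is why allocating such tensors eagerly is expensive.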

* add computshape for some layers and add skip primitives in DnnGraph (intel#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel#2737)

* refactor predict for dnn model

* remove some unit tests (intel#2752)

* remove some conflict tests (#2753)

* Update documentation (intel#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we should add a new attribute called dataType
    to the `MemoryData`.
2. Because we need to transfer the scales between FP32->Int8 and Int8->FP32,
    we should add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel#2760)

* fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can therefore accept an int8 input. If you want fp32
output, you should add a reorder.

* feature: mkldnn int8 layer wise supports (intel#2759)

This includes 3 steps.

1. Generate the scales of the model.
   This needs an API like `generateScalesWithMask` to generate the scales of
   the fp32 model; the model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` API will be compatible with the `bigquant`
   backend, which sets the quantize flag. When compiling,
   the quantized weight, output and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).
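
The three steps can be illustrated with plain Python. This is only a sketch under stated assumptions: it uses symmetric linear quantization with one scale per tensor (scale = 127 / max |x|), which may differ from what mkldnn does internally, and the functions `generate_scale`, `quantize` and `int8_dot` are hypothetical names, not BigDL APIs.

```python
def generate_scale(values, int8_max=127):
    """Step 1: derive a scale that maps the largest magnitude onto int8_max."""
    max_abs = max(abs(v) for v in values)
    return int8_max / max_abs if max_abs > 0 else 1.0


def quantize(values, scale, int8_max=127):
    """Step 2: FP32 -> int8 by scaling, rounding and clamping."""
    return [max(-int8_max, min(int8_max, round(v * scale))) for v in values]


def int8_dot(q_input, q_weight, input_scale, weight_scale):
    """Step 3: accumulate in integers, then convert the result back to FP32."""
    acc = sum(a * b for a, b in zip(q_input, q_weight))
    return acc / (input_scale * weight_scale)


weights = [0.5, -1.0, 0.25]
inputs = [2.0, 1.0, -4.0]

w_scale = generate_scale(weights)
x_scale = generate_scale(inputs)
approx = int8_dot(quantize(inputs, x_scale), quantize(weights, w_scale),
                  x_scale, w_scale)
exact = sum(a * b for a, b in zip(inputs, weights))
```

The int8 result only approximates the fp32 one, so the quality of the scales generated in step 1 determines the accuracy of the quantized model.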

* update readme for v1 training (intel#2763)

* update release doc for preparation (intel#2764)

* change some docs about mkldnn (intel#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel#2761)

This is an example of how to use mkldnn int8. There are two steps: use
GenInt8Scales to generate the scales first and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel#2783)

* fix: inplace of input/output and weight dimension error (intel#2779)

Some layers' input and output use the same memory. We can't do forward in
`calcScales`, because by that time the input has been changed and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

runs in two steps: seq.forward(input) first, and then, when going into the ReLU, it
does another forward, so the input will be the output, and the scales will be
wrong.

The convolution's weight always has 5 dimensions, even when the group number
is 1. But for dnn convolution, if there is no group, the weight's dimension
should be 4.
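
The in-place problem described above can be reproduced in a few lines. This is a minimal sketch, assuming the scale is derived from the maximum absolute value of a buffer; `relu_inplace` and `max_abs` are hypothetical helpers and a Python list stands in for the shared tensor memory.

```python
def relu_inplace(buf):
    """ReLU that overwrites its input buffer, as an in-place layer would."""
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf


def max_abs(buf):
    """Statistic a scale would be derived from (assumed: max absolute value)."""
    return max(abs(v) for v in buf)


# Output of the convolution, which is also the input buffer of the ReLU.
conv_output = [-3.0, 1.0, 2.0]

stat_before = max_abs(conv_output)  # read before the ReLU runs
relu_inplace(conv_output)           # the ReLU overwrites the shared buffer
stat_after = max_abs(conv_output)   # read too late: the input is already the output
```

Reading the statistic after the in-place layer has run loses the negative peak, so a scale computed at that point no longer describes the layer's real input.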

* fix: the blas wrapper has no scales (intel#2778)

* fix softmax (intel#2777)

* fix: performance regression on resnet50 (intel#2774)

the u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel#2671)

* flip to 0.9.0 (intel#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: Layer-wise scaled. Also with utility functions to build a set of LARS optim for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo
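
The ambiguity is easy to reproduce. The sketch below is an illustration in Python, not the AllReduceParameter code itself: `block_id_old` follows the `{id}{pidTo}gradientBytes{pidFrom}` pattern quoted above, while the exact layout used in `block_id_new` (a single `_` between id and pidTo) is an assumption based on the description of the fix.

```python
def block_id_old(param_id, pid_to, pid_from):
    """Original layout: id and pidTo are concatenated with no separator."""
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def block_id_new(param_id, pid_to, pid_from):
    """Assumed fixed layout: "_" separates id from pidTo."""
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs that both start with "112".
first = (1, 12, 0)
second = (11, 2, 0)

old_collides = block_id_old(*first) == block_id_old(*second)
new_collides = block_id_new(*first) == block_id_new(*second)
```

With the separator in place the two pairs map to distinct block ids, so one block can no longer be mistaken for another.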

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel#2808)

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel#2821)

* Optimize backward graph generation and CAddTable (intel#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel#2815)

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel#2822)

* Use one AllReduceParameter for multi-optim method  training (intel#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel#2824)

* fix: fusion for multi-group of convolution (intel#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel#2843)

* fix acc bug & init dnn thread (intel#2841)

* support tnc and ntc conversion (intel#2844)

* support ntc in dnn layer (intel#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel#2853)

* fix: wrong affinity settings (intel#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel#2858)

* Add beam search in transformer (intel#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel#2854)

* feat: add axis to softmax (intel#2859)

* add release doc for 0.9 (intel#2862)

* fix: update core ref to master (intel#2865)

* flip version to 0.10.0 (intel#2869)

* [Bug Fix] - Fix module version comparison  (intel#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel#2872)

* tutorial fix (intel#2879)

* feat: RoiAlign Forward (intel#2874)

* Add set input output format API in Python (intel#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel#2870)

* fix memory leak for ir graph training (intel#2895)

* add gemm layer (#2882)

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add gemm layer

* add gemm layer

* add Shape layer (intel#2885)

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add shape layer

* add Gather layer (intel#2897)

* add gather layer

* [New feature] Add maskhead (intel#2892)

* support for maskhead

* fix unit tests (intel#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* modify unsqueeze function

* add maskutils (intel#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel#2930)

* Onnx support: add pos parameter to softmax (intel#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel#2940)

* revert back api (intel#2943)

* fix: softmax and bn+scale fusion (intel#2937)

* feat: multi models support with MKL-DNN backend (intel#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel#2959)

* fix: the squeeze should not be included in IRElement (intel#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel#2973)

* fix: nms stability when using treeset. (intel#2972)

* flip version to 0.11 (intel#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch-related unit tests to integration tests. (intel#2971)

* fix: enable integration accuracy tests (intel#2976)

* fix: softmax dnn backend wrong order of primitive (intel#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel#3002)

* Remove final for AbstractModule (intel#3001)

* DistriOptimizerV2 argument (intel#3003)

* call DistriOptimizerV2

* fix inception (intel#3010)

* fix top1 and treenn (intel#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel#3020)

* test examples by distrioptimizerv2 (intel#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel#3021)

* fix loss

* fix ut

* fix style check (intel#3022)

* specify pyspark version (intel#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel#3054)

* spark 3.0

* add spark3.0 deployment (intel#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel#3094)

* back port master (intel#3096)

* set seed to avoid random error in PredictionServiceUT (intel#3097)

* Jdk11 support (intel#3098)

* update for jdk 11 support and doc

* add serializeUid (intel#3099)

* update doc (intel#3104)

* add doc for running in ide (intel#3106)

* fix callBigDLFunc returning an Int while the true return value from Java is a byte array. (intel#3111)

* add list of df support (intel#3113)

* Update readme (intel#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel#3133)

* DistriOptimizerV2 logger (intel#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel#3137)

* upgrade spark version (intel#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel#3141)

* flip0.14 (intel#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73
Co-authored-by: Jerry Wu
Co-authored-by: Xin Qiu
Co-authored-by: Yanzhang Wang
Co-authored-by: GenBrg
Co-authored-by: LeicongLi
Co-authored-by: Emiliano Martinez
Co-authored-by: abdolence
Co-authored-by: Enrique Garcia
Co-authored-by: Louie Tsai
Co-authored-by: yaochi
Co-authored-by: Menooker
Co-authored-by: Firecrackerxox
Co-authored-by: majing921201
Co-authored-by: jenniew
Co-authored-by: Xiao
Co-authored-by: Hui Li
Co-authored-by: Jason Dai
Co-authored-by: dding3
Co-authored-by: Yang Wang
Co-authored-by: Yina Chen
Co-authored-by: Hangrui Cao
Co-authored-by: pinggao18
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 17, 2021
* convert static graph to IR graph and build (intel#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel#2682)

* add spark 2.4 support (intel#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread keeps some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app hits a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
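
The idea of a dedicated single-thread pool can be sketched in a few lines. This is a minimal Python illustration, not the BigDL implementation; the pool name `dnn-compute` and the helper `run_on_compute_thread` are hypothetical. It only shows that every task lands on the same long-lived worker thread rather than on whichever parent thread submitted it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One long-lived worker thread (hypothetical name). Every submitted task runs
# on it, so thread-local native state stays attached to a single thread.
_compute_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dnn-compute")


def run_on_compute_thread(fn, *args):
    """Run fn on the dedicated worker and block until it finishes."""
    return _compute_pool.submit(fn, *args).result()


def current_thread_name():
    return threading.current_thread().name


# Both calls report the same worker thread, never the calling thread.
first = run_on_compute_thread(current_thread_name)
second = run_on_compute_thread(current_thread_name)
```

Because the pool outlives any individual caller, the worker is not torn down and recreated when a parent thread exits.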

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel#2712)

There are two issues:

1. A padded tensor is required. mkl-dnn uses a padded tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.
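
The padding arithmetic in the first item can be checked with a small sketch. It assumes a block width of 8 on the channel axis, inferred from the 4x1x28x28 to 4x8x28x28 example; the function names are hypothetical and the real layout is chosen by mkl-dnn.

```python
def pad_to_multiple(size, width):
    """Round size up to the next multiple of width."""
    return ((size + width - 1) // width) * width


def padded_shape(shape, simd_width=8, channel_axis=1):
    """Pad only the channel axis; the width of 8 is an assumption from the example."""
    padded = list(shape)
    padded[channel_axis] = pad_to_multiple(padded[channel_axis], simd_width)
    return tuple(padded)


def num_elements(shape):
    count = 1
    for dim in shape:
        count *= dim
    return count


original = (4, 1, 28, 28)
padded = padded_shape(original)
# With a single input channel, the padded tensor holds 8x as many elements.
growth = num_elements(padded) // num_elements(original)
```

The worst case is a layer with very few channels, where almost all of the padded buffer is filler.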

* add compute shape for some layers and add skip primitives in DnnGraph (intel#2740)

* add compute shape for some layers and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel#2737)

* refactor predict for dnn model

* remove some unit tests (intel#2752)

* remove some conflict tests (#2753)

* Update documentation (intel#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, add a new attribute called `dataType`
    to `MemoryData`.
2. Because the scales must be carried through FP32->Int8 and Int8->FP32 reorders,
    add two new attributes called `mask` and `scales`.
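
As a rough illustration of the three new attributes, here is a Python stand-in for a memory description. The class below is hypothetical and only mirrors the attribute names from the message; the `shape` and `layout` fields, the string data types, and the reading of `mask=0` as "one scale for the whole tensor" are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MemoryData:
    """Hypothetical stand-in for a tensor memory description."""
    shape: Tuple[int, ...]
    layout: str
    data_type: str = "fp32"          # new: distinguishes fp32 from int8 memory
    mask: int = 0                    # new: which dimensions carry their own scale
    scales: Tuple[float, ...] = ()   # new: scale factors used when reordering


fp32_mem = MemoryData((4, 8, 28, 28), "nchw")
int8_mem = MemoryData((4, 8, 28, 28), "nchw", data_type="int8", mask=0, scales=(127.0,))
```

Two descriptions with the same shape and layout but different data types compare as different, which is what a reorder between them relies on.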

* fix conversion accuracy (intel#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel#2759)

This includes 3 steps.

1. Generate the scales of the model.
   This needs an API like `generateScalesWithMask` to generate the scales of
   the fp32 model. The model returned is still an fp32 model.
2. Quantize the model.
   The `quantize()` API stays compatible with the `bigquant`
   backend and sets the quantize flag. At compile time,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).
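
The three steps can be illustrated with a toy example in plain Python. It assumes symmetric max-abs scaling into the signed 8-bit range, which may differ from how the scales are actually computed here; `generate_scale`, `quantize`, and `dequantize` are hypothetical names, not BigDL APIs.

```python
def generate_scale(values, qmax=127):
    """Step 1: derive one scale from the largest magnitude (assumed scheme)."""
    max_abs = max(abs(v) for v in values)
    return qmax / max_abs if max_abs > 0 else 1.0


def quantize(values, scale, qmax=127):
    """Step 2: map fp32 values to integers clipped to [-qmax, qmax]."""
    return [max(-qmax, min(qmax, round(v * scale))) for v in values]


def dequantize(quantized, scale):
    """Map integers back to approximate fp32 values."""
    return [q / scale for q in quantized]


weights = [1.0, -0.5, 0.25]
scale = generate_scale(weights)          # the fp32 values are left untouched
quantized = quantize(weights, scale)     # int8 representation
restored = dequantize(quantized, scale)  # step 3 would compute on `quantized`
```

Each restored value differs from the original by at most half a quantization step, which is the accuracy cost traded for the smaller data type.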

* update readme for v1 training (intel#2763)

* update release doc for preparation (intel#2764)

* change some docs about mkldnn (intel#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model, and then
use the quantized model as usual.

* enable fusion by default (intel#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel#2783)

* fix: inplace of input/output and weight dimension error (intel#2779)

Some layers' input and output share the same memory, so we can't run forward inside
`calcScales`: by that time the input has already been overwritten, and its scales may
be wrong. For example,

Sequential().add(Conv).add(ReLU)

takes two steps. seq.forward(input) runs first; then, on reaching the ReLU,
another forward runs, so its input is already the output and the scales come out
wrong.

For a convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
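
The in-place problem is easy to reproduce with a toy buffer. This sketch assumes the scale is derived from the maximum absolute value of the input, and uses a plain list in place of a tensor; none of the names below come from BigDL.

```python
def max_abs(values):
    """Range statistic a scale would be derived from (assumed scheme)."""
    return max(abs(v) for v in values)


def relu_inplace(buffer):
    """ReLU that overwrites its input, as a layer sharing input/output memory does."""
    for i, v in enumerate(buffer):
        if v < 0:
            buffer[i] = 0.0
    return buffer


# Stands in for the Conv output, which is also the ReLU input.
shared = [-4.0, 1.0, 2.0]

range_before = max_abs(shared)  # measured before ReLU runs: the true input range
relu_inplace(shared)
range_after = max_abs(shared)   # measured after: the -4.0 is gone
```

Measuring the input range only after the forward pass sees 2.0 instead of 4.0, so the range has to be captured before the buffer is overwritten.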

* fix: the blas wrapper has no scales (intel#2778)

* fix softmax (intel#2777)

* fix: performance regression on resnet50 (intel#2774)

The u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel#2671)

* flip to 0.9.0 (intel#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer (layer-wise adaptive rate scaling), along with utility functions to build a set of LARS optim methods for a container.

Bug fix: the gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}, but the combination {id}{pidTo} is ambiguous: e.g., "112" can be {1}{12} or {11}{2}. A "_" is now added to separate id from pidTo.
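
The collision described in the bug fix can be reproduced with plain string formatting. The two functions below are hypothetical and only mimic the key layout quoted in the message; the real block id type in AllReduceParameter may differ.

```python
def old_block_key(param_id, pid_to, pid_from):
    """Original layout: id and pidTo are concatenated with no separator."""
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def new_block_key(param_id, pid_to, pid_from):
    """Fixed layout: an underscore separates id from pidTo."""
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


# id=1, pidTo=12 and id=11, pidTo=2 both start with "112" in the old layout.
old_a = old_block_key(1, 12, 0)
old_b = old_block_key(11, 2, 0)

new_a = new_block_key(1, 12, 0)
new_b = new_block_key(11, 2, 0)
```

With the separator, the two pairs map to distinct keys, so one block can no longer overwrite the other.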

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel#2821)

* Optimize backward graph generation and CAddTable (intel#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel#2822)

* Use one AllReduceParameter for multi-optim method  training (intel#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore / use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel#2824)

* fix: fusion for multi-group of convolution (intel#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel#2843)

* fix acc bug & init dnn thread (intel#2841)

* support tnc and ntc conversion (intel#2844)

* support ntc in dnn layer (intel#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel#2853)

* fix: wrong affinity settings (intel#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel#2858)

* Add beam search in transformer (intel#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel#2854)

* feat: add axis to softmax (intel#2859)

* add release doc for 0.9 (intel#2862)

* fix: update core ref to master (intel#2865)

* flip version to 0.10.0 (intel#2869)

* [Bug Fix] - Fix module version comparison  (intel#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel#2872)

* tutorial fix (intel#2879)

* feat: RoiAlign Forward (intel#2874)

* Add set input output format API in Python (intel#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel#2870)

* fix memory leak for ir graph training (intel#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel#2885)

* add shape layer

* add Gather layer (intel#2897)

* add gather layer

* [New feature] Add maskhead (intel#2892)

* support for maskhead

* fix unit tests (intel#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel#2930)

* Onnx support: add pos parameter to softmax (intel#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel#2940)

* revert back api (intel#2943)

* fix: softmax and bn+scale fusion (intel#2937)

* feat: multi models support with MKL-DNN backend (intel#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel#2959)

* fix: the squeeze should not be included in IRElement (intel#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel#2973)

* fix: nms stability when using treeset. (intel#2972)

* flip version to 0.11 (intel#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel#2971)

* fix: enable integration accuracy tests (intel#2976)

* fix: softmax dnn backend wrong order of primitive (intel#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel#3002)

* Remove final for AbstractModule (intel#3001)

* DistriOptimizerV2 argument (intel#3003)

* call DistriOptimizerV2

* fix inception (intel#3010)

* fix top1 and treenn (intel#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel#3020)

* test examples by distrioptimizerv2 (intel#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel#3021)

* fix loss

* fix ut

* fix style check (intel#3022)

* specify pyspark version (intel#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel#3054)

* spark 3.0

* add spark3.0 deployment (intel#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel#3094)

* back port master (intel#3096)

* set seed to avoid random error in PredictionServiceUT (intel#3097)

* Jdk11 support (intel#3098)

* update for jdk 11 support and doc

* add serializeUid (intel#3099)

* update doc (intel#3104)

* add doc for running in ide (intel#3106)

* fix callBigDLFunc return a Int while the true return value from java is a byte array. (intel#3111)

* add list of df support (intel#3113)

* Update readme (intel#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel#3133)

* DistriOptimizerV2 logger (intel#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel#3137)

* upgrade spark version (intel#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel#3141)

* flip0.14 (intel#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <[email protected]>
Co-authored-by: Jerry Wu <[email protected]>
Co-authored-by: Xin Qiu <[email protected]>
Co-authored-by: Yanzhang Wang <[email protected]>
Co-authored-by: GenBrg <[email protected]>
Co-authored-by: LeicongLi <[email protected]>
Co-authored-by: Emiliano Martinez <[email protected]>
Co-authored-by: abdolence <[email protected]>
Co-authored-by: Enrique Garcia <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: yaochi <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: majing921201 <[email protected]>
Co-authored-by: jenniew <[email protected]>
Co-authored-by: Xiao <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: Hui Li <[email protected]>
Co-authored-by: Jason Dai <[email protected]>
Co-authored-by: dding3 <[email protected]>
Co-authored-by: Yang Wang <[email protected]>
Co-authored-by: Yina Chen <[email protected]>
Co-authored-by: Hangrui Cao <[email protected]>
Co-authored-by: pinggao18 <[email protected]>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 22, 2021
* convert static graph to IR graph and build (intel#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel#2682)

* add spark 2.4 support (intel#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel#2724)

Using the parent thread directly causes two bugs:
1. The child threads forked from the parent thread will be bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables. So if
the parent thread exits and is recreated, such as a thread from
Executors.newFixedThreadPool, the whole app will hit a segmentation fault.
The parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
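
A minimal sketch of the idea in the commit message above, written in Python rather than the project's Scala and not taken from the BigDL code: all "native" computation is submitted to one long-lived worker thread, so any per-thread state is created once and stays attached to the same thread. The names `compute`, `compute_pool`, and `_native_state` are hypothetical stand-ins.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for native per-thread state (for example,
# thread-local variables kept inside a native library).
_native_state = threading.local()


def compute(x):
    """Pretend native call: set up per-thread state once, then reuse it."""
    if not hasattr(_native_state, "calls"):
        _native_state.calls = 0
    _native_state.calls += 1
    return x * 2, threading.current_thread().name, _native_state.calls


# One dedicated worker thread owns every computation, instead of the
# caller's (parent) thread running it directly.
compute_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="compute")

results = [compute_pool.submit(compute, i).result() for i in range(3)]
values = [r[0] for r in results]
thread_names = {r[1] for r in results}
call_counts = [r[2] for r in results]
compute_pool.shutdown()
```

Because the pool has a single worker, every call sees the same thread and the same thread-local state, regardless of which caller thread submitted the work.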

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn will use a padded tensor, which
    uses more memory, such as 4x1x28x28 to 4x8x28x28 (avx2). It pads
    to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous impl
    allocated the DnnTensor when the model was created, which cost too much
    space. So this patch allocates it at runtime.
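
The padding arithmetic in item 1 can be illustrated with a short sketch. It assumes the channel dimension is dimension 1 and that the SIMD width is 8 (the avx2 case quoted above). The helper names `padded_shape` and `num_elements` are hypothetical, and which layout mkl-dnn actually picks is decided by the library, not by this code.

```python
def padded_shape(shape, simd_width=8, channel_dim=1):
    """Round the channel dimension up to the next multiple of simd_width."""
    padded = list(shape)
    channels = padded[channel_dim]
    padded[channel_dim] = ((channels + simd_width - 1) // simd_width) * simd_width
    return tuple(padded)


def num_elements(shape):
    """Total number of elements in a tensor of the given shape."""
    n = 1
    for d in shape:
        n *= d
    return n


original = (4, 1, 28, 28)
padded = padded_shape(original)
# Ratio of padded to original element count, i.e. the memory overhead factor.
overhead = num_elements(padded) / num_elements(original)
```

For the 4x1x28x28 example, the single channel is padded to 8, so the padded buffer holds eight times as many elements as the original.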

* add computshape for some layers and add skip primitives in DnnGraph (intel#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel#2745)

* Modify documentation

* Modify documentation 2

* Revised the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel#2737)

* refactor predict for dnn model

* remove some unit tests (intel#2752)

* remove some conflict tests (#2753)

* Update documentation (intel#2749)

* Modify documentation

* Modify documentation 2

* Revised the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we should add a new attribute called dataType
    to the `MemoryData`.
2. Because we should transfer the scales between FP32->Int8 and Int8->FP32,
    we should add two new attributes called `mask` and `scales`.
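
To make the role of `scales` concrete, here is a minimal sketch of symmetric FP32 <-> int8 conversion with a single per-tensor scale. It is an illustration only: the convention `scale = 127 / max_abs` and the helper names `calc_scale`, `quantize`, and `dequantize` are assumptions, not the BigDL or mkl-dnn implementation, and the `mask` attribute (which selects the dimensions that carry their own scale) is not modelled here.

```python
def calc_scale(values, int8_max=127):
    """Scale that maps the largest magnitude in `values` to int8_max."""
    max_abs = max(abs(v) for v in values)
    return int8_max / max_abs if max_abs > 0 else 1.0


def quantize(values, scale):
    """FP32 -> int8: multiply by the scale, round, clamp to [-127, 127]."""
    return [max(-127, min(127, round(v * scale))) for v in values]


def dequantize(q_values, scale):
    """int8 -> FP32: divide by the same scale."""
    return [q / scale for q in q_values]


fp32 = [0.5, -1.0, 0.25, 2.0]
scale = calc_scale(fp32)
q = quantize(fp32, scale)
restored = dequantize(q, scale)
```

The round trip is lossy: each restored value differs from the original by at most half a quantization step, `0.5 / scale`.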

* fix conversion accuracy (intel#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel#2762)

Enable the int8 data type in layers, especially for convolutions.
So a specific layer can accept an int8 input. If you want fp32
output, you should add a reorder.

* feature: mkldnn int8 layer wise supports (intel#2759)

This includes 3 steps.

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of
   the fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` API will be compatible with the `bigquant`
   backend, which will set the quantize flag. When compiling,
   the quantized weight, output, and input will be generated by mkldnn at
   runtime.
3. Do the inference (forward).

* update readme for v1 training (intel#2763)

* update release doc for preparation (intel#2764)

* change some docs about mkldnn (intel#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel#2761)

This is an example of how to use mkldnn int8. There are two steps: use
GenInt8Scales to generate the scales first and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel#2783)

* fix: inplace of input/output and weight dimension error (intel#2779)

Some layers' input and output use the same memory. We can't do the forward in
`calcScales`, because by that time the input has been changed, so its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

It takes two steps: seq.forward(input) runs first, and then when it goes into the
ReLU, it does another forward, so the input will be the output. And the scales
will be wrong.

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
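
The in-place problem described above can be reproduced with a small sketch. It uses plain Python lists as buffers and the maximum absolute value as a stand-in for the scale statistic. Both choices, and the helper names `relu_inplace` and `max_abs`, are assumptions for illustration rather than the actual `calcScales` logic.

```python
def relu_inplace(buf):
    """ReLU that overwrites its input buffer, as an in-place layer would."""
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf


def max_abs(buf):
    """Stand-in for the statistic a scale would be derived from."""
    return max(abs(v) for v in buf)


conv_output = [-4.0, 1.0, 2.0]  # pretend output of Conv, i.e. input of ReLU

# Correct: read the ReLU input statistic before ReLU runs.
scale_before = max_abs(conv_output)

relu_inplace(conv_output)  # the shared buffer now holds ReLU's output

# Wrong: reading the "input" after the forward pass sees ReLU's output.
scale_after = max_abs(conv_output)
```

Here the statistic read after the forward pass is 2.0 instead of 4.0, because the negative value that dominated the input was already overwritten.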

* fix: the blas wrapper has no scales (intel#2778)

* fix softmax (intel#2777)

* fix: performance regression on resnet50 (intel#2774)

The u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel#2671)

* flip to 0.9.0 (intel#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Revised the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: Layer-wise scaled. Also with utility functions to build a set of LARS optim for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes
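
The AllReduceParameter block-id ambiguity described in the LARS commit above ("112" can be {1}{12} or {11}{2}) can be checked with a short sketch. The format strings follow the commit message, while the function names `block_id_old` and `block_id_new` are hypothetical and this is not the BigDL code itself.

```python
def block_id_old(block, pid_to, pid_from):
    """Original format: {id}{pidTo}gradientBytes{pidFrom}."""
    return f"{block}{pid_to}gradientBytes{pid_from}"


def block_id_new(block, pid_to, pid_from):
    """Fixed format: a "_" separates id from pidTo."""
    return f"{block}_{pid_to}gradientBytes{pid_from}"


# id=1, pidTo=12 versus id=11, pidTo=2: both concatenate to "112...".
collision_old = block_id_old(1, 12, 0) == block_id_old(11, 2, 0)
collision_new = block_id_new(1, 12, 0) == block_id_new(11, 2, 0)
```

With the separator, the two keys become "1_12gradientBytes0" and "11_2gradientBytes0", so they no longer collide.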

* bugfix - set mask for container (intel#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel#2821)

* Optimize backward graph generation and CAddTable (intel#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel#2822)

* Use one AllReduceParameter for multi-optim method  training (intel#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel#2824)

* fix: fusion for multi-group of convolution (intel#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel#2843)

* fix acc bug & init dnn thread (intel#2841)

* support tnc and ntc conversion (intel#2844)

* support ntc in dnn layer (intel#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel#2853)

* fix: wrong affinity settings (intel#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel#2858)

* Add beam search in transformer (intel#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel#2854)

* feat: add axis to softmax (intel#2859)

* add release doc for 0.9 (intel#2862)

* fix: update core ref to master (intel#2865)

* flip version to 0.10.0 (intel#2869)

* [Bug Fix] - Fix module version comparison  (intel#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel#2872)

* tutorial fix (intel#2879)

* feat: RoiAlign Forward (intel#2874)

* Add set input output format API in Python (intel#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel#2870)

* fix memory leak for ir graph training (intel#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel#2885)

* add shape layer

* add Gather layer (intel#2897)

* add gather layer

* [New feature] Add maskhead (intel#2892)

* support for maskhead

* fix unit tests (intel#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel#2930)

* Onnx support: add pos parameter to softmax (intel#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel#2940)

* revert back api (intel#2943)

* fix: softmax and bn+scale fusion (intel#2937)

* feat: multi models support with MKL-DNN backend (intel#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel#2959)

* fix: the squeeze should not be included in IRElement (intel#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel#2973)

* fix: nms stability when using treeset. (intel#2972)

* flip version to 0.11 (intel#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel#2971)

* fix: enable integration accuracy tests (intel#2976)

* fix: softmax dnn backend wrong order of primitive (intel#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel#3002)

* Remove final for AbstractModule (intel#3001)

* DistriOptimizerV2 argument (intel#3003)

* call DistriOptimizerV2

* fix inception (intel#3010)

* fix top1 and treenn (intel#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel#3020)

* test examples by distrioptimizerv2 (intel#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel#3021)

* fix loss

* fix ut

* fix style check (intel#3022)

* specify pyspark version (intel#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel#3054)

* spark 3.0

* add spark3.0 deployment (intel#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel#3094)

* back port master (intel#3096)

* set seed to avoid random error in PredictionServiceUT (intel#3097)

* Jdk11 support (intel#3098)

* update for jdk 11 support and doc

* add serializeUid (intel#3099)

* update doc (intel#3104)

* add doc for running in ide (intel#3106)

* fix callBigDLFunc returning an Int while the true return value from Java is a byte array. (intel#3111)

* add list of df support (intel#3113)

* Update readme (intel#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel#3133)

* DistriOptimizerV2 logger (intel#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel#3137)

* upgrade spark version (intel#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel#3141)

* flip0.14 (intel#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <[email protected]>
Co-authored-by: Jerry Wu <[email protected]>
Co-authored-by: Xin Qiu <[email protected]>
Co-authored-by: Yanzhang Wang <[email protected]>
Co-authored-by: GenBrg <[email protected]>
Co-authored-by: LeicongLi <[email protected]>
Co-authored-by: Emiliano Martinez <[email protected]>
Co-authored-by: abdolence <[email protected]>
Co-authored-by: Enrique Garcia <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: yaochi <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: majing921201 <[email protected]>
Co-authored-by: jenniew <[email protected]>
Co-authored-by: Xiao <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: Hui Li <[email protected]>
Co-authored-by: Jason Dai <[email protected]>
Co-authored-by: dding3 <[email protected]>
Co-authored-by: Yang Wang <[email protected]>
Co-authored-by: Yina Chen <[email protected]>
Co-authored-by: Hangrui Cao <[email protected]>
Co-authored-by: pinggao18 <[email protected]>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 22, 2021
* convert static graph to IR graph and build (intel#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel#2682)

* add spark 2.4 support (intel#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel#2724)

If we use the parent thread directly, there are two bugs:
1. The child threads forked from the parent thread will be bound to core 0
because of the affinity settings.
2. The native thread has some unknown thread-local variables, so if the
parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app will hit a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread
of mapPartition (Distributed Mode).
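
The idea of routing native compute through one dedicated thread can be sketched as below. This is a minimal Python illustration, not the BigDL implementation: `compute_pool`, `native_compute`, and `run_on_compute_thread` are hypothetical names, and the "native call" is a placeholder.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the dedicated compute pool: one long-lived worker.
compute_pool = ThreadPoolExecutor(max_workers=1)


def native_compute(x):
    # Placeholder for a native call; reports which thread it ran on.
    return threading.get_ident(), x * 2


def run_on_compute_thread(x):
    # Submit to the dedicated pool instead of running on the caller's thread.
    return compute_pool.submit(native_compute, x).result()


caller_id = threading.get_ident()
results = [run_on_compute_thread(i) for i in range(4)]
worker_ids = {tid for tid, _ in results}
values = [v for _, v in results]
compute_pool.shutdown()
```

Every task runs on the same worker thread, which is never the caller's thread, so thread-local state held by native code stays with a thread that outlives the callers.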

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel#2712)

There are two issues:

1. A padded tensor is required. mkl-dnn pads the tensor to a multiple of
    the SIMD width, which uses more memory, e.g. 4x1x28x28 becomes
    4x8x28x28 (avx2).
2. The TensorMMap between DenseTensor and DnnTensor. The previous impl
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.
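
The padding cost in the first issue can be worked out with a short sketch. It assumes an NCHW shape, that the channel dimension is the one being padded (as in the 4x1x28x28 example), and a SIMD width of 8; `padded_shape` and `num_elements` are illustrative helpers, not mkl-dnn APIs.

```python
def padded_shape(shape, simd_width=8):
    """Round the channel dimension (index 1 of NCHW) up to a multiple of simd_width."""
    n, c, h, w = shape
    padded_c = -(-c // simd_width) * simd_width  # ceiling division
    return (n, padded_c, h, w)


def num_elements(shape):
    total = 1
    for d in shape:
        total *= d
    return total


logical = (4, 1, 28, 28)
physical = padded_shape(logical)  # shape actually allocated after padding
overhead = num_elements(physical) // num_elements(logical)
```

For the shape quoted above, the padded buffer holds eight times as many elements as the logical tensor, which is why allocating it eagerly at model creation is expensive.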

* add computshape for some layers and add skip primitives in DnnGraph (intel#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel#2737)

* refactor predict for dnn model

* remove some unit tests (intel#2752)

* remove some conflict tests (#2753)

* Update documentation (intel#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we should add a new attribute called `dataType`
    to the `MemoryData`.
2. Because we need to transfer the scales between FP32->Int8 and Int8->FP32,
    we should add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel#2760)

*  fix accuracy for saved model

* exclude mkldnn model during conversion

* feature: layer wise supports of int8 (intel#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want fp32
output, you should add a reorder.

* feature: mkldnn int8 layer wise supports (intel#2759)

This includes 3 steps:

1. Generate the scales of the model.
   This needs an API like `generateScalesWithMask` to generate the scales of
   the fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` API will be compatible with the `bigquant`
   backend, which will set the quantize flag. When compiling,
   the quantized weight, output, and input will be generated by mkldnn at
   runtime.
3. Do the inference (forward).
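
The scale-then-quantize flow can be illustrated with a generic symmetric int8 scheme. This is only a sketch under assumptions: the functions below are stand-ins written for illustration, not `generateScalesWithMask` or `quantize()`, and the max-abs scale and the [-127, 127] range are common choices rather than something this commit states.

```python
def generate_scale(values):
    # Step 1: take the scale from the largest absolute value observed.
    return max(abs(v) for v in values)


def quantize(values, scale):
    # Step 2: map [-scale, scale] onto the signed 8-bit range [-127, 127].
    if scale == 0:
        return [0 for _ in values]
    return [max(-127, min(127, round(v / scale * 127))) for v in values]


def dequantize(qvalues, scale):
    # Map int8 codes back to fp32 so they can be compared with the originals.
    return [q * scale / 127 for q in qvalues]


weights = [-2.0, -0.5, 0.0, 1.0, 2.0]
scale = generate_scale(weights)
q = quantize(weights, scale)
restored = dequantize(q, scale)
```

The restored values differ from the originals by at most one quantization step (`scale / 127`), which is the accuracy cost traded for int8 storage and compute.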

* update readme for v1 training (intel#2763)

* update release doc for preparation (intel#2764)

* change some docs about mkldnn (intel#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel#2783)

* fix: inplace of input/output and weight dimension error (intel#2779)

Some layers' input and output use the same memory, so we can't do the forward in
`calcScales`: by that time the input has been changed and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

runs in two steps. seq.forward(input) runs first, and when it reaches the ReLU
it does another forward, so the input becomes the output and the scales will be
wrong.

For a convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
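
The in-place problem described above can be reproduced with a tiny sketch. It assumes the scale is the maximum absolute value of a buffer; `max_abs` and `relu_inplace` are illustrative helpers, not BigDL functions.

```python
def max_abs(values):
    return max(abs(v) for v in values)


def relu_inplace(buf):
    # Overwrites the buffer, so the layer's input and output share memory.
    for i, v in enumerate(buf):
        buf[i] = v if v > 0 else 0.0
    return buf


conv_output = [-3.0, 1.0, 2.0]       # also the ReLU input
scale_before = max_abs(conv_output)  # measured before ReLU runs
relu_inplace(conv_output)
scale_after = max_abs(conv_output)   # measured after the in-place update
```

Measuring the ReLU input after the in-place forward gives 2.0 instead of 3.0, so a scale computed at that point no longer describes the real input.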

* fix: the blas wrapper has no scales (intel#2778)

* fix softmax (intel#2777)

* fix: performance regression on resnet50 (intel#2774)

the u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel#2671)

* flip to 0.9.0 (intel#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add the LARS optimizer (layer-wise scaled), along with utility functions to build a set of LARS optims for a container.

Bug fix: The gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}, but the combination {id}{pidTo} is ambiguous, e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes
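
The block id ambiguity fixed in this change can be shown with a short sketch. The two key layouts follow the commit description; the function names and the sample ids are made up for illustration and are not the AllReduceParameter code.

```python
def block_id_old(param_id, pid_to, pid_from):
    # Original layout: id and pidTo are concatenated with no separator.
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def block_id_new(param_id, pid_to, pid_from):
    # Fixed layout: "_" separates id from pidTo.
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


old_a = block_id_old(1, 12, 0)
old_b = block_id_old(11, 2, 0)
new_a = block_id_new(1, 12, 0)
new_b = block_id_new(11, 2, 0)
```

With the old layout both pairs collapse to the same key, while the separator keeps them distinct.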

* bugfix - set mask for container (intel#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel#2821)

* Optimize backward graph generation and CAddTable (intel#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel#2822)

* Use one AllReduceParameter for multi-optim method  training (intel#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore/ use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (intel#2824)

* fix: fusion for multi-group of convolution (intel#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel#2843)

* fix acc bug & init dnn thread (intel#2841)

* support tnc and ntc conversion (intel#2844)

* support ntc in dnn layer (intel#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel#2853)

* fix: wrong affinity settings (intel#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel#2858)

* Add beam search in transformer (intel#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel#2854)

* feat: add axis to softmax (intel#2859)

* add release doc for 0.9 (intel#2862)

* fix: update core ref to master (intel#2865)

* flip version to 0.10.0 (intel#2869)

* [Bug Fix] - Fix module version comparison  (intel#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel#2872)

* tutorial fix (intel#2879)

* feat: RoiAlign Forward (intel#2874)

* Add set input output format API in Python (intel#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel#2870)

* fix memory leak for ir graph training (intel#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add gemm layer

* add Shape layer (intel#2885)

* add shape layer

* add Gather layer (intel#2897)

* add gather layer

* [New feature] Add maskhead (intel#2892)

* support for maskhead

* fix unit tests (intel#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel#2930)

* Onnx support: add pos parameter to softmax (intel#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel#2940)

* revert back api (intel#2943)

* fix: softmax and bn+scale fusion (intel#2937)

* feat: multi models support with MKL-DNN backend (intel#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel#2959)

* fix: the squeeze should not be included in IRElement (intel#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel#2973)

* fix: nms stability when using treeset. (intel#2972)

* flip version to 0.11 (intel#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel#2971)

* fix: enable integration accuracy tests (intel#2976)

* fix: softmax dnn backend wrong order of primitive (intel#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel#3002)

* Remove final for AbstractModule (intel#3001)

* DistriOptimizerV2 argument (intel#3003)

* call DistriOptimizerV2

* fix inception (intel#3010)

* fix top1 and treenn (intel#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel#3020)

* test examples by distrioptimizerv2 (intel#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel#3021)

* fix loss

* fix ut

* fix style check (intel#3022)

* specify pyspark version (intel#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel#3054)

* spark 3.0

* add spark3.0 deployment (intel#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel#3094)

* back port master (intel#3096)

* set seed to avoid random error in PredictionServiceUT (intel#3097)

* Jdk11 support (intel#3098)

* update for jdk 11 support and doc

* add serializeUid (intel#3099)

* update doc (intel#3104)

* add doc for running in ide (intel#3106)

* fix callBigDLFunc returning an Int when the true return value from Java is a byte array. (intel#3111)

* add list of df support (intel#3113)

* Update readme (intel#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel#3133)

* DistriOptimizerV2 logger (intel#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsAdapter to AZ and roll back to OptimizerV1 (intel#3137)

* upgrade spark version (intel#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel#3141)

* flip0.14 (intel#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <[email protected]>
Co-authored-by: Jerry Wu <[email protected]>
Co-authored-by: Xin Qiu <[email protected]>
Co-authored-by: Yanzhang Wang <[email protected]>
Co-authored-by: GenBrg <[email protected]>
Co-authored-by: LeicongLi <[email protected]>
Co-authored-by: Emiliano Martinez <[email protected]>
Co-authored-by: abdolence <[email protected]>
Co-authored-by: Enrique Garcia <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: yaochi <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: majing921201 <[email protected]>
Co-authored-by: jenniew <[email protected]>
Co-authored-by: Xiao <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: Hui Li <[email protected]>
Co-authored-by: Jason Dai <[email protected]>
Co-authored-by: dding3 <[email protected]>
Co-authored-by: Yang Wang <[email protected]>
Co-authored-by: Yina Chen <[email protected]>
Co-authored-by: Hangrui Cao <[email protected]>
Co-authored-by: pinggao18 <[email protected]>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 22, 2021
* convert static graph to IR graph and build (intel#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel#2682)

* add spark 2.4 support (intel#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread holds some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app crashes with a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
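
The idea of routing all compute through one dedicated, long-lived thread can be sketched as follows. This is only a Python analogue under stated assumptions, not the BigDL code: `compute_pool` and `run_on_compute_thread` are hypothetical names, and the standard-library `ThreadPoolExecutor` stands in for the single thread pool mentioned in the commit title.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One dedicated worker thread stands in for the single-thread compute pool
# (hypothetical name; the real pool lives inside BigDL's Scala code).
compute_pool = ThreadPoolExecutor(max_workers=1)


def run_on_compute_thread(fn, *args):
    """Submit fn to the dedicated thread and block until it finishes."""
    return compute_pool.submit(fn, *args).result()


# The caller (main or worker thread) never runs the compute itself.
caller_thread = threading.get_ident()
worker_threads = {run_on_compute_thread(threading.get_ident) for _ in range(5)}
compute_pool.shutdown()
```

Every task lands on the same worker thread, so any thread-local native state stays with a thread that is neither the caller nor recreated between calls.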

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel#2712)

There are two issues:

1. A padding tensor is required. mkl-dnn uses a padded tensor, which
    takes more memory, e.g. 4x1x28x28 becomes 4x8x28x28 (avx2). It
    pads to a multiple of the SIMD width.
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space. This patch allocates it at runtime instead.
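
The memory cost of the padding in item 1 can be illustrated with a small sketch. It assumes the channel dimension (axis 1) is rounded up to a multiple of 8, as in the avx2 example above; `padded_shape` and `element_count` are hypothetical helpers, and how mkl-dnn actually chooses a layout is more involved.

```python
def padded_shape(shape, simd_width=8, channel_axis=1):
    """Round the channel dimension up to a multiple of simd_width."""
    dims = list(shape)
    channels = dims[channel_axis]
    dims[channel_axis] = -(-channels // simd_width) * simd_width  # ceiling division
    return tuple(dims)


def element_count(shape):
    """Number of elements a tensor of this shape holds."""
    count = 1
    for dim in shape:
        count *= dim
    return count


original = (4, 1, 28, 28)
padded = padded_shape(original)
growth = element_count(padded) // element_count(original)
```

For the shape quoted in the commit message, the padded tensor holds eight times as many elements as the original.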

* add computshape for some layers and add skip primitives in DnnGraph (intel#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel#2737)

* refactor predict for dnn model

* remove some unit tests (intel#2752)

* remove some conflict tests (#2753)

* Update documentation (intel#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, a new attribute called dataType is added
    to the `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    two new attributes called `mask` and `scales` are added.

* fix conversion accuracy (intel#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can therefore accept an int8 input. If you want an fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel#2759)

This includes 3 steps:

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of the
   fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` API will be compatible with the `bigquant`
   backend, which sets the quantize flag. When compiling,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).
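
The scale-then-quantize flow in the steps above can be sketched in a few lines. This is a generic symmetric max-abs scheme with a single scale for the whole tensor, chosen as an assumption for illustration; it is not the BigDL or mkldnn implementation, and `generate_scale`, `quantize`, and `dequantize` are hypothetical names.

```python
INT8_MAX = 127


def generate_scale(values):
    """Step 1: derive one scale from the largest absolute fp32 value."""
    max_abs = max(abs(v) for v in values)
    return INT8_MAX / max_abs if max_abs > 0 else 1.0


def quantize(values, scale):
    """Step 2: map fp32 values to integers clipped to the int8 range."""
    return [max(-INT8_MAX, min(INT8_MAX, round(v * scale))) for v in values]


def dequantize(quantized, scale):
    """Map int8 values back to fp32, as a reorder to fp32 would."""
    return [q / scale for q in quantized]


weights = [0.02, -0.75, 0.31, 1.5]
scale = generate_scale(weights)
q_weights = quantize(weights, scale)
restored = dequantize(q_weights, scale)
```

The restored values differ from the originals by at most half a quantization step, which is why the scales have to be computed from representative data before quantizing.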

* update readme for v1 training (intel#2763)

* update release doc for preparation (intel#2764)

* change some docs about mkldnn (intel#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel#2783)

* fix: inplace of input/output and weight dimension error (intel#2779)

Some layers' input and output use the same memory, so we can't do the forward in
`calcScales`: by that time the input has been changed, and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

takes two steps: seq.forward(input) runs first, and when it goes into the ReLU it
does another forward, so the input will be the output, and the scales will be
wrong.

The convolution's weight always has 5 dimensions, even when the group number
is 1. But for dnn convolution, if there's no group, the weight should have
4 dimensions.
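
The in-place hazard described above can be reproduced with plain lists. This is a toy sketch under stated assumptions: `scale_layer` merely stands in for Conv, `relu_inplace` overwrites its input buffer the way an in-place ReLU would, and `max_abs` stands in for the statistic used to derive a scale. None of these are BigDL functions.

```python
def scale_layer(xs):
    """Stand-in for Conv: writes its result into a fresh buffer."""
    return [2.0 * x for x in xs]


def relu_inplace(xs):
    """In-place ReLU: the output overwrites the input buffer."""
    for i, x in enumerate(xs):
        if x < 0:
            xs[i] = 0.0
    return xs


def max_abs(xs):
    """Statistic a scale would be derived from."""
    return max(abs(x) for x in xs)


data = [1.0, -3.0, 2.0]
conv_out = scale_layer(data)              # [2.0, -6.0, 4.0]
true_relu_input_max = max_abs(conv_out)   # recorded before ReLU runs
relu_out = relu_inplace(conv_out)         # same buffer, now [2.0, 0.0, 4.0]
stale_relu_input_max = max_abs(conv_out)  # read after the whole forward
```

Reading the ReLU input after the full forward pass sees the post-ReLU values, so the statistic drops from 6.0 to 4.0 and the derived scale would be wrong.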

* fix: the blas wrapper has no scales (intel#2778)

* fix softmax (intel#2777)

* fix: performance regression on resnet50 (intel#2774)

The u8-to-s8 or s8-to-u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel#2671)

* flip to 0.9.0 (intel#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: layer-wise scaled, with utility functions to build a set of LARS optims for a container.

Bug fix: the gradient block id of AllReduceParameter was originally composed of {id}{pidTo}gradientBytes{pidFrom}, but the combination {id}{pidTo} is ambiguous, e.g. "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo.
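
The ambiguity in the bug fix above is easy to reproduce. The sketch below only mirrors the id format quoted in the commit message; `old_block_id` and `new_block_id` are hypothetical helper names, not BigDL functions.

```python
def old_block_id(param_id, pid_to, pid_from):
    """Original format: id and pidTo are concatenated with no separator."""
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def new_block_id(param_id, pid_to, pid_from):
    """Fixed format: a "_" separates id from pidTo."""
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs that both render as "112" in the old format.
collision_old = old_block_id(1, 12, 0) == old_block_id(11, 2, 0)
collision_new = new_block_id(1, 12, 0) == new_block_id(11, 2, 0)
```

With the separator, (1, 12) and (11, 2) produce distinct block ids.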

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel#2821)

* Optimize backward graph generation and CAddTable (intel#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel#2822)

* Use one AllReduceParameter for multi-optim method  training (intel#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore / use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (intel#2824)

* fix: fusion for multi-group of convolution (intel#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel#2843)

* fix acc bug & init dnn thread (intel#2841)

* support tnc and ntc conversion (intel#2844)

* support ntc in dnn layer (intel#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel#2853)

* fix: wrong affinity settings (intel#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel#2858)

* Add beam search in transformer (intel#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel#2854)

* feat: add axis to softmax (intel#2859)

* add release doc for 0.9 (intel#2862)

* fix: update core ref to master (intel#2865)

* flip version to 0.10.0 (intel#2869)

* [Bug Fix] - Fix module version comparison  (intel#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel#2872)

* tutorial fix (intel#2879)

* feat: RoiAlign Forward (intel#2874)

* Add set input output format API in Python (intel#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel#2870)

* fix memory leak for ir graph training (intel#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add transpose in gemm layer

* add gemm layer

* add gemm layer

* add Shape layer (intel#2885)

* add shape layer

* add Gather layer (intel#2897)

* add gather layer

* [New feature] Add maskhead (intel#2892)

* support for maskhead

* fix unit tests (intel#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel#2930)

* Onnx support: add pos parameter to softmax (intel#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel#2940)

* revert back api (intel#2943)

* fix: softmax and bn+scale fusion (intel#2937)

* feat: multi models support with MKL-DNN backend (intel#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel#2959)

* fix: the squeeze should not be included in IRElement (intel#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel#2973)

* fix: nms stability when using treeset. (intel#2972)

* flip version to 0.11 (intel#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel#2971)

* fix: enable integration accuracy tests (intel#2976)

* fix: softmax dnn backend wrong order of primitive (intel#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel#3002)

* Remove final for AbstractModule (intel#3001)

* DistriOptimizerV2 argument (intel#3003)

* call DistriOptimizerV2

* fix inception (intel#3010)

* fix top1 and treenn (intel#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel#3020)

* test examples by distrioptimizerv2 (intel#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel#3021)

* fix loss

* fix ut

* fix style check (intel#3022)

* specify pyspark version (intel#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel#3054)

* spark 3.0

* add spark3.0 deployment (intel#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel#3094)

* back port master (intel#3096)

* set seed to avoid random error in PredictionServiceUT (intel#3097)

* Jdk11 support (intel#3098)

* update for jdk 11 support and doc

* add serializeUid (intel#3099)

* update doc (intel#3104)

* add doc for running in ide (intel#3106)

* fix callBigDLFunc returning an Int while the true return value from Java is a byte array. (intel#3111)

* add list of df support (intel#3113)

* Update readme (intel#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel#3133)

* DistriOptimizerV2 logger (intel#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel#3137)

* upgrade spark version (intel#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel#3141)

* flip0.14 (intel#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <[email protected]>
Co-authored-by: Jerry Wu <[email protected]>
Co-authored-by: Xin Qiu <[email protected]>
Co-authored-by: Yanzhang Wang <[email protected]>
Co-authored-by: GenBrg <[email protected]>
Co-authored-by: LeicongLi <[email protected]>
Co-authored-by: Emiliano Martinez <[email protected]>
Co-authored-by: abdolence <[email protected]>
Co-authored-by: Enrique Garcia <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: yaochi <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: majing921201 <[email protected]>
Co-authored-by: jenniew <[email protected]>
Co-authored-by: Xiao <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: Hui Li <[email protected]>
Co-authored-by: Jason Dai <[email protected]>
Co-authored-by: dding3 <[email protected]>
Co-authored-by: Yang Wang <[email protected]>
Co-authored-by: Yina Chen <[email protected]>
Co-authored-by: Hangrui Cao <[email protected]>
Co-authored-by: pinggao18 <[email protected]>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 22, 2021
* convert static graph to IR graph and build (intel#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel#2682)

* add spark 2.4 support (intel#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread holds some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app crashes with a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
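
The idea of pinning native computation to one long-lived thread can be sketched in Python as follows. This is only an illustration, not the BigDL implementation: `compute_pool`, `native_state`, `run_primitive`, and `submit_steps` are hypothetical names, and a `threading.local` object stands in for the native library's per-thread data.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dedicated pool: exactly one worker thread, created once and reused.
compute_pool = ThreadPoolExecutor(max_workers=1)

# Stand-in for the thread-local variables held by the native library.
native_state = threading.local()


def run_primitive(step):
    # Initialise the per-thread state on first use, then keep reusing it.
    if not hasattr(native_state, "calls"):
        native_state.calls = 0
    native_state.calls += 1
    return threading.get_ident(), native_state.calls


def submit_steps(n):
    # The caller (main thread or a worker thread) only submits work and waits,
    # so the thread-local state always lives on the same pool thread.
    return [compute_pool.submit(run_primitive, i).result() for i in range(n)]
```

Because every step runs on the same pool thread, the thread-local state survives even if the submitting thread is replaced between calls.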

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel#2712)

There are two issues:

1. Padding tensors are required. mkl-dnn pads a tensor to a multiple of the
    SIMD width, which uses more memory, for example 4x1x28x28 becomes
    4x8x28x28 (avx2).
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space. This patch allocates it at runtime instead.
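
The padding arithmetic in item 1 can be sketched as below. This is a minimal illustration that assumes the channel dimension of an NCHW shape is the one being padded (as the 4x1x28x28 example suggests) and a SIMD width of 8 fp32 lanes for AVX2; `padded_shape` and `element_count` are hypothetical helpers, not mkl-dnn APIs.

```python
def padded_shape(shape, simd_width=8):
    """Round the channel dimension of an NCHW shape up to a multiple of simd_width."""
    n, c, h, w = shape
    padded_c = -(-c // simd_width) * simd_width  # ceiling division, then scale back up
    return (n, padded_c, h, w)


def element_count(shape):
    """Number of elements a tensor of this shape holds."""
    count = 1
    for dim in shape:
        count *= dim
    return count


original = (4, 1, 28, 28)
padded = padded_shape(original)
# Ratio of padded to original element counts, i.e. the memory overhead factor.
overhead = element_count(padded) / element_count(original)
```

For the single-channel example the padded tensor holds eight times as many elements, which is why the extra memory matters.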

* add computshape for some layers and add skip primitives in DnnGraph (intel#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel#2737)

* refactor predict for dnn model

* remove some unit tests (intel#2752)

* remove some conflict tests (#2753)

* Update documentation (intel#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, add a new attribute called dataType
    to `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can then accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel#2759)

This includes 3 steps.

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of the
   fp32 model. The model returned is still an fp32 model.
2. Quantize the model.
   The `quantize()` API stays compatible with the `bigquant`
   backend and sets the quantize flag. When the model is compiled,
   the quantized weight, output, and input are generated by mkldnn at
   runtime.
3. Do the inference (forward).
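
The three steps can be illustrated with a generic symmetric int8 scheme. This is a sketch under stated assumptions, not the BigDL or mkldnn code: the scale is assumed to be 127 divided by the largest absolute value, and `generate_scale`, `quantize`, and `dequantize` are hypothetical helper names.

```python
def generate_scale(values, int8_max=127.0):
    """Step 1: derive one scale from fp32 data (assumed rule: 127 / max|x|)."""
    max_abs = max(abs(v) for v in values)
    return int8_max / max_abs if max_abs > 0 else 1.0


def quantize(values, scale):
    """Step 2: map fp32 values to integers clamped to the int8 range [-127, 127]."""
    return [max(-127, min(127, round(v * scale))) for v in values]


def dequantize(quantized, scale):
    """Step 3 (after inference): map int8 values back to fp32."""
    return [q / scale for q in quantized]


weights = [-2.0, 0.5, 1.0]
scale = generate_scale(weights)
q_weights = quantize(weights, scale)
restored = dequantize(q_weights, scale)
```

Each restored value differs from the original by at most half a quantization step, which is the precision traded for the smaller data type.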

* update readme for v1 training (intel#2763)

* update release doc for preparation (intel#2764)

* change some docs about mkldnn (intel#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel#2783)

* fix: inplace of input/output and weight dimension error (intel#2779)

Some layers' input and output use the same memory, so we can't do a forward in
`calcScales`: by that time the input has been changed and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

takes two steps. seq.forward(input) runs first, and on reaching the ReLU it
does another forward, by which point its input already holds the output, so the
scales will be wrong.

For a convolution's weight, the dimension is always 5, even when the group number
is 1. For a dnn convolution, however, if there is no group, the weight's dimension
should be 4.
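
The in-place problem can be reproduced with plain Python lists. This is an illustrative sketch only: `max_abs` stands in for whatever statistic the scale is derived from, and `relu_inplace` is a hypothetical layer that overwrites its input buffer.

```python
def max_abs(values):
    """Statistic a scale could be derived from (assumed: largest absolute value)."""
    return max(abs(v) for v in values)


def relu_inplace(buffer):
    """ReLU that writes its result into the same buffer it reads from."""
    for i, v in enumerate(buffer):
        if v < 0:
            buffer[i] = 0.0
    return buffer


conv_output = [-4.0, 2.0, -1.0]                 # also the ReLU's input buffer
stat_before_forward = max_abs(conv_output)      # measured on the real input
relu_output = relu_inplace(conv_output)         # overwrites conv_output
stat_after_forward = max_abs(conv_output)       # measured too late: sees the output
```

Measuring after the forward pass sees 2.0 instead of 4.0, so a scale computed at that point would describe the ReLU's output rather than its input.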

* fix: the blas wrapper has no scales (intel#2778)

* fix softmax (intel#2777)

* fix: performance regression on resnet50 (intel#2774)

the u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel#2671)

* flip to 0.9.0 (intel#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: layer-wise scaled. Also adds utility functions to build a set of LARS optim methods for a container.

Bug fix: the gradient block id of AllReduceParameter was originally composed as {id}{pidTo}gradientBytes{pidFrom}, but the combination {id}{pidTo} is ambiguous: "112" can be {1}{12} or {11}{2}. A "_" is now added to separate id from pidTo.
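
The collision is easy to reproduce with string formatting. The helpers below are hypothetical and only mirror the two id layouts described in the message; they are not the AllReduceParameter code.

```python
def old_block_id(param_id, pid_to, pid_from):
    """Original layout: {id}{pidTo}gradientBytes{pidFrom}."""
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def new_block_id(param_id, pid_to, pid_from):
    """Fixed layout: a "_" separates id from pidTo."""
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs that both concatenate to "112".
first = (1, 12, 0)
second = (11, 2, 0)

collides_before = old_block_id(*first) == old_block_id(*second)
collides_after = new_block_id(*first) == new_block_id(*second)
```

With the separator, the two pairs map to `1_12gradientBytes0` and `11_2gradientBytes0`, so the ids no longer clash.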

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel#2821)

* Optimize backward graph generation and CAddTable (intel#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel#2822)

* Use one AllReduceParameter for multi-optim method  training (intel#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore / use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistency between shape and layout of mkldnn (intel#2824)

* fix: fusion for multi-group of convolution (intel#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel#2843)

* fix acc bug & init dnn thread (intel#2841)

* support tnc and ntc conversion (intel#2844)

* support ntc in dnn layer (intel#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel#2853)

* fix: wrong affinity settings (intel#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel#2858)

* Add beam search in transformer (intel#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel#2854)

* feat: add axis to softmax (intel#2859)

* add release doc for 0.9 (intel#2862)

* fix: update core ref to master (intel#2865)

* flip version to 0.10.0 (intel#2869)

* [Bug Fix] - Fix module version comparison  (intel#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel#2872)

* tutorial fix (intel#2879)

* feat: RoiAlign Forward (intel#2874)

* Add set input output format API in Python (intel#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel#2870)

* fix memory leak for ir graph training (intel#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel#2885)

* add shape layer

* add Gather layer (intel#2897)

* add gather layer

* [New feature] Add maskhead (intel#2892)

* support for maskhead

* fix unit tests (intel#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel#2930)

* Onnx support: add pos parameter to softmax (intel#2933)

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel#2940)

* revert back api (intel#2943)

* fix: softmax and bn+scale fusion (intel#2937)

* feat: multi models support with MKL-DNN backend (intel#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel#2959)

* fix: the squeeze should not be included in IRElement (intel#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel#2973)

* fix: nms stability when using treeset. (intel#2972)

* flip version to 0.11 (intel#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel#2971)

* fix: enable integration accuracy tests (intel#2976)

* fix: softmax dnn backend wrong order of primitive (intel#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel#3002)

* Remove final for AbstractModule (intel#3001)

* DistriOptimizerV2 argument (intel#3003)

* call DistriOptimizerV2

* fix inception (intel#3010)

* fix top1 and treenn (intel#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel#3020)

* test examples by distrioptimizerv2 (intel#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel#3021)

* fix loss

* fix ut

* fix style check (intel#3022)

* specify pyspark version (intel#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel#3054)

* spark 3.0

* add spark3.0 deployment (intel#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel#3094)

* back port master (intel#3096)

* set seed to avoid random error in PredictionServiceUT (intel#3097)

* Jdk11 support (intel#3098)

* update for jdk 11 support and doc

* add serializeUid (intel#3099)

* update doc (intel#3104)

* add doc for running in ide (intel#3106)

* fix callBigDLFunc return a Int while the true return value from java is a byte array. (intel#3111)

* add list of df support (intel#3113)

* Update readme (intel#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel#3133)

* DistriOptimizerV2 logger (intel#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel#3137)

* upgrade spark version (intel#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel#3141)

* flip0.14 (intel#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <[email protected]>
Co-authored-by: Jerry Wu <[email protected]>
Co-authored-by: Xin Qiu <[email protected]>
Co-authored-by: Yanzhang Wang <[email protected]>
Co-authored-by: GenBrg <[email protected]>
Co-authored-by: LeicongLi <[email protected]>
Co-authored-by: Emiliano Martinez <[email protected]>
Co-authored-by: abdolence <[email protected]>
Co-authored-by: Enrique Garcia <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: yaochi <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: majing921201 <[email protected]>
Co-authored-by: jenniew <[email protected]>
Co-authored-by: Xiao <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: Hui Li <[email protected]>
Co-authored-by: Jason Dai <[email protected]>
Co-authored-by: dding3 <[email protected]>
Co-authored-by: Yang Wang <[email protected]>
Co-authored-by: Yina Chen <[email protected]>
Co-authored-by: Hangrui Cao <[email protected]>
Co-authored-by: pinggao18 <[email protected]>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 22, 2021
* convert static graph to IR graph and build (intel#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel#2682)

* add spark 2.4 support (intel#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread holds some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app hits a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
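
The idea of handing all native work to one long-lived thread can be sketched as follows. This is only an illustration in Python, not the BigDL implementation; `run_native` and `submit` are hypothetical names standing in for a native MKL-DNN call and its dispatcher.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One dedicated worker thread, created once and reused for every task.
# (Hypothetical stand-in for the single thread pool described above.)
_compute_pool = ThreadPoolExecutor(max_workers=1)


def run_native(x):
    # Hypothetical placeholder for a native call.
    # It reports which thread executed it, plus a dummy result.
    return threading.get_ident(), x * 2


def submit(x):
    # Callers never run native code on their own thread;
    # they always hand it off to the dedicated worker and wait.
    return _compute_pool.submit(run_native, x).result()


# Every task lands on the same worker thread, so thread-local
# state on that thread survives between calls.
thread_ids = {submit(i)[0] for i in range(5)}
```

Because the worker outlives any individual caller, a caller thread that exits and is recreated does not take the native thread-local state with it.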

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel#2712)

There are two issues:

1. A padded tensor is required. mkl-dnn pads the tensor to a multiple of
    the SIMD width, which uses more memory; for example, 4x1x28x28 becomes
    4x8x28x28 (avx2).
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space. So this patch allocates it at runtime.
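
The padding arithmetic in item 1 can be reproduced with a small sketch. The helper names are hypothetical, and the SIMD width of 8 is taken from the avx2 example above; other instruction sets may use a different width.

```python
def padded_channels(channels, simd_width):
    # Round the channel count up to the next multiple of the SIMD width.
    return -(-channels // simd_width) * simd_width


def padded_shape(shape, simd_width=8):
    # shape is (batch, channels, height, width); only channels are padded.
    n, c, h, w = shape
    return (n, padded_channels(c, simd_width), h, w)


def num_elements(shape):
    total = 1
    for dim in shape:
        total *= dim
    return total


original = (4, 1, 28, 28)
padded = padded_shape(original)
# How many times more elements the padded tensor holds.
growth = num_elements(padded) // num_elements(original)
```

For a single-channel input the padded tensor holds eight times as many elements, which is why the extra memory is noticeable.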

* add computshape for some layers and add skip primitives in DnnGraph (intel#2740)

* add computshape for some layer and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel#2737)

* refactor predict for dnn model

* remove some unit tests (intel#2752)

* remove some conflict tests (#2753)

* Update documentation (intel#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, we should add a new attribute called dataType
    to the `MemoryData`.
2. Because we should transfer the scales between FP32->Int8 and Int8->FP32,
    we should add two new attributes called `mask` and `scales`.

* fix conversion accuracy (intel#2760)

*  fix accuracy for saved model

* exclude mkldnn model when conversion

* feature: layer wise supports of int8 (intel#2762)

Enable the int8 data type in layers, especially for convolutions.
So a specific layer can accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel#2759)

This includes 3 steps.

1. Generate the scales of the model.
   This needs an api like `generateScalesWithMask` to generate the scales of
   the fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` api will be compatible with the `bigquant`
   backend, which will set the quantize flag. When compiling,
   the quantized weight, output, and input will be generated by mkldnn at
   runtime.
3. Do the inference (forward).
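
The three steps can be illustrated with a conceptual sketch. It assumes simple symmetric per-tensor quantization, which is an assumption for illustration only; the function names are hypothetical and this is not the mkldnn or BigDL code path.

```python
INT8_MAX = 127


def generate_scale(values):
    # Step 1: pick a scale that maps the largest magnitude onto the int8 range.
    max_abs = max(abs(v) for v in values)
    return INT8_MAX / max_abs if max_abs > 0 else 1.0


def quantize(values, scale):
    # Step 2: scale, round, and clamp each fp32 value to [-127, 127].
    return [max(-INT8_MAX, min(INT8_MAX, round(v * scale))) for v in values]


def dequantize(quantized, scale):
    # Step 3 (after inference): map int8 values back to fp32.
    return [q / scale for q in quantized]


weights = [0.5, -1.0, 0.25]
scale = generate_scale(weights)
q_weights = quantize(weights, scale)
restored = dequantize(q_weights, scale)
```

The restored values differ from the originals by at most one quantization step, which is the accuracy cost paid for the smaller data type.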

* update readme for v1 training (intel#2763)

* update release doc for preparation (intel#2764)

* change some docs about mkldnn (intel#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel#2783)

* fix: inplace of input/output and weight dimension error (intel#2779)

Some layers' input and output use the same memory, so we can't do forward in
`calcScales`: by that time the input has been changed and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

This takes two steps: seq.forward(input) runs first, and then, on entering the ReLU, it
does another forward, so the input will be the output and the scales will be
wrong.

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
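
The in-place problem can be shown with a toy sketch. The `conv` and `relu_inplace` functions below are hypothetical stand-ins, not BigDL layers, and the "scale" is reduced to the maximum absolute value of a buffer.

```python
def conv(x):
    # Stand-in for a convolution: returns a new buffer.
    return [v - 2.0 for v in x]


def relu_inplace(x):
    # Stand-in for an in-place ReLU: overwrites its own input buffer.
    for i, v in enumerate(x):
        if v < 0:
            x[i] = 0.0
    return x


def max_abs(x):
    # The statistic a scale would be derived from.
    return max(abs(v) for v in x)


conv_out = conv([-3.0, 1.0, 4.0])   # buffer shared by conv output and ReLU input
scale_before = max_abs(conv_out)    # measured before ReLU runs
relu_out = relu_inplace(conv_out)   # ReLU overwrites the shared buffer
scale_after = max_abs(conv_out)     # measured too late: sees the ReLU output
```

Reading the ReLU input after the forward pass yields the statistic of the output instead, so the input scale has to be captured before the in-place layer runs.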

* fix: the blas wrapper has no scales (intel#2778)

* fix softmax (intel#2777)

* fix: performance regression on resnet50 (intel#2774)

u8 to s8 or s8 to u8 needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel#2671)

* flip to 0.9.0 (intel#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: Layer-wise scaled. Also with utility functions to build a set of LARS optim for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo
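
The ambiguity is easy to reproduce. The sketch below only mimics the string layout described in the bug fix; the function names are hypothetical and it is not the AllReduceParameter code.

```python
def old_block_id(param_id, pid_to, pid_from):
    # Original layout: {id}{pidTo}gradientBytes{pidFrom}
    return f"{param_id}{pid_to}gradientBytes{pid_from}"


def new_block_id(param_id, pid_to, pid_from):
    # Fixed layout: a "_" separates id from pidTo.
    return f"{param_id}_{pid_to}gradientBytes{pid_from}"


# id=1, pidTo=12 and id=11, pidTo=2 collide under the old layout.
old_a = old_block_id(1, 12, 0)
old_b = old_block_id(11, 2, 0)

# With the separator the two ids stay distinct.
new_a = new_block_id(1, 12, 0)
new_b = new_block_id(11, 2, 0)
```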

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes

* bugfix - set mask for container (intel#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel#2821)

* Optimize backward graph generation and CAddTable (intel#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel#2822)

* Use one AllReduceParameter for multi-optim method  training (intel#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore / use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel#2824)

* fix: fusion for multi-group of convolution (intel#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel#2843)

* fix acc bug & init dnn thread (intel#2841)

* support tnc and ntc conversion (intel#2844)

* support ntc in dnn layer (intel#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel#2853)

* fix: wrong affinity settings (intel#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel#2858)

* Add beam search in transformer (intel#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel#2854)

* feat: add axis to softmax (intel#2859)

* add release doc for 0.9 (intel#2862)

* fix: update core ref to master (intel#2865)

* flip version to 0.10.0 (intel#2869)

* [Bug Fix] - Fix module version comparison  (intel#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel#2872)

* tutorial fix (intel#2879)

* feat: RoiAlign Forward (intel#2874)

* Add set input output format API in Python (intel#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel#2870)

* fix memory leak for ir graph training (intel#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel#2885)

* add shape layer

* add Gather layer (intel#2897)

* add gather layer

* [New feature] Add maskhead (intel#2892)

* support for maskhead

* fix unit tests (intel#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel#2930)

* Onnx support: add pos parameter to softmax (intel#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel#2940)

* revert back api (intel#2943)

* fix: softmax and bn+scale fusion (intel#2937)

* feat: multi models support with MKL-DNN backend (intel#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel#2959)

* fix: the squeeze should not be included in IRElement (intel#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel#2973)

* fix: nms stability when using treeset. (intel#2972)

* flip version to 0.11 (intel#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch-related unit tests to integration tests. (intel#2971)

* fix: enable integration accuracy tests (intel#2976)

* fix: softmax dnn backend wrong order of primitive (intel#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel#3002)

* Remove final for AbstractModule (intel#3001)

* DistriOptimizerV2 argument (intel#3003)

* call DistriOptimizerV2

* fix inception (intel#3010)

* fix top1 and treenn (intel#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel#3020)

* test examples by distrioptimizerv2 (intel#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel#3021)

* fix loss

* fix ut

* fix style check (intel#3022)

* specify pyspark version (intel#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel#3029)



* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel#3056)

* Update doc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel#3054)

* spark 3.0

* add spark3.0 deployment (intel#3061)

* add spark3.0 deployment

* add warning that Optimizer() is deprecated (intel#3062)

* add deprecation warning

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor  (intel#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel#3094)

* back port master (intel#3096)

* set seed to avoid random error in PredictionServiceUT (intel#3097)

* Jdk11 support (intel#3098)

* update for jdk 11 support and doc

* add serializeUid (intel#3099)

* update doc (intel#3104)

* add doc for running in ide (intel#3106)

* fix callBigDLFunc returning an Int while the true return value from Java is a byte array. (intel#3111)

* add list of df support (intel#3113)

* Update readme (intel#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel#3133)

* DistriOptimizerV2 logger (intel#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel#3137)

* upgrade spark version (intel#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel#3141)

* flip0.14 (intel#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <[email protected]>
Co-authored-by: Jerry Wu <[email protected]>
Co-authored-by: Xin Qiu <[email protected]>
Co-authored-by: Yanzhang Wang <[email protected]>
Co-authored-by: GenBrg <[email protected]>
Co-authored-by: LeicongLi <[email protected]>
Co-authored-by: Emiliano Martinez <[email protected]>
Co-authored-by: abdolence <[email protected]>
Co-authored-by: Enrique Garcia <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: yaochi <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: majing921201 <[email protected]>
Co-authored-by: jenniew <[email protected]>
Co-authored-by: Xiao <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: Hui Li <[email protected]>
Co-authored-by: Jason Dai <[email protected]>
Co-authored-by: dding3 <[email protected]>
Co-authored-by: Yang Wang <[email protected]>
Co-authored-by: Yina Chen <[email protected]>
Co-authored-by: Hangrui Cao <[email protected]>
Co-authored-by: pinggao18 <[email protected]>
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 22, 2021
* convert static graph to IR graph and build (intel#2711)

* add static graph to IR graph

* meet pr comments

* [Enhancement] - Enhance unit test to avoid dynamic resource allocation issue by docker (intel#2713)

* make the core number fixed

* fix local predictor

* add Trigger and/or python API (intel#2682)

* add spark 2.4 support (intel#2715)

* update sparse tensor's document (#2714)

* Reserve all state in OptimMethod when calling Optimizer.optimize() multiple times (#2648)

* reserve optimMethod for each worker

* add validation throughput

* cache variable previousOptim

* fix: move mkldnn computing to a single thread pool (intel#2724)

Using the parent thread directly causes two bugs:
1. Child threads forked from the parent thread are bound to core 0
because of the affinity settings.
2. The native thread holds some unknown thread-local variables, so if
the parent thread exits and is recreated (for example, a thread from
Executors.newFixedThreadPool), the whole app will hit a segmentation fault.
Here the parent thread means the main thread (Local Mode) or the worker thread of
mapPartition (Distributed Mode).
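The idea in the note above, routing all native computation through one dedicated long-lived thread, can be sketched in Python. This is only an analogy under stated assumptions: `compute_pool` and `run_on_compute_thread` are hypothetical names, and the snippet does not reproduce BigDL's actual thread pool or its affinity handling.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical names; an analogy for the commit's idea, not BigDL code.
# A pool with exactly one worker runs every task on the same long-lived
# thread, so thread-local state tied to that thread is never recreated.
compute_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dnn-compute")


def run_on_compute_thread(fn, *args):
    """Submit fn to the dedicated thread and block until it finishes."""
    return compute_pool.submit(fn, *args).result()


# Record which thread actually executes each of several tasks.
worker_ids = {run_on_compute_thread(threading.get_ident) for _ in range(5)}
caller_id = threading.get_ident()
```

Every task reports the same worker thread id, and that id differs from the caller's, which is the property the fix relies on: the caller (main thread or a Spark worker thread) may come and go without touching the thread that owns the native state.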

* add ceilMode for Pooling & fix batchNorm evaluate (#2708)

* add ceilMode for Pooling & fix batchNorm evaluate

* add training status for dnn layer

* fix comments

* fix IRGraph init & Add regularizer (#2736)

* fix IRGraph init & Add regularizer

* meet review comments

* fix: update mkldnn version to v0.17 issues. (intel#2712)

There are two issues:

1. A padded tensor is required. mkl-dnn pads the tensor to a multiple
    of the SIMD width, which uses more memory, such as 4x1x28x28 to
    4x8x28x28 (avx2).
2. The TensorMMap between DenseTensor and DnnTensor. The previous implementation
    allocated the DnnTensor when the model was created, which cost too much
    space, so this patch allocates it at runtime.
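The padding cost in the first issue can be worked through with a small sketch. It assumes the channel dimension is the one being padded and that the padding width is 8, both read off the commit's own 4x1x28x28 to 4x8x28x28 example; the function names are hypothetical.

```python
def pad_to_multiple(n, width):
    """Round n up to the nearest multiple of width."""
    return ((n + width - 1) // width) * width


def padded_shape(shape, simd_width=8):
    """Pad the channel dimension of an NCHW shape (assumed layout)."""
    n, c, h, w = shape
    return (n, pad_to_multiple(c, simd_width), h, w)


def num_elements(shape):
    """Total element count of a shape."""
    total = 1
    for d in shape:
        total *= d
    return total


original = (4, 1, 28, 28)
padded = padded_shape(original)
overhead = num_elements(padded) / num_elements(original)
```

For a single-channel input the padded buffer holds eight times as many elements, which is why the extra memory is worth calling out.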

* add compute shape for some layers and add skip primitives in DnnGraph (intel#2740)

* add compute shape for some layers and add skip primitives in DnnGraph

* meet pr comments

* Improve documentation (intel#2745)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* include edge case to cover all the data types (#2742)

* layer auto fusion for dnn graph (intel#2746)

* add auto fusion in dnn graph

* refactor predict for dnn model (intel#2737)

* refactor predict for dnn model

* remove some unit tests (intel#2752)

* remove some conflict tests (#2753)

* Update documentation (intel#2749)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Fix Add operation error when type is Double importing Tensorflow graph (#2721)

* feature: add byte supports for DnnTensor (intel#2751)

* feat: add byte supports for DnnTensor

* [New Feature] Calculating Scales (#2750)

* [New Feature]Calculating Scales

* recursively update mask for container module (intel#2754)

* recursively update mask for container module

* [Enhancement] - Speed up BlasWrapper performance under MKL-DNN (intel#2748)

* add parallel in Blaswrapper

* refactor to support ssd

* meet pr comments

* fix logger serialize

* Loss Function docs improvement (intel#2757)

* Improve Loss Function docs v2

* change asInstanceOf to toDistributed in optimizer (#2755)

* change asInstanceOf to toDistributed

* change asInstanceOf to toDistributed

* convert scale in blas to dnn (#2758)

* convert scale in blas to dnn

* meet pr comment

* feat: reorder for int8 supports (#2756)

1. Because of the new data type, a new attribute called dataType is added
    to the `MemoryData`.
2. Because the scales must be transferred between FP32->Int8 and Int8->FP32,
    two new attributes called `mask` and `scales` are added.

* fix conversion accuracy (intel#2760)

* fix accuracy for saved model

* exclude mkldnn model during conversion

* feature: layer wise supports of int8 (intel#2762)

Enable the int8 data type in layers, especially for convolutions.
A specific layer can therefore accept an int8 input. If you want fp32
output, add a reorder.

* feature: mkldnn int8 layer wise supports (intel#2759)

This includes 3 steps.

1. Generate the scales of the model.
   An API like `generateScalesWithMask` is needed to generate the scales of
   the fp32 model. The model returned is an fp32 model too.
2. Quantize the model.
   The `quantize()` API will be compatible with the `bigquant`
   backend, which will set the quantize flag. When compiling,
   the quantized weight, output, and input will be generated by mkldnn at
   runtime.
3. Do the inference (forward).
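The three steps above can be illustrated with a toy, pure-Python sketch. It assumes symmetric max-abs quantization to the range [-127, 127], which the commit message does not spell out, and none of the function names below are BigDL APIs.

```python
def generate_scale(values):
    """Step 1 (assumed scheme): symmetric scale from the max absolute value."""
    max_abs = max(abs(v) for v in values)
    return 127.0 / max_abs if max_abs > 0 else 1.0


def quantize(values, scale):
    """Step 2: map fp32 values to clamped int8 integers."""
    return [max(-127, min(127, round(v * scale))) for v in values]


def int8_dot(x, w):
    """Step 3: integer dot product, rescaled back to fp32 at the end."""
    sx, sw = generate_scale(x), generate_scale(w)
    qx, qw = quantize(x, sx), quantize(w, sw)
    acc = sum(a * b for a, b in zip(qx, qw))  # integer accumulation
    return acc / (sx * sw)


x = [0.5, -1.0, 0.25]
w = [2.0, 1.0, -4.0]
fp32_result = sum(a * b for a, b in zip(x, w))
int8_result = int8_dot(x, w)
```

The int8 result stays close to the fp32 result, with a small error from rounding. In the flow described by the commit, the scales are generated once on the fp32 model and the quantized tensors are produced by mkldnn at runtime rather than by hand as here.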

* update readme for v1 training (intel#2763)

* update release doc for preparation (intel#2764)

* change some docs about mkldnn (intel#2765)

* add comments about mkldnn

* meet pr comments

* examples for int8 (intel#2761)

This is an example of how to use mkldnn int8. There are two steps: first use
GenInt8Scales to generate the scales and save the new model. Then you
can use the quantized model as usual.

* enable fusion by default (intel#2766)

* fix: the influence of default value of fusion (#2768)

* fix: use too much memory of mkldnn models (intel#2783)

* fix: inplace of input/output and weight dimension error (intel#2779)

Some layers' input and output use the same memory. We can't do forward in
`calcScales`, because by that time the input has been changed and its scales may
not be right. For example,

Sequential().add(Conv).add(ReLU)

takes two steps: seq.forward(input) runs first, and on reaching the ReLU it
does another forward, so the input will be the output and the scales will be
wrong.

For convolution's weight, the dimension is always 5, even when the group number
is 1. But for dnn convolution, if there's no group, the weight's dimension
should be 4.
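The in-place problem described above can be reproduced with plain lists. This sketch assumes the scale is derived from the maximum absolute value of a buffer; `relu_inplace` and `max_abs` are hypothetical helpers, not BigDL functions.

```python
def max_abs(values):
    """Statistic a scale would be derived from (assumed: max absolute value)."""
    return max(abs(v) for v in values)


def relu_inplace(buf):
    """ReLU that overwrites its input buffer, as an in-place layer does."""
    for i, v in enumerate(buf):
        if v < 0:
            buf[i] = 0.0
    return buf


conv_output = [-3.0, 1.0, 2.0]          # also the ReLU's input buffer
scale_before = max_abs(conv_output)     # correct input statistic
relu_output = relu_inplace(conv_output)
scale_after = max_abs(conv_output)      # input was overwritten by the output
```

Reading the ReLU's input after the forward pass sees the already-rectified values, so the statistic drops from 3.0 to 2.0 and the input scale derived from it would be wrong.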

* fix: the blas wrapper has no scales (intel#2778)

* fix softmax (intel#2777)

* fix: performance regression on resnet50 (intel#2774)

the u8 to s8 or s8 to u8 conversion needs no reorder in this case.

* fix log init (#2781)

* fix: dropout should init primitive (#2789)

* Docs update for spark 2.3, build 0.7 and deps exclude (intel#2671)

* flip to 0.9.0 (intel#2792)

* Improve Layer documentation v1 (#2767)

* Modify documentation

* Modify documentation 2

* Modified the environment configuration documentation

* Corrected some mistakes in the API Guide

* Update learning rate scheduler doc.

* Fix the Bottle Container example code.

* Loss Function docs improvement v1

* Improve Loss Function docs v2

* Improve Layers documentation

* Improve documentation on Activations

* minor fix

* Update a code section with python style on Metrics.md (intel#2665)

* [Fix] doc : some changes for scalaUserGuide and release links according to … (intel#2791)

* doc : some changes for scalaUserGuide and release links according to v0.8.0 release

* Update build-bigdl-core.md

* Update build-bigdl-core.md

* test: should compare the right grad input (intel#2794)

* fix the wrong error message (#2800)

* [New feature] Add attention layer and ffn layer (intel#2795)

* add attention layer

* add ffn layer and more unit tests

* refactor according to pr comments

* add SerializationTest

* fix unit tests

* add python api

* update readme with newly adopted mkl-dnn (#2803)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer (intel#2802)

* [New feature & fix] Add layer-wise adaptive rate scaling optimizer:
Add LARS optimizer: Layer-wise scaled. Also with utility functions to build a set of LARS optim for a container.

Bug fix: The gradient block id of AllReduceParameter is originally composed of {id}{pidTo}gradientBytes{pidFrom}. But the combination of {id}{pidTo} will cause ambiguity. e.g., "112" can be {1}{12} or {11}{2}. Now a "_" is added to separate id from pidTo

* refine documents, correctly set the lrSchedulerOwner bit

* format the added code

* make Lars inherit SGD

* rename Lars -> LarsSGD and reformat

* style changes
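The AllReduceParameter block-id bug fixed in this change (the ambiguity of concatenating `{id}{pidTo}`) can be shown in a few lines. The two format strings follow the commit message; the function names are hypothetical.

```python
def old_block_id(id_, pid_to, pid_from):
    """Original scheme: id and pidTo are concatenated with no separator."""
    return f"{id_}{pid_to}gradientBytes{pid_from}"


def new_block_id(id_, pid_to, pid_from):
    """Fixed scheme: an underscore separates id from pidTo."""
    return f"{id_}_{pid_to}gradientBytes{pid_from}"


# Two different (id, pidTo) pairs that both render as "112".
a = (1, 12, 0)
b = (11, 2, 0)
old_collides = old_block_id(*a) == old_block_id(*b)
new_collides = new_block_id(*a) == new_block_id(*b)
```

With the old scheme both pairs map to the same key; with the separator they stay distinct.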

* bugfix - set mask for container (intel#2807)

* bugfix - set mask for container

* bugfix #2805: set dimension mask

* Update Graph.scala

* Update Graph.scala

* change set mask indicator's name

* rename set mask params

* [Enhancement]: Scala Reflection: get default value for constructor parameters (intel#2808)

* reflection: get param's default value when instantiating a class

* resolve conflict

* resolve conflict

* code style check

* remove print

* fix typos

fix typos

* replace randomcropper with centercrop for better performance (#2818)

* fix: memory data hash code should contain data type (intel#2821)

* Optimize backward graph generation and CAddTable (intel#2817)

* Optimize backward graph generation and caddtable

* refine add table

* change api name

* add layer norm and expand size layers (#2819)

* add layer norm and expand size

* meet pr comments

* feat: enable global average pooling (intel#2823)

* feat: enable global average pooling

* test: add more unit tests

* Optimizers: use member variable in parent class

* Revert "Optimizers: use member variable in parent class"

This reverts commit 7e47204

* Dilation in MKL-DNN Convolution (intel#2815)

* mkldnn-dilatedconv

* fix typos

fix typos

* make todo all uppercase

* fix: calculate arbitrary mask of scales (intel#2822)

* Use one AllReduceParameter for multi-optim method  training (intel#2814)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* change random seed in UT

* [New feature] add transformer layer (intel#2825)

* add transformer

* refactor class name

* use same embedding for translation

* fix pr comments

* [Bug Fix] Fix Issue 2734 (#2816)

* fix issue 2734

* fix issue 2734

* fix issue 2734

* [Refactor] Reflection Utilization (#2831)

* refactor reflection utils

* refactor reflection utils

* feat: MKLDNN LSTM unidirectional/bidirectional inference support (intel#2806)

* LSTM draft

* MKLDNN LSTM fixed MD

* added hiddenSize

* setMemoryData NativeData

* weights NativeData format set to ldigo, all 1 test passed

* fixed format any problem

* LSTM weights bias initialisation

* add LSTM2 in nn

* Bidirectional LSTM inference enabled

* modified Bidirectional test

* LSTMSpec input format conversion bug between bigdl and mkldnn fixed, not support random weights, bias

* fixed the last problem 1 3 2 4

* Three inference tests with randomly generated parameters

* Added comments and modified the LSTMSpec (tests using Equivalent.nearequals)

* Deleted nn/LSTM2. Renamed methods. Added a requirement in nn/TimeDistributed

* combined initMemoryDescs() into initFwdPrimitives()

* Add require for input size and hidden size matching if layers of LSTM is more than one

* Refactor RNN

* Add comment on gate order to mkldnn/RNN

* Add unidirectional multilayer test

* add comments/ modify UTs

* phase is not used anymore / use isTraining() instead

* operationWant enhanced/ weight init/ release() parameters()

* remove input format check and change some variables names

* input format check / throw exception print info / release code

* comment style and RNNSerialTest

* remove unnecessary comments

* Softmax -> SoftMax (#2837)

* bug fix for cmul (intel#2836)

* bug fix for cmul

* meet pr comments

* set new storage to weight and bias for weight fusion (intel#2839)

* Add parameter processor for LARS (#2832)

* enhancement: use one shared allreduceparameter

* update localPartitionRange

* implement lars whole layer gradient norm calculation

* change random seed in UT

* add limitation on "trust" of LARS, remove debug output

* reformat

* add tests in DistriOptimizer for LARS

* reformat

* update parameters in UT

* update parameters in UT

* Add transformer to LM example (intel#2835)

* add transformer to LM example

* refactor dropout in Transformer

* meet pr comments

* feat: MKLDNN LSTM unidirectional/bidirectional backward support (#2840)

* MKLDNN LSTM backward support with accuracy testing

* fix: require consistent between shape and layout of mkldnn (intel#2824)

* fix: fusion for multi-group of convolution (intel#2826)

* fix: support int8 of jointable (#2827)

* fix: support int8 of jointable
* doc: add more docs

* fix: invokeAndWait2 should throw the exception in the tasks (intel#2843)

* fix acc bug & init dnn thread (intel#2841)

* support tnc and ntc conversion (intel#2844)

* support ntc in dnn layer (intel#2847)

* support ntc in dnn layer

* meet pr comments

* [WIP]Add beam search feature in transformer model (intel#2834)

* add beam search feature

* Update beam search feature and unit test

* add symbolToLogits function set check

* update clearState and add serial test

* add SequenceBeamSearch to python layers

* add createSequenceBeamSearch method to python api

* feat: add a property to disable omp thread affinity (intel#2849)

* fix: use treeset to calc topk to upgrade the performance of DetectionOutputSSD (intel#2853)

* fix: wrong affinity settings (intel#2857)

* update beam search feature for interface with transformer model (#2855)

* update beam search for padding value and cache structure

* update python API for beam search

* add comments and update python layer

* modify comments format

* modify comments format

* Support converting blas lstm to dnn lstm (#2846)

* convert from blas lstm to dnn lstm

* meet pr comments

* fix load lstm error bug (intel#2858)

* Add beam search in transformer (intel#2856)

* Add beam search in transformer

* meet pr comments

* fix: upgrade the performance of normalize (intel#2854)

* feat: add axis to softmax (intel#2859)

* add release doc for 0.9 (intel#2862)

* fix: update core ref to master (intel#2865)

* flip version to 0.10.0 (intel#2869)

* [Bug Fix] - Fix module version comparison  (intel#2871)

* update serialization

* update serialization

* convert IRgraph momentum to mkldnn (intel#2872)

* tutorial fix (intel#2879)

* feat: RoiAlign Forward (intel#2874)

* Add set input output format API in Python (intel#2880)

* add set input output format

* add static graph check

* feat: Feature Pyramid Networks Forward (intel#2870)

* fix memory leak for ir graph training (intel#2895)

* add gemm layer (#2882)

* add gemm layer

* add transpose in gemm layer

* add Shape layer (intel#2885)

* add shape layer

* add Gather layer (intel#2897)

* add gather layer

* [New feature] Add maskhead (intel#2892)

* support for maskhead

* fix unit tests (intel#2905)

* modify  predict/predictClass function  (#2868)

* predictClass output modification

* predict/predictClass function modification in Beta Api

* predict/predictClass function modification

* predictClass function modification

* [New feature] Add Boxhead (intel#2894)

* add boxhead

* add SerialTest

* meet pr comments

* fix: Add TopBlocks to Feature Pyramid Networks (FPN) (#2899)

* Add Mean Average Precision validation method (intel#2906)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* fix boxhead unit tests (#2912)

* python api nested list input and pooler python api (intel#2900)

* Auto memory management for MKLDNN (#2867)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* style fixes

* change _implicitMemoryOwner -> _this

* [New feature] Add region proposal (intel#2896)

* add Regionproposal

* [New feature] add maskrcnn (#2908)

* add maskrcnn

* fix mask head

* move maskrcnn to models

* add maskrcnn serialTest

* Add Onnx Supported Layers (intel#2902)

* remove duplicated layers

* Update RoiLabel class and add RoiImageFeatureToBatch (intel#2913)

* add MeanAveragePrecision validation method

* Add MAP basic code for object detection

* update tests

* bug fixes based on results of former MAP validation method

* update documents

* add python binding

* typo fix, style change, change calculateAP to private

* update comments

* update RoiLabel, add RoiImageFeatureToBatch

* fix typo in class name

* updates by suggestions

* minor updates

* Move RoiMiniBatch to MTImageFeatureToBatch.scala

* mask in RoiLabel now have Floats not Bytes

* use IndexedSeq for RoiLabel

* style fix

* add isCrowd and origSize to final target table

* style fix

* isCrowd change to float, add doc

* add tests and bug fixes

* add util getting RoiLabels from ImageFeatures

* add util getting RoiLabels from Table

* comment out the tests

* rename utils in RoiLabel

* feat: MKLDNN GRU forward/backward support (#2893)

* Onnx support: modify unsqueeze function (#2910)

* modify unsqueeze function

* add maskutils (intel#2921)

* add maskutils

* update tests & docs

* fix typo in document

* Fix memory leaks on training (intel#2914)

* add memory owner

* Add DnnTensor to MemoryOwner

* delete unused file

* style fix

* Move ReorderManager to MemoryOwner

* Fix compiling errors

* use Releasable as a general management type. release input layer.

* remove redundant null checking

* fix memory leak in batch norm

* style fixes

* change _implicitMemoryOwner -> _this

* release submat

* release opencv submats

* support samples with different size  to one mini batch (intel#2929)

* add to batch with resize

* meet comments

* support batch for mask head and pooler (intel#2926)

* support batch for mask head

* meet comments

* Onnx support: add a dim parameter to ops.Gather (intel#2920)

* add dim parameter to ops.Gather

* improve and simplify code

* improve and simplify code

* improve and simplify code

* improve and simplify code

* support batch for regionproposal (#2928)

* support batch for regionproposal

* enable gru blas-to-dnn conversion (intel#2930)

* Onnx support: add pos parameter to softmax (intel#2933)

* add pos parameter to softmax

* add pos parameter to softmax

* add pos parameter to softmax

* fix review problem

* fix review problem

* Add resize for segmentation (intel#2923)

* add resize for segmentation

* meet pr comments

* support batch input for boxhead (#2924)

* boxhead support batch input

* meet pr comments

* COCO SeqFile (intel#2927)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* ignore non-existing images

* updates based on GH comments

* ONNX Support (#2918)

* onnx dev

* add onnx loader

* clean up

* feat: add precision recall auc (#2941)

* feat: add precision recall auc

* add post processing for maskrcnn model (#2931)

* add mask postprocessing

* put image info to mask model

* fix TimeDistributedCriterion() lack of parameter of dimension issue (intel#2940)

* revert back api (intel#2943)

* fix: softmax and bn+scale fusion (intel#2937)

* feat: multi models support with MKL-DNN backend (intel#2936)

* feat: multi models support with MKL-DNN backend

* add COCO MAP (#2935)

* Move COCO SeqFile related updates into this branch

* bbox

* add UT

* add UT

* add UT

* add COCO MAP

* revert merge conflict

* ignore non-existing images

* add IOU related API. MAP now parses RLEs

* BBox now inclusive

* updates based on GH comments

* add COCODataset.getImageById

* COCO topK default => -1, remove height: Int, width: Int in GroundTruthRLE

* update imageId2Image

* rename MAPObjectDetection utils, add cocoSegmentationAndBBox, refine formatting

* rename utils

* update documents

* check size of bbox & classes & scores & labels & iscrowd. Handle empty predictions

* add gt and target image size checking, add support for empty target bbox, add UT

* detection sorted before matching with GT. Optimize MAPResult merging. Add UT for merging

* COCO Seq file reader: grey to bgr (intel#2942)

* grey to bgr

* refactor isGrayScaleImage

* simplify grey scale image checking

* Add the flushing denormal values option on BigDL side (#2934)

* add no argument apply api for softmax (intel#2945)

* add no argument apply api for softmax

* add no argument apply api for softmax

* ONNX ResNet example (intel#2939)

* add onnx resnet example

* add doc for onnx

* add doc for onnx

* clean up

* add maskrcnn inference example (intel#2944)

* add maskrcnn inference example

* meet pr comments

* add model download url

* Update the RoiLabel and MTImageFeatureToBatch (intel#2925)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* Python MKLDNN examples for CNN(LeNet) and RNN(LSTM) (#2932)

* fix: takeSample only works for dnn backend and get one batch (intel#2947)

* fix: takeSample only works for dnn backend and get one batch

* edit doc (#2948)

* Rename filesToRoiImageFrame to filesToRoiImageFeatures (intel#2949)

* Update the RoiLabel related files from Sequence-file related PR

* var -> val

* Bug fix for curBatchSize < batchSize. toRGB default to false

* add ROISIZE

* update documents

* update documents

* add UT

* fix document

* filesToRoiImageFrame -> filesToRoiImageFeatures, to public

* fix: move out setMklThreads of MklDnn (intel#2950)

* memory data cleanup (#2956)

* memory data cleanup

* Onnx support: RoiAlign and TopK parameter update (#2957)

* Topk add dim and increase parameter

* RoiAlign add max pooling mode

* add test cases

* add test cases

* remove masks requirements (intel#2959)

* fix: the squeeze should not be included in IRElement (intel#2962)

* enhance COCODataset (#2954)

* enhance COCODataset:
Add COCODataset.loadFromSeqFile
Add COCODataset.toImageFeatures
Add COCOImage.toTable

* rename and polish doc

* fix COCO serialize bug

* fix typo in function name

* typo fix (intel#2965)

* rename RoiImageFeatureToBatch APIs (#2964)

* RoiMiniBatch enhancement (#2953)

* SerializableIndexedSeq

* allow empty target & image size info

* rename RoiImageFeatureToBatch APIs

* set as private

* change back to array

* MTImageFeatureToBatch without labels

* handle iscrowd

* remove duplication in merge

* feat: add softmax backward (intel#2967)

* feat: add softmax backward

* fix: fuse bn scale and relu to bn. (intel#2966)

* fix: fuse bn scale and relu.

* fix mask unit tests (intel#2973)

* fix: nms stability when using treeset. (intel#2972)

* flip version to 0.11 (intel#2974)

* refactor anchor generator (#2963)

* refactor anchor generator

* meet pr comments

* fix code style

* ROIAlign refactor (intel#2960)

* ROIAlign refactor

* fix unit tests

* fix model load of maskrcnn (intel#2961)

* fix maskrcnn model load

* delete temp file

* fix maskrcnn tests

* support roialign backward (intel#2975)

* support roialign backward

* fix sparselinear unit test

* fix: bn nhwc error, the channel should be the last dim (#2981)

* refactor: move torch relevants unit tests to integration tests. (intel#2971)

* fix: enable integration accuracy tests (intel#2976)

* fix: softmax dnn backend wrong order of primitive (intel#2986)

* modify TextClassifier.scala (#2987)

* Add a method to merge nested StaticGraphs (intel#2985)

* NHWC support when running with MKL-DNN (#2989)

* support NHWC for MKLDNN

* fix unit tests

* Keras with MKL-DNN backend support (#2990)

* Update README.md

* Update README.md

* feat: add distri optimizer v2 (intel#2992)

* update error message in AllReduceParameter (#2997)

* update error message in AllReduceParameter

* use tensorflow proto jar (#2994)

* fix callBigDLFunc (intel#3002)

* Remove final for AbstractModule (intel#3001)

* DistriOptimizerV2 argument (intel#3003)

* call DistriOptimizerV2

* fix inception (intel#3010)

* fix top1 and treenn (intel#3011)

* remove final setExtraParameters (#3014)

* move pretrain in DistriOptimizerV2 (intel#3016)

* move getData

* rename

* remove time counting

* deprecate dlframe (intel#3012)

* deprecate dlframe

* fix throughput (#3017)

* fix throughput

* update

* add release doc for 0.10.0 (intel#3020)

* test examples by distrioptimizerv2 (intel#3007)

* enable scala examples by distrioptimizerv2

* update example's readme

* update integration test

* test python examples by distriOptimizerV2 (intel#3008)

* Test python examples by distriOptimizerV2

* deprecate nn.keras (intel#3013)

* deprecate nn.keras

* fix loss when minibatch size is different (intel#3021)

* fix loss

* fix ut

* fix style check (intel#3022)

* specify pyspark version (intel#3030)

* specify pyspark version

* add release doc for 0.11 (#3026)

* flip version to 0.12 (intel#3029)

* update

* fix KerasLayer new parameters() (#3034)

* Fix analytics zoo protobuf shading problem (intel#3033)

* change shade name and remove protobuf-java (already introduced by tf)

* remove protobuf

* add required dependencies (#3047)

* update doc (intel#3056)

* Updatedoc (#3060)

* Update install-from-pip.md

* [WIP] spark 3.0 (intel#3054)

* spark 3.0

* add spark3.0 deployment (intel#3061)

* add spark3.0 deployment

* add warning to remind Optimizer() deprecates (intel#3062)

* add warning to remind deprecates

* Update scala maven plugin (#3068)

* update scala maven plugin

* change to public (#3064)

* Add big model support (#3067)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* squeeze target dimension (corner case) in ClassNLLCriterion (intel#3072)

* fix target dimension match error

* update message (#3073)

* flip version to 0.13-snapshot (intel#3074)

* flip version to 0.13-snapshot

* Uncompressed Tensor (intel#3079)

* support no compressing parameter

* address comments

* hotfix ClassNLLCriterion with cloned target (#3081)

* hotfix ClassNLLCriterion with cloned target

* Fix SerializationUtils clone issue of QuantizedTensor (intel#3088)

* update get extra param

* add test

* add check

* fix clone parameter

* fix test

* fix test

* update clone quantizedtensor

* update

* add OptimPredictorShutdownSpec UT in integration test (#3089)

* move integration UT to a general test script (intel#3094)

* back port master (intel#3096)

* set seed to avoid random error in PredictionServiceUT (intel#3097)

* Jdk11 support (intel#3098)

* update for jdk 11 support and doc

* add serializeUid (intel#3099)

* update doc (intel#3104)

* add doc for running in ide (intel#3106)

* fix callBigDLFunc return a Int while the true return value from java is a byte array. (intel#3111)

* add list of df support (intel#3113)

* Update readme (intel#3118)

* Update index.md

* add 0.12.2 release download (#3122)

* remove DLFrames (intel#3124)

* remove DLFrames

* update

* update

* update

* rm dlframe example from test script

* Add Utest about dividing zero (#3128)

* Add Utest about dividing zero

* add Utest and zero check of LocalData

* add Utest and zero check of LocalData

* change

* Add Utest about dividing zero

* fix test

* add python3 to Dockerfile (intel#3132)

* add python3 to Dockerfile

* update

* update jdk

* update

* make default DistriOptimizer as V2 (intel#3129)

* make default DistriOptimizer as V2

* update

* fix dlframe (intel#3133)

* DistriOptimizerV2 logger (intel#3135)

* DistriOptimizerV2 logger

* update

* fix style check

* validate epoch num

* move dlframe SharedParamsApater to AZ and roll back to OptimizerV1 (intel#3137)

* upgrade spark version (intel#3138)

* Update deploy-spark2.sh

* 0.13 release doc (#3144)

* upgrade log4j (intel#3141)

* flip0.14 (intel#3142)

* flip0.14

* update

* Update deploy-spark3.sh (#3145)

* update

* update

* update

* update

* fix make dist

* migrate path

* update

* update

Co-authored-by: zhangxiaoli73 <[email protected]>
Co-authored-by: Jerry Wu <[email protected]>
Co-authored-by: Xin Qiu <[email protected]>
Co-authored-by: Yanzhang Wang <[email protected]>
Co-authored-by: GenBrg <[email protected]>
Co-authored-by: LeicongLi <[email protected]>
Co-authored-by: Emiliano Martinez <[email protected]>
Co-authored-by: abdolence <[email protected]>
Co-authored-by: Enrique Garcia <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: yaochi <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Firecrackerxox <[email protected]>
Co-authored-by: majing921201 <[email protected]>
Co-authored-by: jenniew <[email protected]>
Co-authored-by: Xiao <[email protected]>
Co-authored-by: Hui Li <[email protected]>
Co-authored-by: Jason Dai <[email protected]>
Co-authored-by: dding3 <[email protected]>
Co-authored-by: Yang Wang <[email protected]>
Co-authored-by: Yina Chen <[email protected]>
Co-authored-by: Hangrui Cao <[email protected]>
Co-authored-by: pinggao18 <[email protected]>
Litchilitchy added a commit to Litchilitchy/analytics-zoo that referenced this pull request Sep 23, 2021
liu-shaojun pushed a commit that referenced this pull request Mar 6, 2024
* add warning to remind deprecates