Add dnnl ep #903

Merged: 21 commits, Jul 18, 2023
2 changes: 2 additions & 0 deletions .azure-pipelines/scripts/codeScan/pyspelling/inc_dict.txt
@@ -495,6 +495,7 @@ dnf
dnn
dnnl
DNNL
DnnlExecutionProvider
Dockerfile
doclist
docstrings
@@ -563,6 +564,7 @@ enum
env
environ
ep
eps
eq
erf
Erf
14 changes: 12 additions & 2 deletions docs/source/mixed_precision.md
@@ -17,6 +17,7 @@ The recently launched 3rd Gen Intel® Xeon® Scalable processor (codenamed Coope
</p>

## Mixed Precision Support Matrix

<table class="center">
<thead>
<tr>
@@ -48,7 +49,7 @@ The recently launched 3rd Gen Intel® Xeon® Scalable processor (codenamed Coope
<td align="left">:x:</td>
</tr>
<tr>
<td rowspan="3" align="left">ONNX Runtime</td>
<td rowspan="4" align="left">ONNX Runtime</td>
<td align="left">CPUExecutionProvider</td>
<td align="left">MLAS</td>
<td align="left">"default"</td>
@@ -72,6 +73,14 @@ The recently launched 3rd Gen Intel® Xeon® Scalable processor (codenamed Coope
<td align="left">&#10004;</td>
<td align="left">&#10004;</td>
</tr>
<tr>
<td align="left">DnnlExecutionProvider</td>
<td align="left">OneDNN</td>
<td align="left">"onnxrt_dnnl_ep"</td>
<td align="left">cpu</td>
<td align="left">&#10004;</td>
<td align="left">:x:</td>
</tr>
<tr>
<td rowspan="2" align="left">Tensorflow</td>
<td align="left">Tensorflow</td>
@@ -162,4 +171,5 @@ converted_model.save('./path/to/save/')
- Quick started with [helloworld example](/examples/helloworld/tf_example3)
- PyTorch [ResNet18](/examples/pytorch/image_recognition/torchvision_models/mixed_precision/resnet18)
- IPEX [DistilBERT base](/examples/pytorch/nlp/huggingface_models/question-answering/mixed_precision/ipex)
- Tensorflow [ResNet50](/examples/tensorflow/image_recognition/tensorflow_models/resnet50_v1/mixed_precision)
- ONNX Runtime [Bert base](/examples/onnxrt/nlp/huggingface_model/text_classification/mix_precision)
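
The matrix above now lists DnnlExecutionProvider as a bf16 backend for ONNX Runtime models. A minimal sketch of selecting it through the mixed precision API described earlier in this document, assuming the 2.x `MixedPrecisionConfig`/`mix_precision.fit` interface (the model path is a placeholder):

```python
from neural_compressor import mix_precision
from neural_compressor.config import MixedPrecisionConfig

# "onnxrt_dnnl_ep" is the backend value registered in the matrix above
conf = MixedPrecisionConfig(device="cpu", backend="onnxrt_dnnl_ep")

# "path/to/model.onnx" is a placeholder for an FP32 ONNX model
converted_model = mix_precision.fit("path/to/model.onnx", conf=conf)
converted_model.save('./path/to/save/')
```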
8 changes: 7 additions & 1 deletion docs/source/quantization.md
@@ -452,7 +452,7 @@ Intel(R) Neural Compressor support multi-framework: PyTorch, Tensorflow, ONNX Ru
<td align="left">cpu</td>
</tr>
<tr>
<td rowspan="3" align="left">ONNX Runtime</td>
<td rowspan="4" align="left">ONNX Runtime</td>
<td align="left">CPUExecutionProvider</td>
<td align="left">MLAS</td>
<td align="left">"default"</td>
@@ -470,6 +470,12 @@ Intel(R) Neural Compressor support multi-framework: PyTorch, Tensorflow, ONNX Ru
<td align="left">"onnxrt_cuda_ep"</td>
<td align="left">gpu</td>
</tr>
<tr>
<td align="left">DnnlExecutionProvider</td>
<td align="left">OneDNN</td>
<td align="left">"onnxrt_dnnl_ep"</td>
<td align="left">cpu</td>
</tr>
<tr>
<td rowspan="2" align="left">Tensorflow</td>
<td align="left">Tensorflow</td>
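
The quantization support matrix now registers DnnlExecutionProvider with the backend value "onnxrt_dnnl_ep". A minimal sketch of selecting it for post-training quantization, assuming the 2.x `PostTrainingQuantConfig` interface (the model path and calibration dataloader are placeholders):

```python
from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig

# "onnxrt_dnnl_ep" is the backend value added to the matrix above
conf = PostTrainingQuantConfig(backend="onnxrt_dnnl_ep")

# "path/to/model.onnx" and calib_dataloader are placeholders for a real model and dataloader
q_model = quantization.fit("path/to/model.onnx", conf, calib_dataloader=calib_dataloader)
q_model.save("./saved_results")
```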
@@ -0,0 +1,77 @@
Step-by-Step
============

This example loads a text classification model and evaluates its accuracy and performance on [GLUE data](https://gluebenchmark.com/).

# Prerequisite

## 1. Environment
```shell
git clone -b dnnl_ep --depth 1 https://github.com/intel/neural-compressor.git
cd neural-compressor
pip install -e ./

cd examples/onnxrt/nlp/huggingface_model/text_classification/mix_precision/
pip install -r requirements.txt
```
> Note: see the validated ONNX Runtime [versions](/docs/source/installation_guide.md#validated-software-environment).
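
To double-check the installed build before running the example, you can print the ONNX Runtime version (a plain onnxruntime call, nothing example-specific):

```shell
python -c "import onnxruntime; print(onnxruntime.__version__)"
```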

## 2. Prepare Model

Supported model identifiers from [huggingface.co](https://huggingface.co/):

| Model Identifier |
|:-----------------------------------------------:|
| Intel/bert-base-uncased-mrpc |
| Intel/roberta-base-mrpc |
| Intel/xlm-roberta-base-mrpc |
| Intel/camembert-base-mrpc |
| distilbert-base-uncased-finetuned-sst-2-english |
| Alireza1044/albert-base-v2-sst2 |
| Intel/MiniLM-L12-H384-uncased-mrpc |
| philschmid/MiniLM-L6-H384-uncased-sst2 |
| bert-base-cased-finetuned-mrpc |
| Intel/electra-small-discriminator-mrpc |
| M-FAC/bert-mini-finetuned-mrpc |
| Intel/xlnet-base-cased-mrpc |
| Intel/bart-large-mrpc |

```bash
python export.py --model_name_or_path=Intel/bert-base-uncased-mrpc # or other supported model identifier
```
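
Optionally, sanity-check the exported graph before conversion. `export.py` names the output file after the last segment of the model identifier, so for the command above the file is `bert-base-uncased-mrpc.onnx`; the check below uses the standard `onnx` checker:

```bash
python -c "import onnx; onnx.checker.check_model('bert-base-uncased-mrpc.onnx')"
```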

## 3. Prepare Dataset
Download the GLUE data with the `prepare_data.sh` script.

```shell
export GLUE_DIR=/path/to/glue_data
export TASK_NAME=MRPC # or SST

bash prepare_data.sh --data_dir=$GLUE_DIR --task_name=$TASK_NAME
```

# Run

If the hardware doesn't support bf16 instructions, set the flag below to force bf16 conversion (this workaround will be deprecated):

```shell
export FORCE_BF16=1
```
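
To see whether the CPU natively supports bf16 (in which case the flag is unnecessary), you can look for the corresponding ISA flags on Linux; this is only a quick heuristic:

```shell
grep -o -m1 -E 'avx512_bf16|amx_bf16' /proc/cpuinfo || echo "no native bf16 flag found"
```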

## 1. Mixed precision conversion only

```bash
# --input_model and --output_model are *.onnx model paths
bash run.sh --input_model=path/to/model \
            --output_model=path/to/model_tune
```
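
After conversion you can inspect how much of the graph was actually moved to bf16. A small sketch using the `onnx` package; `path/to/model_tune` stands for the `--output_model` passed above:

```python
import onnx
from collections import Counter

model = onnx.load("path/to/model_tune")  # the converted model produced by run.sh

# Count operator types; Cast nodes typically bracket the bf16 regions
print(Counter(node.op_type for node in model.graph.node).most_common(10))

# Count initializers stored as bfloat16
bf16 = onnx.TensorProto.BFLOAT16
print(sum(1 for t in model.graph.initializer if t.data_type == bf16), "bfloat16 initializers")
```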

## 2. Mixed precision conversion + accuracy evaluation

Please make sure DnnlExecutionProvider is in the list of available providers before running evaluation.
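
A quick way to verify this with a plain onnxruntime call (it should print True on a build with the DNNL EP enabled):

```shell
python -c "import onnxruntime as ort; print('DnnlExecutionProvider' in ort.get_available_providers())"
```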

```bash
# --input_model and --output_model are *.onnx model paths; --batch_size is optional
bash eval.sh --input_model=path/to/model \
             --output_model=path/to/model_tune \
             --dataset_location=path/to/glue/data \
             --batch_size=batch_size
```
@@ -0,0 +1,128 @@
#!/bin/bash
set -x

function main {
init_params "$@"
run_tuning
}

# init params
function init_params {
    for var in "$@"
    do
        case $var in
            --input_model=*)
                input_model=$(echo $var |cut -f2 -d=)
            ;;
            --output_model=*)
                output_model=$(echo $var |cut -f2 -d=)
            ;;
            --dataset_location=*)
                dataset_location=$(echo $var |cut -f2 -d=)
            ;;
            --batch_size=*)
                batch_size=$(echo $var |cut -f2 -d=)
            ;;
        esac
    done

}

# run_tuning
function run_tuning {
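    # Map the input model file name onto its Hugging Face identifier, GLUE task,
    # and the attention-head / hidden-size hints passed to main.py below.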

if [[ "${input_model}" =~ "bert-base-uncased" ]]; then
model_name_or_path="Intel/bert-base-uncased-mrpc"
TASK_NAME='mrpc'
num_heads=12
hidden_size=768
fi
if [[ "${input_model}" =~ "roberta-base" ]]; then
model_name_or_path="Intel/roberta-base-mrpc"
TASK_NAME='mrpc'
num_heads=12
hidden_size=768
fi
if [[ "${input_model}" =~ "xlm-roberta-base" ]]; then
model_name_or_path="Intel/xlm-roberta-base-mrpc"
TASK_NAME='mrpc'
num_heads=12
hidden_size=768
fi
if [[ "${input_model}" =~ "camembert-base" ]]; then
model_name_or_path="Intel/camembert-base-mrpc"
TASK_NAME='mrpc'
num_heads=12
hidden_size=768
fi
if [[ "${input_model}" =~ "distilbert-base" ]]; then
model_name_or_path="distilbert-base-uncased-finetuned-sst-2-english"
TASK_NAME='sst-2'
num_heads=12
hidden_size=768
fi
if [[ "${input_model}" =~ "albert-base" ]]; then
model_name_or_path="Alireza1044/albert-base-v2-sst2"
TASK_NAME='sst-2'
num_heads=12
hidden_size=768
fi
if [[ "${input_model}" =~ "MiniLM-L6" ]]; then
model_name_or_path="philschmid/MiniLM-L6-H384-uncased-sst2"
TASK_NAME='sst-2'
num_heads=12
hidden_size=384
fi
if [[ "${input_model}" =~ "MiniLM-L12" ]]; then
model_name_or_path="Intel/MiniLM-L12-H384-uncased-mrpc"
TASK_NAME='mrpc'
num_heads=12
hidden_size=384
fi
if [[ "${input_model}" =~ "bert-base-cased" ]]; then
model_name_or_path="bert-base-cased-finetuned-mrpc"
TASK_NAME='mrpc'
num_heads=12
hidden_size=384
fi
if [[ "${input_model}" =~ "xlnet-base-cased" ]]; then
model_name_or_path="Intel/xlnet-base-cased-mrpc"
TASK_NAME='mrpc'
num_heads=12
hidden_size=768
fi
if [[ "${input_model}" =~ "bert-mini" ]]; then
model_name_or_path="M-FAC/bert-mini-finetuned-mrpc"
TASK_NAME='mrpc'
num_heads=4
hidden_size=256
fi
if [[ "${input_model}" =~ "electra-small-discriminator" ]]; then
model_name_or_path="Intel/electra-small-discriminator-mrpc"
TASK_NAME='mrpc'
num_heads=4
hidden_size=256
fi
if [[ "${input_model}" =~ "bart" ]]; then
model_name_or_path="Intel/bart-large-mrpc"
TASK_NAME='mrpc'
num_heads=16
hidden_size=4096
fi

python main.py \
--model_name_or_path ${model_name_or_path} \
--model_path ${input_model} \
--output_model ${output_model} \
--data_path ${dataset_location} \
--batch_size ${batch_size-1} \
--task ${TASK_NAME} \
--num_heads ${num_heads} \
--hidden_size ${hidden_size} \
--do_eval
}

main "$@"



@@ -0,0 +1,74 @@
import argparse

import torch
from transformers import AutoConfig, AutoModelForSequenceClassification

def export_onnx_model(args, model):
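    # RoBERTa-style and DistilBERT checkpoints take no token_type_ids, so they are
    # exported with two inputs; the remaining models are exported with three.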
    with torch.no_grad():
        symbolic_names = {0: 'batch_size', 1: 'max_seq_len'}
        if args.model_name_or_path in ['Intel/roberta-base-mrpc',
                                       'Intel/xlm-roberta-base-mrpc',
                                       'Intel/camembert-base-mrpc',
                                       'distilbert-base-uncased-finetuned-sst-2-english']:
            inputs = {'input_ids': torch.ones(1, args.max_len, dtype=torch.int64),
                      'attention_mask': torch.ones(1, args.max_len, dtype=torch.int64)}
            torch.onnx.export(model,                     # model being run
                              (inputs['input_ids'],      # model input (or a tuple for multiple inputs)
                               inputs['attention_mask']),
                              args.output_model,         # where to save the model (can be a file or file-like object)
                              opset_version=14,          # the ONNX version to export the model
                              do_constant_folding=True,  # whether to execute constant folding
                              input_names=['input_ids',  # the model's input names
                                           'attention_mask'],
                              output_names=['logits'],
                              dynamic_axes={'input_ids': symbolic_names,  # variable length axes
                                            'attention_mask': symbolic_names})
        else:
            inputs = {'input_ids': torch.ones(1, args.max_len, dtype=torch.int64),
                      'attention_mask': torch.ones(1, args.max_len, dtype=torch.int64),
                      'token_type_ids': torch.ones(1, args.max_len, dtype=torch.int64)}
            torch.onnx.export(model,                     # model being run
                              (inputs['input_ids'],      # model input (or a tuple for multiple inputs)
                               inputs['attention_mask'],
                               inputs['token_type_ids']),
                              args.output_model,         # where to save the model (can be a file or file-like object)
                              opset_version=14,          # the ONNX version to export the model
                              do_constant_folding=True,  # whether to execute constant folding
                              input_names=['input_ids',  # the model's input names
                                           'attention_mask',
                                           'token_type_ids'],
                              output_names=['logits'],
                              dynamic_axes={'input_ids': symbolic_names,  # variable length axes
                                            'attention_mask': symbolic_names,
                                            'token_type_ids': symbolic_names})
    print("ONNX Model exported to {0}".format(args.output_model))

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description='Export huggingface onnx model',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument(
        '--model_name_or_path',
        type=str,
        choices=['Intel/bert-base-uncased-mrpc',
                 'Intel/roberta-base-mrpc',
                 'Intel/xlm-roberta-base-mrpc',
                 'Intel/camembert-base-mrpc',
                 'distilbert-base-uncased-finetuned-sst-2-english',
                 'Alireza1044/albert-base-v2-sst2',
                 'philschmid/MiniLM-L6-H384-uncased-sst2',
                 'Intel/MiniLM-L12-H384-uncased-mrpc'],
        help='pretrained model name or path')
    parser.add_argument(
        '--max_len',
        type=int,
        default=128,
        help='Maximum length of the sentence pairs')
    args = parser.parse_args()
    args.output_model = args.model_name_or_path.split('/')[-1] + '.onnx'

    model = AutoModelForSequenceClassification.from_pretrained(
        args.model_name_or_path,
        config=AutoConfig.from_pretrained(args.model_name_or_path))

    export_onnx_model(args, model)