Add dnnl ep #903

Merged: 21 commits, Jul 18, 2023
2 changes: 2 additions & 0 deletions .azure-pipelines/scripts/codeScan/pyspelling/inc_dict.txt
@@ -495,6 +495,7 @@ dnf
dnn
dnnl
DNNL
DnnlExecutionProvider
Dockerfile
doclist
docstrings
@@ -563,6 +564,7 @@ enum
env
environ
ep
eps
eq
erf
Erf
14 changes: 12 additions & 2 deletions docs/source/mixed_precision.md
@@ -17,6 +17,7 @@ The recently launched 3rd Gen Intel® Xeon® Scalable processor (codenamed Coope
</p>

## Mixed Precision Support Matrix

<table class="center">
<thead>
<tr>
@@ -48,7 +49,7 @@ The recently launched 3rd Gen Intel® Xeon® Scalable processor (codenamed Coope
<td align="left">:x:</td>
</tr>
<tr>
<td rowspan="3" align="left">ONNX Runtime</td>
<td rowspan="4" align="left">ONNX Runtime</td>
<td align="left">CPUExecutionProvider</td>
<td align="left">MLAS</td>
<td align="left">"default"</td>
@@ -72,6 +73,14 @@ The recently launched 3rd Gen Intel® Xeon® Scalable processor (codenamed Coope
<td align="left">&#10004;</td>
<td align="left">&#10004;</td>
</tr>
<tr>
<td align="left">DnnlExecutionProvider</td>
<td align="left">OneDNN</td>
<td align="left">"onnxrt_dnnl_ep"</td>
<td align="left">cpu</td>
<td align="left">&#10004;</td>
<td align="left">:x:</td>
</tr>
<tr>
<td rowspan="2" align="left">Tensorflow</td>
<td align="left">Tensorflow</td>
@@ -162,4 +171,5 @@ converted_model.save('./path/to/save/')
- Quick started with [helloworld example](/examples/helloworld/tf_example3)
- PyTorch [ResNet18](/examples/pytorch/image_recognition/torchvision_models/mixed_precision/resnet18)
- IPEX [DistilBERT base](/examples/pytorch/nlp/huggingface_models/question-answering/mixed_precision/ipex)
- Tensorflow [ResNet50](/examples/tensorflow/image_recognition/tensorflow_models/resnet50_v1/mixed_precision)
- ONNX Runtime [Bert base](/examples/onnxrt/nlp/huggingface_model/text_classification/mix_precision)
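
The matrix above now lists DnnlExecutionProvider as a bf16 backend for ONNX Runtime models. A minimal sketch of selecting it through the mixed precision API described earlier in this document, assuming the 2.x `MixedPrecisionConfig`/`mix_precision.fit` interface (the model path is a placeholder):

```python
from neural_compressor import mix_precision
from neural_compressor.config import MixedPrecisionConfig

# "onnxrt_dnnl_ep" is the backend value registered in the matrix above
conf = MixedPrecisionConfig(device="cpu", backend="onnxrt_dnnl_ep")

# "path/to/model.onnx" is a placeholder for an FP32 ONNX model
converted_model = mix_precision.fit("path/to/model.onnx", conf=conf)
converted_model.save('./path/to/save/')
```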
8 changes: 7 additions & 1 deletion docs/source/quantization.md
@@ -452,7 +452,7 @@ Intel(R) Neural Compressor support multi-framework: PyTorch, Tensorflow, ONNX Ru
<td align="left">cpu</td>
</tr>
<tr>
<td rowspan="3" align="left">ONNX Runtime</td>
<td rowspan="4" align="left">ONNX Runtime</td>
<td align="left">CPUExecutionProvider</td>
<td align="left">MLAS</td>
<td align="left">"default"</td>
@@ -470,6 +470,12 @@ Intel(R) Neural Compressor support multi-framework: PyTorch, Tensorflow, ONNX Ru
<td align="left">"onnxrt_cuda_ep"</td>
<td align="left">gpu</td>
</tr>
<tr>
<td align="left">DnnlExecutionProvider</td>
<td align="left">OneDNN</td>
<td align="left">"onnxrt_dnnl_ep"</td>
<td align="left">cpu</td>
</tr>
<tr>
<td rowspan="2" align="left">Tensorflow</td>
<td align="left">Tensorflow</td>
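
The quantization support matrix now registers DnnlExecutionProvider with the backend value "onnxrt_dnnl_ep". A minimal sketch of selecting it for post-training quantization, assuming the 2.x `PostTrainingQuantConfig` interface (the model path and calibration dataloader are placeholders):

```python
from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig

# "onnxrt_dnnl_ep" is the backend value added to the matrix above
conf = PostTrainingQuantConfig(backend="onnxrt_dnnl_ep")

# "path/to/model.onnx" and calib_dataloader are placeholders for a real model and dataloader
q_model = quantization.fit("path/to/model.onnx", conf, calib_dataloader=calib_dataloader)
q_model.save("./saved_results")
```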
@@ -0,0 +1,77 @@
Step-by-Step
============

This example loads a text classification model and evaluates its accuracy and performance on [GLUE data](https://gluebenchmark.com/).

# Prerequisite

## 1. Environment
```shell
git clone -b dnnl_ep --depth 1 https://github.com/intel/neural-compressor.git
cd neural-compressor
pip install -e ./

cd examples/onnxrt/nlp/huggingface_model/text_classification/mix_precision/
pip install -r requirements.txt
```
> Note: see the validated ONNX Runtime [versions](/docs/source/installation_guide.md#validated-software-environment).
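
To double-check the installed build before running the example, you can print the ONNX Runtime version (a plain onnxruntime call, nothing example-specific):

```shell
python -c "import onnxruntime; print(onnxruntime.__version__)"
```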

## 2. Prepare Model

Supported model identifiers from [huggingface.co](https://huggingface.co/):

| Model Identifier |
|:-----------------------------------------------:|
| Intel/bert-base-uncased-mrpc |
| Intel/roberta-base-mrpc |
| Intel/xlm-roberta-base-mrpc |
| Intel/camembert-base-mrpc |
| distilbert-base-uncased-finetuned-sst-2-english |
| Alireza1044/albert-base-v2-sst2 |
| Intel/MiniLM-L12-H384-uncased-mrpc |
| philschmid/MiniLM-L6-H384-uncased-sst2 |
| bert-base-cased-finetuned-mrpc |
| Intel/electra-small-discriminator-mrpc |
| M-FAC/bert-mini-finetuned-mrpc |
| Intel/xlnet-base-cased-mrpc |
| Intel/bart-large-mrpc |

```bash
python export.py --model_name_or_path=Intel/bert-base-uncased-mrpc # or other supported model identifier
```
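
Optionally, sanity-check the exported graph before conversion. `export.py` names the output file after the last segment of the model identifier, so for the command above the file is `bert-base-uncased-mrpc.onnx`; the check below uses the standard `onnx` checker:

```bash
python -c "import onnx; onnx.checker.check_model('bert-base-uncased-mrpc.onnx')"
```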

## 3. Prepare Dataset
Download the GLUE data with the `prepare_data.sh` script.

```shell
export GLUE_DIR=/path/to/glue_data
export TASK_NAME=MRPC # or SST

bash prepare_data.sh --data_dir=$GLUE_DIR --task_name=$TASK_NAME
```

# Run

If the hardware doesn't support bf16 instructions, set the flag below to force bf16 conversion (this workaround will be deprecated):

```shell
export FORCE_BF16=1
```
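
To see whether the CPU natively supports bf16 (in which case the flag is unnecessary), you can look for the corresponding ISA flags on Linux; this is only a quick heuristic:

```shell
grep -o -m1 -E 'avx512_bf16|amx_bf16' /proc/cpuinfo || echo "no native bf16 flag found"
```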

## 1. Mixed precision conversion only

```bash
# --input_model and --output_model are *.onnx model paths
bash run.sh --input_model=path/to/model \
            --output_model=path/to/model_tune
```
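
After conversion you can inspect how much of the graph was actually moved to bf16. A small sketch using the `onnx` package; `path/to/model_tune` stands for the `--output_model` passed above:

```python
import onnx
from collections import Counter

model = onnx.load("path/to/model_tune")  # the converted model produced by run.sh

# Count operator types; Cast nodes typically bracket the bf16 regions
print(Counter(node.op_type for node in model.graph.node).most_common(10))

# Count initializers stored as bfloat16
bf16 = onnx.TensorProto.BFLOAT16
print(sum(1 for t in model.graph.initializer if t.data_type == bf16), "bfloat16 initializers")
```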

## 2. Mixed precision conversion + accuracy evaluation

Please make sure DnnlExecutionProvider is in the list of available providers before running evaluation.
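
A quick way to verify this with a plain onnxruntime call (it should print True on a build with the DNNL EP enabled):

```shell
python -c "import onnxruntime as ort; print('DnnlExecutionProvider' in ort.get_available_providers())"
```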

```bash
# --input_model and --output_model are *.onnx model paths; --batch_size is optional
bash eval.sh --input_model=path/to/model \
             --output_model=path/to/model_tune \
             --dataset_location=path/to/glue/data \
             --batch_size=batch_size
```
@@ -0,0 +1,128 @@
#!/bin/bash
set -x

function main {
init_params "$@"
run_tuning
}

# init params
function init_params {
    for var in "$@"
    do
        case $var in
            --input_model=*)
                input_model=$(echo $var |cut -f2 -d=)
            ;;
            --output_model=*)
                output_model=$(echo $var |cut -f2 -d=)
            ;;
            --dataset_location=*)
                dataset_location=$(echo $var |cut -f2 -d=)
            ;;
            --batch_size=*)
                batch_size=$(echo $var |cut -f2 -d=)
            ;;
        esac
    done

}

# run_tuning
function run_tuning {
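    # Map the input model file name onto its Hugging Face identifier, GLUE task,
    # and the attention-head / hidden-size hints passed to main.py below.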

if [[ "${input_model}" =~ "bert-base-uncased" ]]; then
model_name_or_path="Intel/bert-base-uncased-mrpc"
TASK_NAME='mrpc'
num_heads=12
hidden_size=768
fi
if [[ "${input_model}" =~ "roberta-base" ]]; then
model_name_or_path="Intel/roberta-base-mrpc"
TASK_NAME='mrpc'
num_heads=12
hidden_size=768
fi
if [[ "${input_model}" =~ "xlm-roberta-base" ]]; then
model_name_or_path="Intel/xlm-roberta-base-mrpc"
TASK_NAME='mrpc'
num_heads=12
hidden_size=768
fi
if [[ "${input_model}" =~ "camembert-base" ]]; then
model_name_or_path="Intel/camembert-base-mrpc"
TASK_NAME='mrpc'
num_heads=12
hidden_size=768
fi
if [[ "${input_model}" =~ "distilbert-base" ]]; then
model_name_or_path="distilbert-base-uncased-finetuned-sst-2-english"
TASK_NAME='sst-2'
num_heads=12
hidden_size=768
fi
if [[ "${input_model}" =~ "albert-base" ]]; then
model_name_or_path="Alireza1044/albert-base-v2-sst2"
TASK_NAME='sst-2'
num_heads=12
hidden_size=768
fi
if [[ "${input_model}" =~ "MiniLM-L6" ]]; then
model_name_or_path="philschmid/MiniLM-L6-H384-uncased-sst2"
TASK_NAME='sst-2'
num_heads=12
hidden_size=384
fi
if [[ "${input_model}" =~ "MiniLM-L12" ]]; then
model_name_or_path="Intel/MiniLM-L12-H384-uncased-mrpc"
TASK_NAME='mrpc'
num_heads=12
hidden_size=384
fi
if [[ "${input_model}" =~ "bert-base-cased" ]]; then
model_name_or_path="bert-base-cased-finetuned-mrpc"
TASK_NAME='mrpc'
num_heads=12
hidden_size=384
fi
if [[ "${input_model}" =~ "xlnet-base-cased" ]]; then
model_name_or_path="Intel/xlnet-base-cased-mrpc"
TASK_NAME='mrpc'
num_heads=12
hidden_size=768
fi
if [[ "${input_model}" =~ "bert-mini" ]]; then
model_name_or_path="M-FAC/bert-mini-finetuned-mrpc"
TASK_NAME='mrpc'
num_heads=4
hidden_size=256
fi
if [[ "${input_model}" =~ "electra-small-discriminator" ]]; then
model_name_or_path="Intel/electra-small-discriminator-mrpc"
TASK_NAME='mrpc'
num_heads=4
hidden_size=256
fi
if [[ "${input_model}" =~ "bart" ]]; then
model_name_or_path="Intel/bart-large-mrpc"
TASK_NAME='mrpc'
num_heads=16
hidden_size=4096
fi

python main.py \
--model_name_or_path ${model_name_or_path} \
--model_path ${input_model} \
--output_model ${output_model} \
--data_path ${dataset_location} \
--batch_size ${batch_size-1} \
--task ${TASK_NAME} \
--num_heads ${num_heads} \
--hidden_size ${hidden_size} \
--do_eval
}

main "$@"



@@ -0,0 +1,74 @@
import argparse

import torch
from transformers import AutoConfig, AutoModelForSequenceClassification

def export_onnx_model(args, model):
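    # RoBERTa-style and DistilBERT checkpoints take no token_type_ids, so they are
    # exported with two inputs; the remaining models are exported with three.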
    with torch.no_grad():
        symbolic_names = {0: 'batch_size', 1: 'max_seq_len'}
        if args.model_name_or_path in ['Intel/roberta-base-mrpc',
                                       'Intel/xlm-roberta-base-mrpc',
                                       'Intel/camembert-base-mrpc',
                                       'distilbert-base-uncased-finetuned-sst-2-english']:
            inputs = {'input_ids': torch.ones(1, args.max_len, dtype=torch.int64),
                      'attention_mask': torch.ones(1, args.max_len, dtype=torch.int64)}
            torch.onnx.export(model,                     # model being run
                              (inputs['input_ids'],      # model input (or a tuple for multiple inputs)
                               inputs['attention_mask']),
                              args.output_model,         # where to save the model (can be a file or file-like object)
                              opset_version=14,          # the ONNX version to export the model
                              do_constant_folding=True,  # whether to execute constant folding
                              input_names=['input_ids',  # the model's input names
                                           'attention_mask'],
                              output_names=['logits'],
                              dynamic_axes={'input_ids': symbolic_names,  # variable length axes
                                            'attention_mask': symbolic_names})
        else:
            inputs = {'input_ids': torch.ones(1, args.max_len, dtype=torch.int64),
                      'attention_mask': torch.ones(1, args.max_len, dtype=torch.int64),
                      'token_type_ids': torch.ones(1, args.max_len, dtype=torch.int64)}
            torch.onnx.export(model,                     # model being run
                              (inputs['input_ids'],      # model input (or a tuple for multiple inputs)
                               inputs['attention_mask'],
                               inputs['token_type_ids']),
                              args.output_model,         # where to save the model (can be a file or file-like object)
                              opset_version=14,          # the ONNX version to export the model
                              do_constant_folding=True,  # whether to execute constant folding
                              input_names=['input_ids',  # the model's input names
                                           'attention_mask',
                                           'token_type_ids'],
                              output_names=['logits'],
                              dynamic_axes={'input_ids': symbolic_names,  # variable length axes
                                            'attention_mask': symbolic_names,
                                            'token_type_ids': symbolic_names})
    print("ONNX Model exported to {0}".format(args.output_model))

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description='Export huggingface onnx model',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument(
        '--model_name_or_path',
        type=str,
        choices=['Intel/bert-base-uncased-mrpc',
                 'Intel/roberta-base-mrpc',
                 'Intel/xlm-roberta-base-mrpc',
                 'Intel/camembert-base-mrpc',
                 'distilbert-base-uncased-finetuned-sst-2-english',
                 'Alireza1044/albert-base-v2-sst2',
                 'philschmid/MiniLM-L6-H384-uncased-sst2',
                 'Intel/MiniLM-L12-H384-uncased-mrpc'],
        help='pretrained model name or path')
    parser.add_argument(
        '--max_len',
        type=int,
        default=128,
        help='Maximum length of the sentence pairs')
    args = parser.parse_args()
    args.output_model = args.model_name_or_path.split('/')[-1] + '.onnx'

    model = AutoModelForSequenceClassification.from_pretrained(
        args.model_name_or_path,
        config=AutoConfig.from_pretrained(args.model_name_or_path))

    export_onnx_model(args, model)