diff --git a/docs/_static/images/notebook_eye.png b/docs/_static/images/notebook_eye.png
new file mode 100644
index 00000000000000..ecc13e7bdfba89
--- /dev/null
+++ b/docs/_static/images/notebook_eye.png
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1a2e58cf3e5703356b0e060ebc7cb0cbb852db9cde003d41c1d86bafc3a4ccb1
+size 68559
diff --git a/docs/articles_en/learn_openvino/tutorials.md b/docs/articles_en/learn_openvino/tutorials.md
index 3fa4a11ee8b330..779fd4995edaf7 100644
--- a/docs/articles_en/learn_openvino/tutorials.md
+++ b/docs/articles_en/learn_openvino/tutorials.md
@@ -6,7 +6,7 @@
 .. meta::
    :description: Run Python tutorials on Jupyter notebooks to learn how to use OpenVINO™ toolkit for optimized
-                 deep learning inference.
+                 deep learning inference.

 .. toctree::
@@ -15,6 +15,12 @@
    :hidden:

    notebooks_installation
+   notebooks_section_0_get_started
+   notebooks_section_1_convert__optimize
+   notebooks_section_2_model_demos
+   notebooks_section_3_model_training
+   notebooks_section_4_live_demos
+

 This collection of Python tutorials are written for running on Jupyter notebooks.
@@ -29,382 +35,122 @@
 its name and the Jupyter notebook will start it in a new tab of a browser.

 .. note::

-   `Binder `__ and `Google Colab `__
-   are free online services with limited resources. For the best performance
-   and more control, you should run the notebooks locally. Follow the
+   `Binder `__ and `Google Colab `__
+   are free online services with limited resources. For the best performance
+   and more control, you should run the notebooks locally. Follow the
    :doc:`Installation Guide ` in order to get information on how to run and manage the notebooks on your machine.

-More examples along with additonal details regarding OpenVINO Notebooks are available in
+More examples along with additional details regarding OpenVINO Notebooks are available in
 OpenVINO™ Notebooks `Github Repository. `__

 The Jupyter notebooks are categorized into following classes:

-- `First steps with OpenVINO <#first-steps-with-openvino>`__
-- `Convert & Optimize <#convert-optimize>`__
-- `Model Demos <#model-demos>`__
-- `Model Training <#model-training>`__
-- `Live Demos <#live-demos>`__
+- :doc:`First steps with OpenVINO `
+- :doc:`Convert & Optimize `
+- :doc:`Model Demos `
+- :doc:`Model Training `
+- :doc:`Live Demos `

-Recommended Tutorials
-######################
+Below you will find a selection of recommended tutorials, each demonstrating inference on a particular model. These tutorials are guaranteed to provide a great experience with inference in OpenVINO:
+
+.. showcase::
+   :title: 269-film-slowmo
+   :img: https://github.com/googlestaging/frame-interpolation/raw/main/moment.gif
+
+   Frame interpolation using FILM and OpenVINO.
+
+.. showcase::
+   :title: 268-table-question-answering
+   :img: _static/images/notebook_eye.png
+
+   Table Question Answering using TAPAS and OpenVINO.
+
+.. showcase::
+   :title: 267-distil-whisper-asr
+   :img: _static/images/notebook_eye.png
+
+   Automatic speech recognition using Distil-Whisper and OpenVINO.
+
+.. showcase::
+   :title: 266-speculative-sampling
+   :img: _static/images/notebook_eye.png
+
+   Text Generation via Speculative Sampling, KV Caching, and OpenVINO.
+
+.. showcase::
+   :title: 265-wuerstchen-image-generation
+   :img: https://user-images.githubusercontent.com/76161256/277724498-6917c558-d74c-4cc9-b81a-679ce0a299ee.png
+
+   Image generation with Würstchen and OpenVINO.
+
+.. 
showcase:: + :title: 264-qrcode-monster + :img: https://user-images.githubusercontent.com/76463150/278011447-1a5978c6-e7a0-4824-9318-a3d8f4912c47.png + + Generate creative QR codes with ControlNet QR Code Monster and OpenVINO. + +.. showcase:: + :title: 263-latent-consistency-models-image-generation + :img: https://user-images.githubusercontent.com/29454499/277367065-13a8f622-8ea7-4d12-b3f8-241d4499305e.png + + Image generation with Latent Consistency Model and OpenVINO. + +.. showcase:: + :title: 262-softvc-voice-conversion + :img: _static/images/notebook_eye.png + + SoftVC VITS Singing Voice Conversion and OpenVINO. -The following tutorials are guaranteed to provide a great experience with inference in OpenVINO: - -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ -| Notebook | | Preview | -+==============================================================================================================================================+============================================================================================================================================+====================================================+ -| `YOLOv8 - Optimization `__ |br| |c230c| | Convert and Optimize YOLOv8 real-time object detection with OpenVINO™. | |n230-img1| | -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ -| `SAM - Segment Anything Model `__ | Prompt based object segmentation mask generation, using Segment Anything and OpenVINO™. | |n237-img1| | -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ -| `ControlNet - Stable-Diffusion `__ | A text-to-image generation with ControlNet Conditioning and OpenVINO™. | |n235-img1| | -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ -| `Stable Diffusion v2 `__ | Text-to-image generation and Infinite Zoom with Stable Diffusion v2 and OpenVINO™. | |n236-img1| | -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ -| `Whisper - Subtitles generation `__ |br| |c227| | Generate subtitles for video with OpenAI Whisper and OpenVINO. 
| |n227-img1| | -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ -| `CLIP - zero-shot-image-classification `__ | Perform Zero-shot image classification with CLIP and OpenVINO. | |n228-img1| | -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ -| `BLIP - Visual-language-processing `__ | Visual question answering and image captioning using BLIP and OpenVINO™. | |n233-img1| | -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ -| `Instruct pix2pix - Image-editing `__ | Image editing with InstructPix2Pix. | |n231-img1| | -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ -| `DeepFloyd IF - Text-to-Image generation `__ | Text-to-image generation with DeepFloyd IF and OpenVINO™. | |n238-img1| | -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ -| `ImageBind `__ | Binding multimodal data, using ImageBind and OpenVINO™. | |n239-img1| | -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ -| `Dolly v2 `__ | Instruction following using Databricks Dolly 2.0 and OpenVINO™. | |n240-img1| | -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ -| `Stable Diffusion XL `__ | Image generation with Stable Diffusion XL and OpenVINO™. 
| |n248-img1| | -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ -| `MusicGen `__ |br| |n250| |br| |c250| | Controllable Music Generation with MusicGen and OpenVINO™. | |n250-img1| | -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ -| `Tiny SD `__ |br| |c251| | Image Generation with Tiny-SD and OpenVINO™. | |n251-img1| | -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ -| `253-zeroscope-text2video `__ | Text-to video synthesis with ZeroScope and OpenVINO™. | A panda eating bamboo on a rock. |br| |n253-img1| | -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ -| `254-llm-chatbot `__ | Create LLM-powered Chatbot using OpenVINO. | |n254-img1| | -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ -| `256-bark-text-to-audio `__ | Text-to-Speech generation with BARK and OpenVINO. | |n256-img1| | -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ -| `257-llava-multimodal-chatbot `__ | Visual-language assistant with LLaVA and OpenVINO. | |n257-img1| | -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ -| `258-blip-diffusion-subject-generation `__ | Subject-driven image generation and editing using BLIP Diffusion and OpenVINO. 
| |n258-img1| | -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ -| `259-decidiffusion-image-generation `__ | Image Generation with DeciDiffusion. | |n259-img1| | -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ -| `261-fast-segment-anything `__ |br| |n261| |br| |c261| | Object segmentations with FastSAM and OpenVINO™. | |n261-img1| | -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ -| `262-softvc-voice-conversion `__ |br| |c262| | Text-to video synthesis with ZeroScope and OpenVINO™. | | -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ -| `263-latent-consistency-models-image-generation `__ | Image generation with Latent Consistency Model and OpenVINO. ||n263-img1| | -+----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - - - - -First steps with OpenVINO -########################## - -Brief tutorials that demonstrate how to use Python API for inference in OpenVINO. - -+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ -| Notebook | Description | Preview | -+===============================================================================================================================+============================================================================================================================================+===========================================+ -| `001-hello-world `__ |br| |n001| | Classify an image with OpenVINO. | |n001-img1| | -+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ -| `002-openvino-api `__ |br| |n002| |br| |c002| | Learn the OpenVINO Python API. 
| |n002-img1| | -+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ -| `003-hello-segmentation `__ |br| |n003| | Semantic segmentation with OpenVINO. | |n003-img1| | -+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ -| `004-hello-detection `__ |br| |n004| | Text detection with OpenVINO. | |n004-img1| | -+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ - -Convert & Optimize -################### - -Tutorials that explain how to optimize and quantize models with OpenVINO tools. - -+------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ -| Notebook | Description | Preview | -+====================================================================================================================================+============================================================================================================================================+===========================================+ -| `101-tensorflow-classification-to-openvino `__ |br| |n101| | Convert TensorFlow models to OpenVINO IR. | |n101-img1| | -+------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ -| `102-pytorch-to-openvino `__ |br| |c102| | Convert PyTorch models to OpenVINO IR. | |n102-img1| | -+------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ -| `103-paddle-onnx-to-openvino `__ |br| |n103| | Convert PaddlePaddle models to OpenVINO IR. 
| |n103-img1| | -+------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ -| `121-convert-to-openvino `__ |br| |n121| |br| |c121| | Learn OpenVINO model conversion API | | -+------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ - -.. dropdown:: Explore more notebooks here. - - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | Notebook | Description | - +================================================================================================================================================================+=============================================================================================================================================+ - | `102-pytorch-onnx-to-openvino `__ | Convert PyTorch models to OpenVINO IR. | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `104-model-tools `__ |br| |n104| | Download, convert and benchmark models from Open Model Zoo. | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `105-language-quantize-bert `__ | Optimize and quantize a pre-trained BERT model. | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `106-auto-device `__ |br| |n106| | Demonstrates how to use AUTO Device. | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `107-speech-recognition-quantization `__ |br| |c107| | Optimize and quantize a pre-trained Data2Vec speech model. 
| - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `107-speech-recognition-quantization `__ | Optimize and quantize a pre-trained Wav2Vec2 speech model. | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `108-gpu-device `__ | Working with GPUs in OpenVINO™ | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `109-latency-tricks `__ | Performance tricks for latency mode in OpenVINO™. | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `109-throughput-tricks `__ | Performance tricks for throughput mode in OpenVINO™. | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `110-ct-segmentation-quantize `__ |br| |n110| | Live inference of a kidney segmentation model and benchmark CT-scan data with OpenVINO. | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `110-ct-segmentation-quantize `__ | Quantize a kidney segmentation model and show live inference. | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `111-yolov5-quantization-migration `__ |br| |c111| | Migrate YOLOv5 POT API based quantization pipeline on Neural Network Compression Framework (NNCF). | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `112-pytorch-post-training-quantization-nncf `__ | Use Neural Network Compression Framework (NNCF) to quantize PyTorch model in post-training mode (without model fine-tuning). 
| - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `113-image-classification-quantization `__ |br| |n113| | Quantize MobileNet image classification. | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `115-async-api `__ |br| |n115| |br| |c115| | Use asynchronous execution to improve data pipelining. | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `116-sparsity-optimization `__ |br| |c116| | Improve performance of sparse Transformer models. | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `117-model-server `__ | Improve performance of sparse Transformer models. | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `118-optimize-preprocessing `__ | Improve performance of image preprocessing step. | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `119-tflite-to-openvino `__ |br| |c119| | Convert TensorFlow Lite models to OpenVINO IR. | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `120-tensorflow-object-detection-to-openvino `__ |br| |n120| |br| |c120| | Convert TensorFlow Object Detection models to OpenVINO IR | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `120-tensorflow-instance-segmentation-to-openvino `__ |br| |n120a| |br| |c120a| | Convert the Mask R-CNN with Inception ResNet V2 Instance Segmentation model and then segment instances in an image using OpenVINO Runtime. 
| - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `122-speech-recognition-quantization-wav2vec2 `__ | Quantize Speech Recognition Models with accuracy control using NNCF PTQ API. | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `122-yolov8-quantization-with-accuracy-control `__ | Convert and Optimize YOLOv8 with OpenVINO™. | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `123-detectron2-to-openvino `__ |br| |n123| |br| |c123| | Convert Detection2 Models to OpenVINO™. | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `124-hugging-face-hub `__ |br| |n124| |br| |c124| | Hugging Face Model Hub with OpenVINO™. | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `125-convnext-classification `__ | Convert TorchVision ConvNext classification model to OpenVINO IR. | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | `126-tensorflow-hub `__ |br| |n126| |br| |c126| | Convert TensorFlow Hub models to OpenVINO IR. | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - - -Model Demos -################### - -Demos that demonstrate inference on a particular model. 
- -+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ -| Notebook | Description | Preview | -+===============================================================================================================================+============================================================================================================================================+===========================================+ -| `205-vision-background-removal `__ |br| |n205| |br| |c205| | Remove and replace the background in an image using salient object detection. | |n205-img1| | -+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ -| `209-handwritten-ocr `__ |br| |n209| | OCR for handwritten simplified Chinese and Japanese. | |n209-img1| |br| |chinese-text| | -+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ -| `211-speech-to-text `__ |br| |n211| | Run inference on speech-to-text recognition model. | |n211-img1| | -+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ -| `215-image-inpainting `__ |br| |n215| | Fill missing pixels with image in-painting. | |n215-img1| | -+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ -| `218-vehicle-detection-and-recognition `__ |br| |n218| | Use pre-trained models to detect and recognize vehicles and their attributes with OpenVINO. | |n218-img1| | -+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ - -.. dropdown:: Explore more notebooks below. 
- - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | Notebook | Description | Preview | - +==============================================================================================================================================+============================================================================================================================================+====================================================+ - | `201-vision-monodepth `__ |br| |n201| |br| |c201| | Monocular depth estimation with images and video. | |n201-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `202-vision-superresolution-image `__ |br| |n202i| |br| |c202i| | Upscale raw images with a super resolution model. | |n202i-img1| → |n202i-img2| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `202-vision-superresolution-video `__ |br| |n202v| |br| |c202v| | Turn 360p into 1080p video using a super resolution model. | |n202v-img1| → |n202v-img2| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `203-meter-reader `__ |br| |n203| | PaddlePaddle pre-trained models to read industrial meter's value. | |n203-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `204-segmenter-semantic-segmentation `__ |br| |c204| | Semantic segmentation with OpenVINO™ using Segmenter. | |n204-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `206-vision-paddlegan-anime `__ | Turn an image into anime using a GAN. 
| |n206-img1| → |n206-img2| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `207-vision-paddlegan-superresolution `__ | Upscale small images with superresolution using a PaddleGAN model. | |n207-img1| → |n207-img2| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `208-optical-character-recognition `__ | Annotate text on images using text recognition resnet. | |n208-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `212-pyannote-speaker-diarization `__ | Run inference on speaker diarization pipeline. | |n212-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `210-slowfast-video-recognition `__ |br| |n210| | Video Recognition using SlowFast and OpenVINO™ | |n210-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `213-question-answering `__ |br| |n213| | Answer your questions basing on a context. | |n213-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `214-grammar-correction `__ | Grammatical error correction with OpenVINO. 
| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `216-attention-center `__ | The attention center model with OpenVINO™ | | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `217-vision-deblur `__ |br| |n217| | Deblur images with DeblurGAN-v2. | |n217-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `219-knowledge-graphs-conve `__ |br| |n219| | Optimize the knowledge graph embeddings model (ConvE) with OpenVINO. | | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `220-cross-lingual-books-alignment `__ |br| |n220| |br| |c220| | Cross-lingual Books Alignment With Transformers and OpenVINO™ | |n220-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `221-machine-translation `__ |br| |n221| |br| |c221| | Real-time translation from English to German. | | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `222-vision-image-colorization `__ |br| |n222| | Use pre-trained models to colorize black & white images using OpenVINO. | |n222-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `223-text-prediction `__ |br| |c223| | Use pre-trained models to perform text prediction on an input sequence. 
| |n223-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `224-3D-segmentation-point-clouds `__ |br| |n224| |br| |c224| | Process point cloud data and run 3D Part Segmentation with OpenVINO. | |n224-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `225-stable-diffusion-text-to-image `__ |br| |c225| | Text-to-image generation with Stable Diffusion method. | |n225-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `226-yolov7-optimization `__ | Optimize YOLOv7, using NNCF PTQ API. | |n226-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `227-whisper-subtitles-generation `__ |br| |c227| | Generate subtitles for video with OpenAI Whisper and OpenVINO. | |n227-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `228-clip-zero-shot-convert `__ | Zero-shot Image Classification with OpenAI CLIP and OpenVINO™ | |n228-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `228-clip-zero-shot-quantize `__ | Post-Training Quantization of OpenAI CLIP model with NNCF | |n228-img2| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `229-distilbert-sequence-classification `__ |br| |n229| | Sequence classification with OpenVINO. 
| |n229-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `230-yolov8-instance-segmentation `__ |br| |c230a| | Convert and Optimize YOLOv8 instance segmentation model with OpenVINO™. | |n230-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `230-yolov8-keypoint-detection `__ |br| |c230b| | Convert and Optimize YOLOv8 keypoint detection model with OpenVINO™. | |n230-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `230-yolov8-object-detection `__ |br| |c230c| | Convert and Optimize YOLOv8 real-time object detection with OpenVINO™. | |n230-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `231-instruct-pix2pix-image-editing `__ | Image editing with InstructPix2Pix. | |n231-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `232-clip-language-saliency-map `__ |br| |c232| | Language-visual saliency with CLIP and OpenVINO™. | |n232-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `233-blip-convert `__ | Visual Question Answering and Image Captioning using BLIP and OpenVINO. | |n233-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `233-blip-optimize `__ | Post-Training Quantization and Weights Compression of OpenAI BLIP model with NNCF. 
| |n233-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `234-encodec-audio-compression `__ | Audio compression with EnCodec and OpenVINO™. | |n234-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `235-controlnet-stable-diffusion `__ | A text-to-image generation with ControlNet Conditioning and OpenVINO™. | |n235-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `236-stable-diffusion-v2 `__ | Text-to-image generation and Infinite Zoom with Stable Diffusion v2 and OpenVINO™. | |n236-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `236-stable-diffusion-v2 `__ | Stable Diffusion v2.1 using Optimum-Intel OpenVINO and multiple Intel Hardware. | |n236-img4| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `236-stable-diffusion-v2 `__ | Stable Diffusion v2.1 using Optimum-Intel OpenVINO. | |n236-img4| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `236-stable-diffusion-v2 `__ | Stable Diffusion Text-to-Image Demo. | |n236-img4| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `236-stable-diffusion-v2 `__ | Text-to-image generation with Stable Diffusion v2 and OpenVINO™. 
| |n236-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `237-segment-anything `__ | Prompt based object segmentation mask generation, using Segment Anything and OpenVINO™. | |n237-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `238-deep-floyd-if-optimize `__ | Text-to-image generation with DeepFloyd IF and OpenVINO™. | |n238-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `239-image-bind `__ | Binding multimodal data, using ImageBind and OpenVINO™. | |n239-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `240-dolly-2-instruction-following `__ | Instruction following using Databricks Dolly 2.0 and OpenVINO™. | |n240-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `241-riffusion-text-to-music `__ | Text-to-Music generation using Riffusion and OpenVINO™. | |n241-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `242-freevc-voice-conversion `__ | High-Quality Text-Free One-Shot Voice Conversion with FreeVC and OpenVINO™ | | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `243-tflite-selfie-segmentation `__ |br| |n243| |br| |c243| | Selfie Segmentation using TFLite and OpenVINO™. 
| |n243-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `244-named-entity-recognition `__ |br| |c244| | Named entity recognition with OpenVINO™. | | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `245-typo-detector `__ | English Typo Detection in sentences with OpenVINO™. | |n245-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `246-depth-estimation-videpth `__ | Monocular Visual-Inertial Depth Estimation with OpenVINO™. | |n246-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `247-code-language-id `__ |br| |n247| | Identify the programming language used in an arbitrary code snippet. | | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `248-stable-diffusion-xl `__ | Image generation with Stable Diffusion XL and OpenVINO™. | |n248-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `249-oneformer-segmentation `__ | Universal segmentation with OneFormer and OpenVINO™. | |n249-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `250-music-generation `__ |br| |n250| |br| |c250| | Controllable Music Generation with MusicGen and OpenVINO™. 
| |n250-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `251-tiny-sd-image-generation `__ |br| |c251| | Image Generation with Tiny-SD and OpenVINO™. | |n251-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `252-fastcomposer-image-generation `__ | Image generation with FastComposer and OpenVINO™. | | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `253-zeroscope-text2video `__ | Text-to video synthesis with ZeroScope and OpenVINO™. | A panda eating bamboo on a rock. |br| |n253-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `254-llm-chatbot `__ | Create LLM-powered Chatbot using OpenVINO. | |n254-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `255-mms-massively-multilingual-speech `__ | MMS: Scaling Speech Technology to 1000+ languages with OpenVINO™. | | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `256-bark-text-to-audio `__ | Text-to-Speech generation with BARK and OpenVINO. | |n256-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `257-llava-multimodal-chatbot `__ | Visual-language assistant with LLaVA and OpenVINO. 
| |n257-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `258-blip-diffusion-subject-generation `__ | Subject-driven image generation and editing using BLIP Diffusion and OpenVINO. | |n258-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `259-decidiffusion-image-generation `__ | Image Generation with DeciDiffusion. | |n259-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `260-pix2struct-docvqa `__ |br| |c260| | Document Visual Question Answering Using Pix2Struct and OpenVINO. | | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `261-fast-segment-anything `__ |br| |n261| |br| |c261| | Object segmentations with FastSAM and OpenVINO™. | |n261-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `262-softvc-voice-conversion `__ |br| |c262| | Text-to video synthesis with ZeroScope and OpenVINO™. | | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - | `263-latent-consistency-models-image-generation `__ | Image generation with Latent Consistency Model and OpenVINO. ||n263-img1| | - +----------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+ - - - - - - - -Model Training -################## - -Tutorials that include code to train neural networks. 
- - -+--------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ -| Notebook | Description | Preview | -+======================================================================================================================================+============================================================================================================================================+===========================================+ -| `301-tensorflow-training-openvino `__ | Train a flower classification model from TensorFlow, then convert to OpenVINO IR. | |n301-img1| | -+--------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ -| `302-pytorch-quantization-aware-training `__ | Use Neural Network Compression Framework (NNCF) to quantize PyTorch model. | | -+--------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ -| `305-tensorflow-quantization-aware-training `__ |br| |c305| | Use Neural Network Compression Framework (NNCF) to quantize TensorFlow model. | | -+--------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ - -Live Demos -################ - -Live inference demos that run on a webcam or video files. - - -+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ -| Notebook | Description | Preview | -+===============================================================================================================================+============================================================================================================================================+===========================================+ -| `401-object-detection-webcam `__ |br| |n401| |br| |c401| | Object detection with a webcam or video file. | |n401-img1| | -+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ -| `402-pose-estimation-webcam `__ |br| |n402| | Human pose estimation with a webcam or video file. 
| |n402-img1| | -+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ -| `403-action-recognition-webcam `__ |br| |n403| | Human action recognition with a webcam or video file. | |n403-img1| | -+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ -| `404-style-transfer-webcam `__ |br| |n404| |br| |c404| | Style transfer with a webcam or video file. | |n404-img1| | -+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ -| `405-paddle-ocr-webcam `__ |br| |n405| |br| |c405| | OCR with a webcam or video file. | |n405-img1| | -+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ -| `406-3D-pose-estimation-webcam `__ |br| |n406| | 3D display of human pose estimation with a webcam or video file. | |n406-img1| | -+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ -| `407-person-tracking-webcam `__ |br| |n407| |br| |c407| | Person tracking with a webcam or video file. | |n407-img1| | -+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+ +.. showcase:: + :title: 261-fast-segment-anything + :img: https://user-images.githubusercontent.com/26833433/248551984-d98f0f6d-7535-45d0-b380-2e1440b52ad7.jpg + + Object segmentation with FastSAM and OpenVINO. + +.. showcase:: + :title: 259-decidiffusion-image-generation + :img: https://user-images.githubusercontent.com/29454499/274927904-cd734349-9954-4656-ab96-08a903e846ef.png + + Image generation with DeciDiffusion and OpenVINO. + +.. showcase:: + :title: 258-blip-diffusion-subject-generation + :img: https://user-images.githubusercontent.com/76161256/275485611-0ecf621f-b544-44ae-8258-8a49be704989.png + + Subject-driven image generation and editing using BLIP Diffusion and OpenVINO. + +.. showcase:: + :title: 257-llava-multimodal-chatbot + :img: https://raw.githubusercontent.com/haotian-liu/LLaVA/main/images/llava_logo.png + + Visual-language assistant with LLaVA and OpenVINO. + +.. 
showcase:: + :title: 256-bark-text-to-audio + :img: https://user-images.githubusercontent.com/29454499/269278630-9a770279-0045-480e-95f2-1a2f2d0a5115.png + + Text-to-speech generation using Bark and OpenVINO. + +.. showcase:: + :title: 254-llm-chatbot + :img: _static/images/notebook_eye.png + + Create an LLM-powered Chatbot using OpenVINO. + +.. showcase:: + :title: 253-zeroscope-text2video + :img: https://camo.githubusercontent.com/64eec6e52d060ca971c5a3be3f0d60e712907c98b4661b454d7e3e9575c2bc6b/68747470733a2f2f68756767696e67666163652e636f2f64617461736574732f68756767696e67666163652f646f63756d656e746174696f6e2d696d616765732f7265736f6c76652f6d61696e2f6469666675736572732f646172746876616465725f63657270656e73652e676966 + + Video generation with ZeroScope and OpenVINO. + +.. showcase:: + :title: 251-tiny-sd-image-generation + :img: https://user-images.githubusercontent.com/29454499/260904650-274fc2f9-24d2-46a3-ac3d-d660ec3c9a19.png + + Image Generation with Tiny-SD and OpenVINO. .. note:: @@ -420,400 +166,11 @@ Additional Resources * `Google Colab `__ -.. |br| raw:: html - -
- -.. |chinese-text| raw:: html - - 的人不一了是他有为在责新中任自之我们 - -.. |n001-img1| image:: https://user-images.githubusercontent.com/36741649/127170593-86976dc3-e5e4-40be-b0a6-206379cd7df5.jpg - :target: https://user-images.githubusercontent.com/36741649/127170593-86976dc3-e5e4-40be-b0a6-206379cd7df5.jpg -.. |n002-img1| image:: https://user-images.githubusercontent.com/15709723/127787560-d8ec4d92-b4a0-411f-84aa-007e90faba98.png - :target: https://user-images.githubusercontent.com/15709723/127787560-d8ec4d92-b4a0-411f-84aa-007e90faba98.png -.. |n003-img1| image:: https://user-images.githubusercontent.com/15709723/128290691-e2eb875c-775e-4f4d-a2f4-15134044b4bb.png - :target: https://user-images.githubusercontent.com/15709723/128290691-e2eb875c-775e-4f4d-a2f4-15134044b4bb.png -.. |n004-img1| image:: https://user-images.githubusercontent.com/36741649/128489933-bf215a3f-06fa-4918-8833-cb0bf9fb1cc7.jpg - :target: https://user-images.githubusercontent.com/36741649/128489933-bf215a3f-06fa-4918-8833-cb0bf9fb1cc7.jpg -.. |n101-img1| image:: https://user-images.githubusercontent.com/15709723/127779167-9d33dcc6-9001-4d74-a089-8248310092fe.png - :target: https://user-images.githubusercontent.com/15709723/127779167-9d33dcc6-9001-4d74-a089-8248310092fe.png -.. |n102-img1| image:: https://user-images.githubusercontent.com/15709723/127779246-32e7392b-2d72-4a7d-b871-e79e7bfdd2e9.png - :target: https://user-images.githubusercontent.com/15709723/127779246-32e7392b-2d72-4a7d-b871-e79e7bfdd2e9.png -.. |n103-img1| image:: https://user-images.githubusercontent.com/15709723/127779326-dc14653f-a960-4877-b529-86908a6f2a61.png - :target: https://user-images.githubusercontent.com/15709723/127779326-dc14653f-a960-4877-b529-86908a6f2a61.png -.. |n104-img1| image:: https://user-images.githubusercontent.com/10940214/157541917-c5455105-b0d9-4adf-91a7-fbc142918015.png - :target: https://user-images.githubusercontent.com/10940214/157541917-c5455105-b0d9-4adf-91a7-fbc142918015.png -.. |n201-img1| image:: https://user-images.githubusercontent.com/15709723/127752390-f6aa371f-31b5-4846-84b9-18dd4f662406.gif - :target: https://user-images.githubusercontent.com/15709723/127752390-f6aa371f-31b5-4846-84b9-18dd4f662406.gif -.. |n202i-img1| image:: https://user-images.githubusercontent.com/36741649/170005347-e4409f9e-ec34-416b-afdf-a9d8185929ca.jpg - :width: 70 - :target: https://user-images.githubusercontent.com/36741649/170005347-e4409f9e-ec34-416b-afdf-a9d8185929ca.jpg -.. |n202i-img2| image:: https://user-images.githubusercontent.com/36741649/170005347-e4409f9e-ec34-416b-afdf-a9d8185929ca.jpg - :width: 130 - :target: https://user-images.githubusercontent.com/36741649/170005347-e4409f9e-ec34-416b-afdf-a9d8185929ca.jpg -.. |n202v-img1| image:: https://user-images.githubusercontent.com/15709723/127269258-a8e2c03e-731e-4317-b5b2-ed2ee767ff5e.gif - :target: https://user-images.githubusercontent.com/15709723/127269258-a8e2c03e-731e-4317-b5b2-ed2ee767ff5e.gif - :width: 80 -.. |n202v-img2| image:: https://user-images.githubusercontent.com/15709723/127269258-a8e2c03e-731e-4317-b5b2-ed2ee767ff5e.gif - :width: 125 - :target: https://user-images.githubusercontent.com/15709723/127269258-a8e2c03e-731e-4317-b5b2-ed2ee767ff5e.gif -.. |n203-img1| image:: https://user-images.githubusercontent.com/91237924/166135627-194405b0-6c25-4fd8-9ad1-83fb3a00a081.jpg - :target: https://user-images.githubusercontent.com/91237924/166135627-194405b0-6c25-4fd8-9ad1-83fb3a00a081.jpg -.. 
|n204-img1| image:: https://user-images.githubusercontent.com/61357777/223854308-d1ac4a39-cc0c-4618-9e4f-d9d4d8b991e8.jpg - :target: https://user-images.githubusercontent.com/61357777/223854308-d1ac4a39-cc0c-4618-9e4f-d9d4d8b991e8.jpg -.. |n205-img1| image:: https://user-images.githubusercontent.com/15709723/125184237-f4b6cd00-e1d0-11eb-8e3b-d92c9a728372.png - :target: https://user-images.githubusercontent.com/15709723/125184237-f4b6cd00-e1d0-11eb-8e3b-d92c9a728372.png -.. |n206-img1| image:: https://user-images.githubusercontent.com/15709723/127788059-1f069ae1-8705-4972-b50e-6314a6f36632.jpeg - :target: https://user-images.githubusercontent.com/15709723/127788059-1f069ae1-8705-4972-b50e-6314a6f36632.jpeg -.. |n206-img2| image:: https://user-images.githubusercontent.com/15709723/125184441-b4584e80-e1d2-11eb-8964-d8131cd97409.png - :target: https://user-images.githubusercontent.com/15709723/125184441-b4584e80-e1d2-11eb-8964-d8131cd97409.png -.. |n207-img1| image:: https://user-images.githubusercontent.com/36741649/127170593-86976dc3-e5e4-40be-b0a6-206379cd7df5.jpg - :target: https://user-images.githubusercontent.com/36741649/127170593-86976dc3-e5e4-40be-b0a6-206379cd7df5.jpg - :width: 70 -.. |n207-img2| image:: https://user-images.githubusercontent.com/36741649/127170593-86976dc3-e5e4-40be-b0a6-206379cd7df5.jpg - :target: https://user-images.githubusercontent.com/36741649/127170593-86976dc3-e5e4-40be-b0a6-206379cd7df5.jpg - :width: 130 -.. |n208-img1| image:: https://user-images.githubusercontent.com/36741649/129315292-a37266dc-dfb2-4749-bca5-2ac9c1e93d64.jpg - :target: https://user-images.githubusercontent.com/36741649/129315292-a37266dc-dfb2-4749-bca5-2ac9c1e93d64.jpg -.. |n209-img1| image:: https://user-images.githubusercontent.com/36741649/132660640-da2211ec-c389-450e-8980-32a75ed14abb.png - :target: https://user-images.githubusercontent.com/36741649/132660640-da2211ec-c389-450e-8980-32a75ed14abb.png -.. |n210-img1| image:: https://github.com/facebookresearch/SlowFast/raw/main/demo/ava_demo.gif - :target: https://github.com/facebookresearch/SlowFast/raw/main/demo/ava_demo.gif -.. |n211-img1| image:: https://user-images.githubusercontent.com/36741649/140987347-279de058-55d7-4772-b013-0f2b12deaa61.png - :target: https://user-images.githubusercontent.com/36741649/140987347-279de058-55d7-4772-b013-0f2b12deaa61.png -.. |n213-img1| image:: https://user-images.githubusercontent.com/4547501/152571639-ace628b2-e3d2-433e-8c28-9a5546d76a86.gif - :target: https://user-images.githubusercontent.com/4547501/152571639-ace628b2-e3d2-433e-8c28-9a5546d76a86.gif -.. |n212-img1| image:: https://user-images.githubusercontent.com/29454499/218432101-0bd0c424-e1d8-46af-ba1d-ee29ed6d1229.png - :target: https://user-images.githubusercontent.com/29454499/218432101-0bd0c424-e1d8-46af-ba1d-ee29ed6d1229.png -.. |n215-img1| image:: https://user-images.githubusercontent.com/4547501/167121084-ec58fbdb-b269-4de2-9d4c-253c5b95de1e.png - :target: https://user-images.githubusercontent.com/4547501/167121084-ec58fbdb-b269-4de2-9d4c-253c5b95de1e.png -.. |n216-img1| image:: https://user-images.githubusercontent.com/70456146/162759539-4a0a996f-dabe-40ea-98d6-85b4dce8511d.png - :target: https://user-images.githubusercontent.com/70456146/162759539-4a0a996f-dabe-40ea-98d6-85b4dce8511d.png -.. |n217-img1| image:: https://user-images.githubusercontent.com/41332813/158430181-05d07f42-cdb8-4b7a-b7dc-e7f7d9391877.png - :target: https://user-images.githubusercontent.com/41332813/158430181-05d07f42-cdb8-4b7a-b7dc-e7f7d9391877.png -.. 
|n218-img1| image:: https://user-images.githubusercontent.com/47499836/163544861-fa2ad64b-77df-4c16-b065-79183e8ed964.png - :target: https://user-images.githubusercontent.com/47499836/163544861-fa2ad64b-77df-4c16-b065-79183e8ed964.png -.. |n220-img1| image:: https://user-images.githubusercontent.com/51917466/254583163-3bb85143-627b-4f02-b628-7bef37823520.png - :target: https://user-images.githubusercontent.com/51917466/254583163-3bb85143-627b-4f02-b628-7bef37823520.png -.. |n222-img1| image:: https://user-images.githubusercontent.com/18904157/166343139-c6568e50-b856-4066-baef-5cdbd4e8bc18.png - :target: https://user-images.githubusercontent.com/18904157/166343139-c6568e50-b856-4066-baef-5cdbd4e8bc18.png -.. |n223-img1| image:: https://user-images.githubusercontent.com/91228207/185105225-0f996b0b-0a3b-4486-872d-364ac6fab68b.png - :target: https://user-images.githubusercontent.com/91228207/185105225-0f996b0b-0a3b-4486-872d-364ac6fab68b.png -.. |n224-img1| image:: https://user-images.githubusercontent.com/91237924/185752178-3882902c-907b-4614-b0e6-ea1de08bf3ef.png - :target: https://user-images.githubusercontent.com/91237924/185752178-3882902c-907b-4614-b0e6-ea1de08bf3ef.png -.. |n225-img1| image:: https://user-images.githubusercontent.com/15709723/200945747-1c584e5c-b3f2-4e43-b1c1-e35fd6edc2c3.png - :target: https://user-images.githubusercontent.com/15709723/200945747-1c584e5c-b3f2-4e43-b1c1-e35fd6edc2c3.png -.. |n226-img1| image:: https://raw.githubusercontent.com/WongKinYiu/yolov7/main/figure/horses_prediction.jpg - :target: https://raw.githubusercontent.com/WongKinYiu/yolov7/main/figure/horses_prediction.jpg -.. |n227-img1| image:: https://user-images.githubusercontent.com/29454499/204548693-1304ef33-c790-490d-8a8b-d5766acb6254.png - :target: https://user-images.githubusercontent.com/29454499/204548693-1304ef33-c790-490d-8a8b-d5766acb6254.png -.. |n228-img1| image:: https://camo.githubusercontent.com/8beb0eedc6a3bcafc397399d55a7e7da4184c1c799e6351a07a7c4aef534ffc4/68747470733a2f2f757365722d696d616765732e67697468756275736572636f6e74656e742e636f6d2f32393435343439392f3230373737333438312d64373763616366382d366364632d343736352d613331622d6131363639343736643632302e706e67 - :target: https://camo.githubusercontent.com/8beb0eedc6a3bcafc397399d55a7e7da4184c1c799e6351a07a7c4aef534ffc4/68747470733a2f2f757365722d696d616765732e67697468756275736572636f6e74656e742e636f6d2f32393435343439392f3230373737333438312d64373763616366382d366364632d343736352d613331622d6131363639343736643632302e706e67 -.. |n228-img2| image:: https://user-images.githubusercontent.com/29454499/207795060-437b42f9-e801-4332-a91f-cc26471e5ba2.png - :target: https://user-images.githubusercontent.com/29454499/207795060-437b42f9-e801-4332-a91f-cc26471e5ba2.png -.. |n229-img1| image:: https://user-images.githubusercontent.com/95271966/206130638-d9847414-357a-4c79-9ca7-76f4ae5a6d7f.png - :target: https://user-images.githubusercontent.com/95271966/206130638-d9847414-357a-4c79-9ca7-76f4ae5a6d7f.png -.. |n230-img1| image:: https://user-images.githubusercontent.com/29454499/212105105-f61c8aab-c1ff-40af-a33f-d0ed1fccc72e.png - :target: https://user-images.githubusercontent.com/29454499/212105105-f61c8aab-c1ff-40af-a33f-d0ed1fccc72e.png -.. |n231-img1| image:: https://user-images.githubusercontent.com/29454499/219943222-d46a2e2d-d348-4259-8431-37cf14727eda.png - :target: https://user-images.githubusercontent.com/29454499/219943222-d46a2e2d-d348-4259-8431-37cf14727eda.png -.. 
|n232-img1| image:: https://user-images.githubusercontent.com/29454499/218967961-9858efd5-fff2-4eb0-bde9-60852f4b31cb.JPG - :target: https://user-images.githubusercontent.com/29454499/218967961-9858efd5-fff2-4eb0-bde9-60852f4b31cb.JPG -.. |n233-img1| image:: https://user-images.githubusercontent.com/29454499/221933762-4ff32ecb-5e5d-4484-80e1-e9396cb3c511.png - :target: https://user-images.githubusercontent.com/29454499/221933762-4ff32ecb-5e5d-4484-80e1-e9396cb3c511.png -.. |n234-img1| image:: https://github.com/facebookresearch/encodec/raw/main/thumbnail.png - :target: https://github.com/facebookresearch/encodec/raw/main/thumbnail.png -.. |n235-img1| image:: https://user-images.githubusercontent.com/29454499/224541412-9d13443e-0e42-43f2-8210-aa31820c5b44.png - :target: https://user-images.githubusercontent.com/29454499/224541412-9d13443e-0e42-43f2-8210-aa31820c5b44.png -.. |n236-img1| image:: https://user-images.githubusercontent.com/29454499/228882108-25c1f65d-4c23-4e1d-8ba4-f6164280a3e3.gif - :target: https://user-images.githubusercontent.com/29454499/228882108-25c1f65d-4c23-4e1d-8ba4-f6164280a3e3.gif -.. |n236-img4| image:: https://user-images.githubusercontent.com/1720147/229231281-065641fd-53ea-4940-8c52-b1eebfbaa7fa.png - :target: https://user-images.githubusercontent.com/1720147/229231281-065641fd-53ea-4940-8c52-b1eebfbaa7fa.png -.. |n237-img1| image:: https://user-images.githubusercontent.com/29454499/231468849-1cd11e68-21e2-44ed-8088-b792ef50c32d.png - :target: https://user-images.githubusercontent.com/29454499/231468849-1cd11e68-21e2-44ed-8088-b792ef50c32d.png -.. |n238-img1| image:: https://user-images.githubusercontent.com/29454499/241643886-dfcf3c48-8d50-4730-ae28-a21595d9504f.png - :target: https://user-images.githubusercontent.com/29454499/241643886-dfcf3c48-8d50-4730-ae28-a21595d9504f.png -.. |n239-img1| image:: https://user-images.githubusercontent.com/29454499/240364108-39868933-d221-41e6-9b2e-dac1b14ef32f.png - :target: https://user-images.githubusercontent.com/29454499/240364108-39868933-d221-41e6-9b2e-dac1b14ef32f.png -.. |n240-img1| image:: https://user-images.githubusercontent.com/29454499/237291423-022f07d2-966b-4be2-9a1c-98f1cf0691c2.png - :target: https://user-images.githubusercontent.com/29454499/237291423-022f07d2-966b-4be2-9a1c-98f1cf0691c2.png -.. |n241-img1| image:: https://user-images.githubusercontent.com/29454499/244291912-bbc6e08c-c0a9-41fe-bc2d-5f89a0d2463b.png - :target: https://user-images.githubusercontent.com/29454499/244291912-bbc6e08c-c0a9-41fe-bc2d-5f89a0d2463b.png -.. |n243-img1| image:: https://user-images.githubusercontent.com/29454499/251085926-14045ebc-273b-4ccb-b04f-82a3f7811b87.gif - :target: https://user-images.githubusercontent.com/29454499/251085926-14045ebc-273b-4ccb-b04f-82a3f7811b87.gif -.. |n245-img1| image:: https://user-images.githubusercontent.com/80534358/224564463-ee686386-f846-4b2b-91af-7163586014b7.png - :target: https://user-images.githubusercontent.com/80534358/224564463-ee686386-f846-4b2b-91af-7163586014b7.png -.. |n246-img1| image:: https://raw.githubusercontent.com/alexklwong/void-dataset/master/figures/void_samples.png - :target: https://raw.githubusercontent.com/alexklwong/void-dataset/master/figures/void_samples.png -.. |n248-img1| image:: https://user-images.githubusercontent.com/29454499/258651862-28b63016-c5ff-4263-9da8-73ca31100165.jpeg - :target: https://user-images.githubusercontent.com/29454499/258651862-28b63016-c5ff-4263-9da8-73ca31100165.jpeg -.. 
|n249-img1| image:: https://camo.githubusercontent.com/f46c3642d3266e9d56d8ea8a943e67825597de3ff51698703ea2ddcb1086e541/68747470733a2f2f6769746875622d70726f64756374696f6e2d757365722d61737365742d3632313064662e73332e616d617a6f6e6177732e636f6d2f37363136313235362f3235383634303731332d66383031626430392d653932372d346162642d616132662d3939393064653463616638642e676966 - :target: https://camo.githubusercontent.com/f46c3642d3266e9d56d8ea8a943e67825597de3ff51698703ea2ddcb1086e541/68747470733a2f2f6769746875622d70726f64756374696f6e2d757365722d61737365742d3632313064662e73332e616d617a6f6e6177732e636f6d2f37363136313235362f3235383634303731332d66383031626430392d653932372d346162642d616132662d3939393064653463616638642e676966 -.. |n250-img1| image:: https://user-images.githubusercontent.com/76463150/260439306-81c81c8d-1f9c-41d0-b881-9491766def8e.png - :target: https://user-images.githubusercontent.com/76463150/260439306-81c81c8d-1f9c-41d0-b881-9491766def8e.png -.. |n251-img1| image:: https://user-images.githubusercontent.com/29454499/260904650-274fc2f9-24d2-46a3-ac3d-d660ec3c9a19.png - :target: https://user-images.githubusercontent.com/29454499/260904650-274fc2f9-24d2-46a3-ac3d-d660ec3c9a19.png -.. |n253-img1| image:: https://user-images.githubusercontent.com/76161256/261102399-500956d5-4aac-4710-a77c-4df34bcda3be.gif - :target: https://user-images.githubusercontent.com/76161256/261102399-500956d5-4aac-4710-a77c-4df34bcda3be.gif -.. |n254-img1| image:: https://user-images.githubusercontent.com/29454499/255799218-611e7189-8979-4ef5-8a80-5a75e0136b50.png - :target: https://user-images.githubusercontent.com/29454499/255799218-611e7189-8979-4ef5-8a80-5a75e0136b50.png -.. |n256-img1| image:: https://user-images.githubusercontent.com/29454499/269278630-9a770279-0045-480e-95f2-1a2f2d0a5115.png - :target: https://user-images.githubusercontent.com/29454499/269278630-9a770279-0045-480e-95f2-1a2f2d0a5115.png -.. |n257-img1| image:: https://raw.githubusercontent.com/haotian-liu/LLaVA/main/images/llava_logo.png - :target: https://raw.githubusercontent.com/haotian-liu/LLaVA/main/images/llava_logo.png -.. |n258-img1| image:: https://user-images.githubusercontent.com/76161256/275485611-0ecf621f-b544-44ae-8258-8a49be704989.png - :target: https://user-images.githubusercontent.com/76161256/275485611-0ecf621f-b544-44ae-8258-8a49be704989.png -.. |n259-img1| image:: https://user-images.githubusercontent.com/29454499/274927904-cd734349-9954-4656-ab96-08a903e846ef.png - :target: https://user-images.githubusercontent.com/29454499/274927904-cd734349-9954-4656-ab96-08a903e846ef.png -.. |n261-img1| image:: https://user-images.githubusercontent.com/26833433/248551984-d98f0f6d-7535-45d0-b380-2e1440b52ad7.jpg - :target: https://user-images.githubusercontent.com/26833433/248551984-d98f0f6d-7535-45d0-b380-2e1440b52ad7.jpg -.. |n263-img1| image:: https://user-images.githubusercontent.com/29454499/277367065-13a8f622-8ea7-4d12-b3f8-241d4499305e.png - :target: https://user-images.githubusercontent.com/29454499/277367065-13a8f622-8ea7-4d12-b3f8-241d4499305e.png -.. |n301-img1| image:: https://user-images.githubusercontent.com/15709723/127779607-8fa34947-1c35-4260-8d04-981c41a2a2cc.png - :target: https://user-images.githubusercontent.com/15709723/127779607-8fa34947-1c35-4260-8d04-981c41a2a2cc.png -.. |n401-img1| image:: https://user-images.githubusercontent.com/4547501/141471665-82b28c86-cf64-4bfe-98b3-c314658f2d96.gif - :target: https://user-images.githubusercontent.com/4547501/141471665-82b28c86-cf64-4bfe-98b3-c314658f2d96.gif -.. 
|n402-img1| image:: https://user-images.githubusercontent.com/4547501/138267961-41d754e7-59db-49f6-b700-63c3a636fad7.gif - :target: https://user-images.githubusercontent.com/4547501/138267961-41d754e7-59db-49f6-b700-63c3a636fad7.gif -.. |n403-img1| image:: https://user-images.githubusercontent.com/10940214/151552326-642d6e49-f5a0-4fc1-bf14-ae3f457e1fec.gif - :target: https://user-images.githubusercontent.com/10940214/151552326-642d6e49-f5a0-4fc1-bf14-ae3f457e1fec.gif -.. |n404-img1| image:: https://user-images.githubusercontent.com/109281183/203772234-f17a0875-b068-43ef-9e77-403462fde1f5.gif - :target: https://user-images.githubusercontent.com/109281183/203772234-f17a0875-b068-43ef-9e77-403462fde1f5.gif -.. |n405-img1| image:: https://raw.githubusercontent.com/yoyowz/classification/master/images/paddleocr.gif - :target: https://raw.githubusercontent.com/yoyowz/classification/master/images/paddleocr.gif -.. |n406-img1| image:: https://user-images.githubusercontent.com/42672437/183292131-576cc05a-a724-472c-8dc9-f6bc092190bf.gif - :target: https://user-images.githubusercontent.com/42672437/183292131-576cc05a-a724-472c-8dc9-f6bc092190bf.gif -.. |n407-img1| image:: https://user-images.githubusercontent.com/91237924/210479548-b70dbbaa-5948-4e49-b48e-6cb6613226da.gif - :target: https://user-images.githubusercontent.com/91237924/210479548-b70dbbaa-5948-4e49-b48e-6cb6613226da.gif -.. |launch-jupyter| image:: https://user-images.githubusercontent.com/15709723/120527271-006fd200-c38f-11eb-9935-2d36d50bab9f.gif - :target: https://user-images.githubusercontent.com/15709723/120527271-006fd200-c38f-11eb-9935-2d36d50bab9f.gif - -.. |Apache License Version 2.0| image:: https://img.shields.io/badge/license-Apache_2.0-green.svg - :target: https://github.com/openvinotoolkit/openvino_notebooks/blob/main/LICENSE - -.. |nbval| image:: https://github.com/openvinotoolkit/openvino_notebooks/actions/workflows/nbval.yml/badge.svg - :target: https://github.com/openvinotoolkit/openvino_notebooks/actions/workflows/nbval.yml?query=branch%3Amain -.. |nbval-docker| image:: https://github.com/openvinotoolkit/openvino_notebooks/actions/workflows/docker.yml/badge.svg - :target: https://github.com/openvinotoolkit/openvino_notebooks/actions/workflows/nbval.yml?query=branch%3Amain - -.. |n001| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F001-hello-world%2F001-hello-world.ipynb -.. |n002| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F002-openvino-api%2F002-openvino-api.ipynb -.. |c002| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/002-openvino-api/002-openvino-api.ipynb -.. |n003| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F003-hello-segmentation%2F003-hello-segmentation.ipynb -.. |n004| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F004-hello-detection%2F004-hello-detection.ipynb -.. 
|n101| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F101-tensorflow-to-openvino%2F101-tensorflow-to-openvino.ipynb -.. |c102| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/102-pytorch-to-openvino/102-pytorch-to-openvino.ipynb -.. |n103| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F103-paddle-onnx-to-openvino-classification%2F103-paddle-onnx-to-openvino-classification.ipynb -.. |n104| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F104-model-tools%2F104-model-tools.ipynb -.. |n106| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?labpath=notebooks%2F106-auto-device%2F106-auto-device.ipynb -.. |c107| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/107-speech-recognition-quantization/107-speech-recognition-quantization-data2vec.ipynb -.. |n110| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F110-ct-segmentation-quantize%2F110-ct-scan-live-inference.ipynb -.. |c111| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/111-yolov5-quantization-migration/111-yolov5-quantization-migration.ipynb -.. |n113| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?labpath=notebooks%2F113-image-classification-quantization%2F113-image-classification-quantization.ipynb -.. |n115| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?labpath=notebooks%2F115-async-api%2F115-async-api.ipynb -.. |c115| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/115-async-api/115-async-api.ipynb -.. 
|c116| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/116-sparsity-optimization/116-sparsity-optimization.ipynb -.. |c119| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/119-tflite-to-openvino/119-tflite-to-openvino.ipynb -.. |n120| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F120-tensorflow-object-detection-to-openvino%2F120-tensorflow-object-detection-to-openvino.ipynb -.. |c120| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/120-tensorflow-object-detection-to-openvino/120-tensorflow-object-detection-to-openvino.ipynb -.. |n120a| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F120-tensorflow-object-detection-to-openvino%2F120-tensorflow-instance-segmentation-to-openvino.ipynb -.. |c120a| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/120-tensorflow-object-detection-to-openvino/120-tensorflow-instance-segmentation-to-openvino.ipynb -.. |n121| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F121-convert-to-openvino%2F121-convert-to-openvino.ipynb -.. |c121| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/121-convert-to-openvino/121-convert-to-openvino.ipynb -.. |n123| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F123-detectron2-to-openvino%2F123-detectron2-to-openvino.ipynb -.. |c123| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/123-detectron2-to-openvino/123-detectron2-to-openvino.ipynb -.. 
|n124| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F124-hugging-face-hub%2F124-hugging-face-hub.ipynb -.. |c124| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/124-hugging-face-hub/124-hugging-face-hub.ipynb -.. |n126| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F126-tensorflow-hub%2F126-tensorflow-hub.ipynb -.. |c126| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/126-tensorflow-hub/126-tensorflow-hub.ipynb -.. |n209| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F209-handwritten-ocr%2F209-handwritten-ocr.ipynb -.. |n201| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F201-vision-monodepth%2F201-vision-monodepth.ipynb -.. |c201| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/201-vision-monodepth/201-vision-monodepth.ipynb -.. |n202i| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F202-vision-superresolution%2F202-vision-superresolution-image.ipynb -.. |c202i| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/202-vision-superresolution/202-vision-superresolution-image.ipynb -.. |n202v| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F202-vision-superresolution%2F202-vision-superresolution-video.ipynb -.. |c202v| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/202-vision-superresolution/202-vision-superresolution-video.ipynb -.. |n203| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?labpath=notebooks%2F203-meter-reader%2F203-meter-reader.ipynb -.. 
|c204| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/204-segmenter-semantic-segmentation/204-segmenter-semantic-segmentation.ipynb -.. |n205| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F205-vision-background-removal%2F205-vision-background-removal.ipynb -.. |c205| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/205-vision-background-removal/205-vision-background-removal.ipynb -.. |c206| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/206-vision-paddlegan-anime/206-vision-paddlegan-anime.ipynb -.. |n210| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F210-slowfast-video-recognition%2F210-slowfast-video-recognition.ipynb -.. |n211| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F211-speech-to-text%2F211-speech-to-text.ipynb -.. |n213| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F213-question-answering%2F213-question-answering.ipynb -.. |n215| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?labpath=notebooks%2F215-image-inpainting%2F215-image-inpainting.ipynb -.. |n216| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?labpath=notebooks%2F216-license-plate-recognition%2F216-license-plate-recognition.ipynb -.. |n217| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/ThanosM97/openvino_notebooks/217-vision-deblur?labpath=notebooks%2F217-vision-deblur%2F217-vision-deblur.ipynb -.. |n218| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?labpath=notebooks%2F218-vehicle-detection-and-recognition%2F218-vehicle-detection-and-recognition.ipynb -.. |n219| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?labpath=notebooks%2F219-knowledge-graphs-conve%2F219-knowledge-graphs-conve.ipynb -.. |n220| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?labpath=notebooks%2F220-cross-lingual-books-alignment%2F220-cross-lingual-books-alignment.ipynb -.. 
|c220| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/220-cross-lingual-books-alignment/220-cross-lingual-books-alignment.ipynb -.. |n221| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?labpath=notebooks%2F221-machine-translation%2F221-machine-translation.ipynb -.. |c221| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/221-machine-translation/221-machine-translation.ipynb -.. |n222| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?labpath=notebooks%2F222-vision-image-colorization%2F222-vision-image-colorization.ipynb -.. |c223| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/223-text-prediction/223-text-prediction.ipynb -.. |n224| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?labpath=notebooks%2F224-3D-segmentation-point-clouds%2F224-3D-segmentation-point-clouds.ipynb -.. |c224| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/224-3D-segmentation-point-clouds/224-3D-segmentation-point-clouds.ipynb -.. |c225| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/224-3D-segmentation-point-clouds/224-3D-segmentation-point-clouds.ipynb -.. |c227| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/227-whisper-subtitles-generation/227-whisper-subtitles-generation.ipynb -.. |n229| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?labpath=notebooks%2F229-distilbert-sequence-classification%2F229-distilbert-sequence-classification.ipynb -.. 
|c230a| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/230-yolov8-optimization/230-yolov8-instance-segmentation.ipynb -.. |c230b| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/230-yolov8-optimization/230-yolov8-keypoint-detection.ipynb -.. |c230c| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/230-yolov8-optimization/230-yolov8-object-detection.ipynb -.. |c232| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/232-clip-language-saliency-map/232-clip-language-saliency-map.ipynb -.. |n243| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F243-tflite-selfie-segmentation%2F243-tflite-selfie-segmentation.ipynb -.. |c243| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/243-tflite-selfie-segmentation/243-tflite-selfie-segmentation.ipynb -.. |c244| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/244-named-entity-recognition/244-named-entity-recognition.ipynb -.. |n247| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F247-code-language-id%2F247-code-language-id.ipynb -.. |n250| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F250-music-generation%2F250-music-generation.ipynb -.. |c250| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/250-music-generation/250-music-generation.ipynb -.. 
|c251| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/251-tiny-sd-image-generation/251-tiny-sd-image-generation.ipynb -.. |c260| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/260-pix2struct-docvqa/260-pix2struct-docvqa.ipynb -.. |n261| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F261-fast-segment-anything%2F261-fast-segment-anything.ipynb -.. |c261| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/261-fast-segment-anything/261-fast-segment-anything.ipynb -.. |c262| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/262-softvc-voice-conversion/262-softvc-voice-conversion.ipynb -.. |c305| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/305-tensorflow-quantization-aware-training/305-tensorflow-quantization-aware-training.ipynb -.. |n401| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F401-object-detection-webcam%2F401-object-detection.ipynb -.. |c401| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/401-object-detection-webcam/401-object-detection.ipynb -.. |n402| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F402-pose-estimation-webcam%2F402-pose-estimation.ipynb -.. |n403| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F403-action-recognition-webcam%2F403-action-recognition-webcam.ipynb -.. |n404| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F404-style-transfer-webcam%2F404-style-transfer.ipynb -.. 
|c404| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/404-style-transfer-webcam/404-style-transfer.ipynb -.. |n405| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F405-paddle-ocr-webcam%2F405-paddle-ocr-webcam.ipynb -.. |c405| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/405-paddle-ocr-webcam/405-paddle-ocr-webcam.ipynb -.. |n406| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks.git/main?labpath=notebooks%2F406-3D-pose-estimation-webcam%2F406-3D-pose-estimation.ipynb -.. |n407| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F407-person-tracking-webcam%2F407-person-tracking.ipynb -.. |c407| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 - :width: 109 - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/407-person-tracking-webcam/407-person-tracking.ipynb - .. |binder logo| image:: https://mybinder.org/badge_logo.svg :alt: Binder button .. |colab logo| image:: https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667 :width: 109 :alt: Google Colab button + @endsphinxdirective diff --git a/docs/articles_en/learn_openvino/tutorials/notebooks_section_0.md b/docs/articles_en/learn_openvino/tutorials/notebooks_section_0.md new file mode 100644 index 00000000000000..d3adae6a77ceda --- /dev/null +++ b/docs/articles_en/learn_openvino/tutorials/notebooks_section_0.md @@ -0,0 +1,37 @@ +# First steps with OpenVINO {#notebooks_section_0_get_started} + +@sphinxdirective + +.. toctree:: + :maxdepth: 1 + :hidden: + + +Brief tutorials that demonstrate how to use Python API for inference in OpenVINO. + + +.. showcase:: + :title: 004-hello-detection + :img: https://user-images.githubusercontent.com/36741649/128489933-bf215a3f-06fa-4918-8833-cb0bf9fb1cc7.jpg + + Text detection with OpenVINO. + +.. showcase:: + :title: 003-hello-segmentation + :img: https://user-images.githubusercontent.com/15709723/128290691-e2eb875c-775e-4f4d-a2f4-15134044b4bb.png + + Semantic segmentation with OpenVINO. + +.. showcase:: + :title: 002-openvino-api + :img: _static/images/notebook_eye.png + + Learn the OpenVINO Python API. + +.. showcase:: + :title: 001-hello-world + :img: https://user-images.githubusercontent.com/36741649/127170593-86976dc3-e5e4-40be-b0a6-206379cd7df5.jpg + + Classify an image with OpenVINO. 
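The four notebooks listed above share the same basic flow: read a model, compile it for a device, and run it on prepared input. A minimal sketch of that flow with the OpenVINO Python API, assuming an IR file is already available locally (the model path and input shape below are placeholders, not taken from any specific notebook):

.. code:: python

   import numpy as np
   import openvino as ov

   core = ov.Core()
   model = core.read_model("model.xml")          # placeholder path to an OpenVINO IR file
   compiled = core.compile_model(model, "AUTO")  # let OpenVINO pick an available device

   dummy_input = np.zeros((1, 224, 224, 3), dtype=np.float32)  # placeholder input shape
   result = compiled(dummy_input)[compiled.output(0)]
   print(result.shape)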
+ +@endsphinxdirective diff --git a/docs/articles_en/learn_openvino/tutorials/notebooks_section_1.md b/docs/articles_en/learn_openvino/tutorials/notebooks_section_1.md new file mode 100644 index 00000000000000..62840083409be1 --- /dev/null +++ b/docs/articles_en/learn_openvino/tutorials/notebooks_section_1.md @@ -0,0 +1,193 @@ +# Convert & Optimize {#notebooks_section_1_convert__optimize} + +@sphinxdirective + +.. toctree:: + :maxdepth: 1 + :hidden: + + +Tutorials that explain how to optimize and quantize models with OpenVINO tools. + +.. showcase:: + :title: 126-tensorflow-hub + :img: _static/images/notebook_eye.png + + Convert TensorFlow Hub models to OpenVINO Intermediate Representation (IR). + +.. showcase:: + :title: 125-convnext-classification + :img: _static/images/notebook_eye.png + + Classification with ConvNeXt and OpenVINO. + +.. showcase:: + :title: 124-hugging-face-hub + :img: _static/images/notebook_eye.png + + Hugging Face Model Hub with OpenVINO™. + +.. showcase:: + :title: 123-detectron2-to-openvino + :img: _static/images/notebook_eye.png + + Convert Detectron2 Models to OpenVINO™. + +.. showcase:: + :title: 122-yolov8-quantization-with-accuracy-control + :img: _static/images/notebook_eye.png + + Convert and Optimize YOLOv8 with OpenVINO™. + +.. showcase:: + :title: 122-speech-recognition-quantization-wav2vec2 + :img: _static/images/notebook_eye.png + + Quantize Speech Recognition Models with accuracy control using NNCF PTQ API. + +.. showcase:: + :title: 121-convert-to-openvino + :img: _static/images/notebook_eye.png + + Learn OpenVINO model conversion API. + +.. showcase:: + :title: 120-tensorflow-object-detection-to-openvino + :img: _static/images/notebook_eye.png + + Convert TensorFlow Object Detection models to OpenVINO IR. + +.. showcase:: + :title: 119-tflite-to-openvino + :img: _static/images/notebook_eye.png + + Convert TensorFlow Lite models to OpenVINO IR. + +.. showcase:: + :title: 118-optimize-preprocessing + :img: _static/images/notebook_eye.png + + Improve performance of image preprocessing step. + +.. showcase:: + :title: 117-model-server + :img: _static/images/notebook_eye.png + + Improve performance of sparse Transformer models. + +.. showcase:: + :title: 116-sparsity-optimization + :img: _static/images/notebook_eye.png + + Improve performance of sparse Transformer models. + +.. showcase:: + :title: 115-async-api + :img: _static/images/notebook_eye.png + + Use asynchronous execution to improve data pipelining. + +.. showcase:: + :title: 113-image-classification-quantization + :img: _static/images/notebook_eye.png + + Quantize MobileNet image classification. + +.. showcase:: + :title: 112-pytorch-post-training-quantization-nncf + :img: _static/images/notebook_eye.png + + Use Neural Network Compression Framework (NNCF) to quantize PyTorch model in post-training mode (without model fine-tuning). + +.. showcase:: + :title: 111-yolov5-quantization-migration + :img: _static/images/notebook_eye.png + + Migrate YOLOv5 POT API based quantization pipeline on Neural Network Compression Framework (NNCF). + +.. showcase:: + :title: 110-ct-segmentation-quantize-nncf + :img: _static/images/notebook_eye.png + + Quantize a kidney segmentation model and show live inference. + +.. showcase:: + :title: 110-ct-scan-live-inference + :img: _static/images/notebook_eye.png + + Live inference of a kidney segmentation model and benchmark CT-scan data with OpenVINO. + +.. 
showcase:: + :title: 109-throughput-tricks + :img: _static/images/notebook_eye.png + + Performance tricks for throughput mode in OpenVINO™. + +.. showcase:: + :title: 109-latency-tricks + :img: _static/images/notebook_eye.png + + Performance tricks for latency mode in OpenVINO™. + +.. showcase:: + :title: 108-gpu-device + :img: _static/images/notebook_eye.png + + Working with GPUs in OpenVINO™ + +.. showcase:: + :title: 107-speech-recognition-quantization-data2vec + :img: _static/images/notebook_eye.png + + Optimize and quantize a pre-trained Data2Vec speech model. + +.. showcase:: + :title: 107-speech-recognition-quantization-wav2vec2 + :img: _static/images/notebook_eye.png + + Optimize and quantize a pre-trained Wav2Vec2 speech model. + +.. showcase:: + :title: 106-auto-device + :img: _static/images/notebook_eye.png + + Demonstrates how to use AUTO Device. + +.. showcase:: + :title: 105-language-quantize-bert + :img: _static/images/notebook_eye.png + + Optimize and quantize a pre-trained BERT model. + +.. showcase:: + :title: 104-model-tools + :img: _static/images/notebook_eye.png + + Download, convert and benchmark models from Open Model Zoo. + +.. showcase:: + :title: 103-paddle-onnx-to-openvino + :img: https://user-images.githubusercontent.com/15709723/127779326-dc14653f-a960-4877-b529-86908a6f2a61.png + + Convert PaddlePaddle models to OpenVINO IR. + +.. showcase:: + :title: 102-pytorch-to-openvino + :img: https://user-images.githubusercontent.com/15709723/127779326-dc14653f-a960-4877-b529-86908a6f2a61.png + + Convert PyTorch models to OpenVINO IR. + +.. showcase:: + :title: 102-pytorch-onnx-to-openvino + :img: _static/images/notebook_eye.png + + Convert PyTorch models to OpenVINO IR. + +.. showcase:: + :title: 101-tensorflow-classification-to-openvino + :img: https://user-images.githubusercontent.com/36741649/127170593-86976dc3-e5e4-40be-b0a6-206379cd7df5.jpg + + Convert TensorFlow models to OpenVINO IR. + + +@endsphinxdirective diff --git a/docs/articles_en/learn_openvino/tutorials/notebooks_section_2.md b/docs/articles_en/learn_openvino/tutorials/notebooks_section_2.md new file mode 100644 index 00000000000000..db96d4f16dfcfe --- /dev/null +++ b/docs/articles_en/learn_openvino/tutorials/notebooks_section_2.md @@ -0,0 +1,452 @@ +# Model Demos {#notebooks_section_2_model_demos} + +@sphinxdirective + +.. toctree:: + :maxdepth: 1 + :hidden: + + +Demos that demonstrate inference on a particular model. + +.. showcase:: + :title: 269-film-slowmo + :img: https://github.com/googlestaging/frame-interpolation/raw/main/moment.gif + + Frame interpolation using FILM and OpenVINO. + +.. showcase:: + :title: 268-table-question-answering + :img: _static/images/notebook_eye.png + + Table Question Answering using TAPAS and OpenVINO. + +.. showcase:: + :title: 267-distil-whisper-asr + :img: _static/images/notebook_eye.png + + Automatic speech recognition using Distil-Whisper and OpenVINO. + +.. showcase:: + :title: 266-speculative-sampling + :img: _static/images/notebook_eye.png + + Text Generation via Speculative Sampling, KV Caching, and OpenVINO. + +.. showcase:: + :title: 265-wuerstchen-image-generation + :img: https://user-images.githubusercontent.com/76161256/277724498-6917c558-d74c-4cc9-b81a-679ce0a299ee.png + + Image generation with Würstchen and OpenVINO. + +.. showcase:: + :title: 264-qrcode-monster + :img: https://user-images.githubusercontent.com/76463150/278011447-1a5978c6-e7a0-4824-9318-a3d8f4912c47.png + + Generate creative QR codes with ControlNet QR Code Monster and OpenVINO. + +.. 
showcase:: + :title: 263-latent-consistency-models-image-generation + :img: https://user-images.githubusercontent.com/29454499/277367065-13a8f622-8ea7-4d12-b3f8-241d4499305e.png + + Image generation with Latent Consistency Model and OpenVINO. + +.. showcase:: + :title: 262-softvc-voice-conversion + :img: _static/images/notebook_eye.png + + SoftVC VITS Singing Voice Conversion and OpenVINO. + +.. showcase:: + :title: 261-fast-segment-anything + :img: https://user-images.githubusercontent.com/26833433/248551984-d98f0f6d-7535-45d0-b380-2e1440b52ad7.jpg + + Object segmentation with FastSAM and OpenVINO. + +.. showcase:: + :title: 259-decidiffusion-image-generation + :img: https://user-images.githubusercontent.com/29454499/274927904-cd734349-9954-4656-ab96-08a903e846ef.png + + Image generation with DeciDiffusion and OpenVINO. + +.. showcase:: + :title: 258-blip-diffusion-subject-generation + :img: https://user-images.githubusercontent.com/76161256/275485611-0ecf621f-b544-44ae-8258-8a49be704989.png + + Subject-driven image generation and editing using BLIP Diffusion and OpenVINO. + +.. showcase:: + :title: 257-llava-multimodal-chatbot + :img: https://raw.githubusercontent.com/haotian-liu/LLaVA/main/images/llava_logo.png + + Visual-language assistant with LLaVA and OpenVINO. + +.. showcase:: + :title: 256-bark-text-to-audio + :img: https://user-images.githubusercontent.com/29454499/269278630-9a770279-0045-480e-95f2-1a2f2d0a5115.png + + Text-to-speech generation using Bark and OpenVINO. + +.. showcase:: + :title: 254-llm-chatbot + :img: _static/images/notebook_eye.png + + Create an LLM-powered Chatbot using OpenVINO. + +.. showcase:: + :title: 253-zeroscope-text2video + :img: https://user-images.githubusercontent.com/76161256/261102399-500956d5-4aac-4710-a77c-4df34bcda3be.gif + + Text-to video synthesis with ZeroScope and OpenVINO™. + +.. showcase:: + :title: 252-fastcomposer-image-generation + :img: _static/images/notebook_eye.png + + Image generation with FastComposer and OpenVINO™. + +.. showcase:: + :title: 251-tiny-sd-image-generation + :img: https://user-images.githubusercontent.com/29454499/260904650-274fc2f9-24d2-46a3-ac3d-d660ec3c9a19.png + + Image Generation with Tiny-SD and OpenVINO™. + +.. showcase:: + :title: 250-music-generation + :img: https://user-images.githubusercontent.com/76463150/260439306-81c81c8d-1f9c-41d0-b881-9491766def8e.png + + Controllable Music Generation with MusicGen and OpenVINO™. + +.. showcase:: + :title: 249-oneformer-segmentation + :img: https://camo.githubusercontent.com/f46c3642d3266e9d56d8ea8a943e67825597de3ff51698703ea2ddcb1086e541/68747470733a2f2f6769746875622d70726f64756374696f6e2d757365722d61737365742d3632313064662e73332e616d617a6f6e6177732e636f6d2f37363136313235362f3235383634303731332d66383031626430392d653932372d346162642d616132662d3939393064653463616638642e676966 + + Universal segmentation with OneFormer and OpenVINO™. + +.. showcase:: + :title: 248-stable-diffusion-xl + :img: https://user-images.githubusercontent.com/29454499/258651862-28b63016-c5ff-4263-9da8-73ca31100165.jpeg + + Image generation with Stable Diffusion XL and OpenVINO™. + +.. showcase:: + :title: 247-code-language-id + :img: _static/images/notebook_eye.png + + Identify the programming language used in an arbitrary code snippet. + + +.. showcase:: + :title: 246-depth-estimation-videpth + :img: https://raw.githubusercontent.com/alexklwong/void-dataset/master/figures/void_samples.png + + Monocular Visual-Inertial Depth Estimation with OpenVINO™. + +.. 
showcase:: + :title: 245-typo-detector + :img: https://user-images.githubusercontent.com/80534358/224564463-ee686386-f846-4b2b-91af-7163586014b7.png + + English Typo Detection in sentences with OpenVINO™. + +.. showcase:: + :title: 244-named-entity-recognition + :img: _static/images/notebook_eye.png + + Named entity recognition with OpenVINO™. + +.. showcase:: + :title: 243-tflite-selfie-segmentation + :img: https://user-images.githubusercontent.com/29454499/251085926-14045ebc-273b-4ccb-b04f-82a3f7811b87.gif + + Selfie Segmentation using TFLite and OpenVINO™. + +.. showcase:: + :title: 242-freevc-voice-conversion + :img: https://user-images.githubusercontent.com/47499836/163544861-fa2ad64b-77df-4c16-b065-79183e8ed964.png + + High-Quality Text-Free One-Shot Voice Conversion with FreeVC and OpenVINO™ + +.. showcase:: + :title: 241-riffusion-text-to-music + :img: https://user-images.githubusercontent.com/29454499/244291912-bbc6e08c-c0a9-41fe-bc2d-5f89a0d2463b.png + + Text-to-Music generation using Riffusion and OpenVINO™. + +.. showcase:: + :title: 240-dolly-2-instruction-following + :img: https://github-production-user-asset-6210df.s3.amazonaws.com/29454499/237160118-e881f4a4-fcc8-427a-afe1-7dd80aebd66e.png + + Instruction following using Databricks Dolly 2.0 and OpenVINO™. + +.. showcase:: + :title: 239-image-bind-convert + :img: https://user-images.githubusercontent.com/29454499/240364108-39868933-d221-41e6-9b2e-dac1b14ef32f.png + + Binding multimodal data, using ImageBind and OpenVINO™. + +.. showcase:: + :title: 238-deep-floyd-if-optimize + :img: https://user-images.githubusercontent.com/29454499/241643886-dfcf3c48-8d50-4730-ae28-a21595d9504f.png + + Text-to-image generation with DeepFloyd IF and OpenVINO™. + +.. showcase:: + :title: 237-segment-anything + :img: https://user-images.githubusercontent.com/29454499/231468849-1cd11e68-21e2-44ed-8088-b792ef50c32d.png + + Prompt based object segmentation mask generation, using Segment Anything and OpenVINO™. + +.. showcase:: + :title: 236-stable-diffusion-v2-text-to-image + :img: https://user-images.githubusercontent.com/29454499/228882108-25c1f65d-4c23-4e1d-8ba4-f6164280a3e3.gif + + Text-to-image generation with Stable Diffusion v2 and OpenVINO™. + +.. showcase:: + :title: 236-stable-diffusion-v2-text-to-image-demo + :img: https://user-images.githubusercontent.com/1720147/229231281-065641fd-53ea-4940-8c52-b1eebfbaa7fa.png + + Stable Diffusion Text-to-Image Demo. + +.. showcase:: + :title: 236-stable-diffusion-v2-optimum-demo + :img: https://user-images.githubusercontent.com/1720147/229231281-065641fd-53ea-4940-8c52-b1eebfbaa7fa.png + + Stable Diffusion v2.1 using Optimum-Intel OpenVINO. + +.. showcase:: + :title: 236-stable-diffusion-v2-optimum-demo-comparison + :img: https://user-images.githubusercontent.com/1720147/229231281-065641fd-53ea-4940-8c52-b1eebfbaa7fa.png + + Stable Diffusion v2.1 using Optimum-Intel OpenVINO and multiple Intel Hardware + +.. showcase:: + :title: 236-stable-diffusion-v2-infinite-zoom + :img: https://user-images.githubusercontent.com/29454499/228882108-25c1f65d-4c23-4e1d-8ba4-f6164280a3e3.gif + + Text-to-image generation and Infinite Zoom with Stable Diffusion v2 and OpenVINO™. + +.. showcase:: + :title: 235-controlnet-stable-diffusion + :img: https://user-images.githubusercontent.com/29454499/224541412-9d13443e-0e42-43f2-8210-aa31820c5b44.png + + A text-to-image generation with ControlNet Conditioning and OpenVINO™. + +.. 
showcase:: + :title: 234-encodec-audio-compression + :img: https://github.com/facebookresearch/encodec/raw/main/thumbnail.png + + Audio compression with EnCodec and OpenVINO™. + +.. showcase:: + :title: 233-blip-convert + :img: https://user-images.githubusercontent.com/29454499/221933762-4ff32ecb-5e5d-4484-80e1-e9396cb3c511.png + + Visual Question Answering and Image Captioning using BLIP and OpenVINO. + +.. showcase:: + :title: 233-blip-optimize + :img: https://user-images.githubusercontent.com/29454499/221933762-4ff32ecb-5e5d-4484-80e1-e9396cb3c511.png + + Post-Training Quantization and Weights Compression of OpenAI BLIP model with NNCF. + +.. showcase:: + :title: 232-clip-language-saliency-map + :img: https://user-images.githubusercontent.com/29454499/218967961-9858efd5-fff2-4eb0-bde9-60852f4b31cb.JPG + + Language-visual saliency with CLIP and OpenVINO™. + +.. showcase:: + :title: 231-instruct-pix2pix-image-editing + :img: https://user-images.githubusercontent.com/29454499/219943222-d46a2e2d-d348-4259-8431-37cf14727eda.png + + Image editing with InstructPix2Pix. + +.. showcase:: + :title: 230-yolov8-optimization + :img: https://user-images.githubusercontent.com/29454499/212105105-f61c8aab-c1ff-40af-a33f-d0ed1fccc72e.png + + Optimize YOLOv8, using NNCF PTQ API. + +.. showcase:: + :title: 229-distilbert-sequence-classification + :img: https://user-images.githubusercontent.com/95271966/206130638-d9847414-357a-4c79-9ca7-76f4ae5a6d7f.png + + Sequence classification with OpenVINO. + +.. showcase:: + :title: 228-clip-zero-shot-quantize + :img: https://user-images.githubusercontent.com/29454499/207795060-437b42f9-e801-4332-a91f-cc26471e5ba2.png + + Post-Training Quantization of OpenAI CLIP model with NNCF. + +.. showcase:: + :title: 228-clip-zero-shot-convert + :img: https://camo.githubusercontent.com/8beb0eedc6a3bcafc397399d55a7e7da4184c1c799e6351a07a7c4aef534ffc4/68747470733a2f2f757365722d696d616765732e67697468756275736572636f6e74656e742e636f6d2f32393435343439392f3230373737333438312d64373763616366382d366364632d343736352d613331622d6131363639343736643632302e706e67 + + Zero-shot Image Classification with OpenAI CLIP and OpenVINO™. + +.. showcase:: + :title: 227-whisper-subtitles-generation + :img: https://user-images.githubusercontent.com/29454499/204548693-1304ef33-c790-490d-8a8b-d5766acb6254.png + + Generate subtitles for video with OpenAI Whisper and OpenVINO. + +.. showcase:: + :title: 226-yolov7-optimization + :img: https://raw.githubusercontent.com/WongKinYiu/yolov7/main/figure/horses_prediction.jpg + + Optimize YOLOv7, using NNCF PTQ API. + +.. showcase:: + :title: 225-stable-diffusion-text-to-image + :img: https://user-images.githubusercontent.com/15709723/200945747-1c584e5c-b3f2-4e43-b1c1-e35fd6edc2c3.png + + Text-to-image generation with Stable Diffusion method. + +.. showcase:: + :title: 224-3D-segmentation-point-clouds + :img: https://user-images.githubusercontent.com/91237924/185752178-3882902c-907b-4614-b0e6-ea1de08bf3ef.png + + Process point cloud data and run 3D Part Segmentation with OpenVINO. + +.. showcase:: + :title: 223-text-prediction + :img: https://user-images.githubusercontent.com/91228207/185105225-0f996b0b-0a3b-4486-872d-364ac6fab68b.png + + Use pre-trained models to perform text prediction on an input sequence. + +.. showcase:: + :title: 222-vision-image-colorization + :img: https://user-images.githubusercontent.com/18904157/166343139-c6568e50-b856-4066-baef-5cdbd4e8bc18.png + + Use pre-trained models to colorize black & white images using OpenVINO. + +.. 
showcase::
+   :title: 221-machine-translation
+   :img: _static/images/notebook_eye.png
+
+   Real-time translation from English to German.
+
+.. showcase::
+   :title: 220-cross-lingual-books-alignment
+   :img: https://user-images.githubusercontent.com/51917466/254583163-3bb85143-627b-4f02-b628-7bef37823520.png
+
+   Cross-lingual Books Alignment With Transformers and OpenVINO™.
+
+.. showcase::
+   :title: 219-knowledge-graphs-conve
+   :img: _static/images/notebook_eye.png
+
+   Optimize the knowledge graph embeddings model (ConvE) with OpenVINO.
+
+.. showcase::
+   :title: 218-vehicle-detection-and-recognition
+   :img: https://user-images.githubusercontent.com/47499836/163544861-fa2ad64b-77df-4c16-b065-79183e8ed964.png
+
+   Use pre-trained models to detect and recognize vehicles and their attributes with OpenVINO.
+
+.. showcase::
+   :title: 217-vision-deblur
+   :img: https://user-images.githubusercontent.com/41332813/158430181-05d07f42-cdb8-4b7a-b7dc-e7f7d9391877.png
+
+   Deblur images with DeblurGAN-v2.
+
+.. showcase::
+   :title: 216-attention-center
+   :img: _static/images/notebook_eye.png
+
+   The attention center model with OpenVINO™.
+
+.. showcase::
+   :title: 215-image-inpainting
+   :img: https://user-images.githubusercontent.com/4547501/167121084-ec58fbdb-b269-4de2-9d4c-253c5b95de1e.png
+
+   Fill missing pixels with image in-painting.
+
+.. showcase::
+   :title: 214-grammar-correction
+   :img: _static/images/notebook_eye.png
+
+   Grammatical error correction with OpenVINO.
+
+.. showcase::
+   :title: 213-question-answering
+   :img: https://user-images.githubusercontent.com/4547501/152571639-ace628b2-e3d2-433e-8c28-9a5546d76a86.gif
+
+   Answer your questions based on a given context.
+
+.. showcase::
+   :title: 212-pyannote-speaker-diarization
+   :img: https://user-images.githubusercontent.com/29454499/218432101-0bd0c424-e1d8-46af-ba1d-ee29ed6d1229.png
+
+   Run inference on a speaker diarization pipeline.
+
+.. showcase::
+   :title: 211-speech-to-text
+   :img: https://user-images.githubusercontent.com/36741649/140987347-279de058-55d7-4772-b013-0f2b12deaa61.png
+
+   Run inference on a speech-to-text recognition model.
+
+.. showcase::
+   :title: 210-slowfast-video-recognition
+   :img: https://github.com/facebookresearch/SlowFast/raw/main/demo/ava_demo.gif
+
+   Video Recognition using SlowFast and OpenVINO™.
+
+.. showcase::
+   :title: 209-handwritten-ocr
+   :img: https://user-images.githubusercontent.com/36741649/132660640-da2211ec-c389-450e-8980-32a75ed14abb.png
+
+   OCR for handwritten simplified Chinese and Japanese.
+
+.. showcase::
+   :title: 208-optical-character-recognition
+   :img: https://user-images.githubusercontent.com/36741649/129315292-a37266dc-dfb2-4749-bca5-2ac9c1e93d64.jpg
+
+   Annotate text on images using a text recognition ResNet.
+
+.. showcase::
+   :title: 207-vision-paddlegan-superresolution
+   :img: https://user-images.githubusercontent.com/36741649/127170593-86976dc3-e5e4-40be-b0a6-206379cd7df5.jpg
+
+   Upscale small images with super resolution using a PaddleGAN model.
+
+.. showcase::
+   :title: 206-vision-paddlegan-anime
+   :img: https://user-images.githubusercontent.com/15709723/127788059-1f069ae1-8705-4972-b50e-6314a6f36632.jpeg
+
+   Turn an image into anime using a GAN.
+
+.. showcase::
+   :title: 204-segmenter-semantic-segmentation
+   :img: https://user-images.githubusercontent.com/61357777/223854308-d1ac4a39-cc0c-4618-9e4f-d9d4d8b991e8.jpg
+
+   Semantic segmentation with OpenVINO™ using Segmenter.
+
+..
showcase:: + :title: 203-meter-reader + :img: https://user-images.githubusercontent.com/91237924/166135627-194405b0-6c25-4fd8-9ad1-83fb3a00a081.jpg + + PaddlePaddle pre-trained models to read industrial meter’s value. + +.. showcase:: + :title: 202-vision-superresolution-video + :img: https://user-images.githubusercontent.com/15709723/127269258-a8e2c03e-731e-4317-b5b2-ed2ee767ff5e.gif + + Turn 360p into 1080p video using a super resolution model. + +.. showcase:: + :title: 202-vision-superresolution-image + :img: https://user-images.githubusercontent.com/36741649/170005347-e4409f9e-ec34-416b-afdf-a9d8185929ca.jpg + + Upscale raw images with a super resolution model. + +.. showcase:: + :title: 201-vision-monodepth + :img: https://user-images.githubusercontent.com/15709723/127752390-f6aa371f-31b5-4846-84b9-18dd4f662406.gif + + Monocular depth estimation with images and video. + + +@endsphinxdirective diff --git a/docs/articles_en/learn_openvino/tutorials/notebooks_section_3.md b/docs/articles_en/learn_openvino/tutorials/notebooks_section_3.md new file mode 100644 index 00000000000000..f7e41f1d7837a1 --- /dev/null +++ b/docs/articles_en/learn_openvino/tutorials/notebooks_section_3.md @@ -0,0 +1,30 @@ +# Model Training {#notebooks_section_3_model_training} + +@sphinxdirective + +.. toctree:: + :maxdepth: 1 + :hidden: + + +Tutorials that include code to train neural networks. + +.. showcase:: + :title: 305-tensorflow-quantization-aware-training + :img: _static/images/notebook_eye.png + + Use Neural Network Compression Framework (NNCF) to quantize TensorFlow model. + +.. showcase:: + :title: 302-pytorch-quantization-aware-training + :img: _static/images/notebook_eye.png + + Use Neural Network Compression Framework (NNCF) to quantize PyTorch model. + +.. showcase:: + :title: 301-tensorflow-training-openvino-nncf + :img: _static/images/notebook_eye.png + + Use Neural Network Compression Framework (NNCF) to quantize model from TensorFlow + +@endsphinxdirective diff --git a/docs/articles_en/learn_openvino/tutorials/notebooks_section_4.md b/docs/articles_en/learn_openvino/tutorials/notebooks_section_4.md new file mode 100644 index 00000000000000..0794635b6d91c8 --- /dev/null +++ b/docs/articles_en/learn_openvino/tutorials/notebooks_section_4.md @@ -0,0 +1,54 @@ +# Live Demos {#notebooks_section_4_live_demos} + +@sphinxdirective + +.. toctree:: + :maxdepth: 1 + :hidden: + + +Live inference demos that run on a webcam or video files. + +.. showcase:: + :title: 407-person-tracking + :img: https://user-images.githubusercontent.com/91237924/210479548-b70dbbaa-5948-4e49-b48e-6cb6613226da.gif + + Person tracking with a webcam or video file. + +.. showcase:: + :title: 406-3D-pose-estimation + :img: https://user-images.githubusercontent.com/42672437/183292131-576cc05a-a724-472c-8dc9-f6bc092190bf.gif + + 3D display of human pose estimation with a webcam or video file. + +.. showcase:: + :title: 405-paddle-ocr-webcam + :img: https://raw.githubusercontent.com/yoyowz/classification/master/images/paddleocr.gif + + OCR with a webcam or video file. + +.. showcase:: + :title: 404-style-transfer + :img: https://user-images.githubusercontent.com/109281183/203772234-f17a0875-b068-43ef-9e77-403462fde1f5.gif + + Style transfer with a webcam or video file. + +.. showcase:: + :title: 403-action-recognition-webcam + :img: https://user-images.githubusercontent.com/10940214/151552326-642d6e49-f5a0-4fc1-bf14-ae3f457e1fec.gif + + Human action recognition with a webcam or video file. + +.. 
showcase:: + :title: 402-pose-estimation + :img: https://user-images.githubusercontent.com/4547501/138267961-41d754e7-59db-49f6-b700-63c3a636fad7.gif + + Human pose estimation with a webcam or video file. + +.. showcase:: + :title: 401-object-detection + :img: https://user-images.githubusercontent.com/4547501/141471665-82b28c86-cf64-4bfe-98b3-c314658f2d96.gif + + Object detection with a webcam or video file. + +@endsphinxdirective diff --git a/docs/nbdoc/consts.py b/docs/nbdoc/consts.py index 779f0f8baf3a8a..5a53c137abeb7f 100644 --- a/docs/nbdoc/consts.py +++ b/docs/nbdoc/consts.py @@ -1,16 +1,17 @@ -notebooks_path = "notebooks" +from pathlib import Path +notebooks_path = "notebooks" repo_directory = "notebooks" - repo_owner = "openvinotoolkit" - repo_name = "openvino_notebooks" - repo_branch = "tree/main" - -artifacts_link = "http://repository.toolbox.iotg.sclab.intel.com/projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/" - +artifacts_link = "http://repository.toolbox.iotg.sclab.intel.com/projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/" blacklisted_extensions = ['.xml', '.bin'] +notebooks_repo = "https://github.com/openvinotoolkit/openvino_notebooks/blob/main/" +notebooks_binder = "https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=" +notebooks_colab = "https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/" +openvino_notebooks_ipynb_list = Path('../../docs/notebooks/all_notebooks_paths.txt').resolve(strict=True) + # Templates @@ -27,11 +28,11 @@ .. |binder_link| raw:: html - Binder + Binder .. |github_link| raw:: html - Github + Github \n """ @@ -48,11 +49,11 @@ .. |colab_link| raw:: html - Google Colab + Google Colab .. |github_link| raw:: html - Github + Github \n """ @@ -69,15 +70,15 @@ .. |binder_link| raw:: html - Binder + Binder .. |colab_link| raw:: html - Google Colab + Google Colab .. |github_link| raw:: html - Github + Github \n """ @@ -93,7 +94,7 @@ .. 
|github_link| raw:: html - Github + Github \n """ diff --git a/docs/nbdoc/nbdoc.py b/docs/nbdoc/nbdoc.py index a890c13c0f2f55..45bcfcada36694 100644 --- a/docs/nbdoc/nbdoc.py +++ b/docs/nbdoc/nbdoc.py @@ -16,8 +16,12 @@ no_binder_template, repo_directory, repo_name, - repo_branch, + openvino_notebooks_ipynb_list, repo_owner, + notebooks_repo, + notebooks_binder, + notebooks_colab, + ) from notebook import Notebook from section import Section @@ -25,10 +29,13 @@ from lxml import html from jinja2 import Template from urllib.request import urlretrieve -from requests import get +import requests import os +import re import sys +matching_notebooks_paths = [] + class NbTravisDownloader: @staticmethod @@ -59,7 +66,7 @@ def traverse(path: Path, link: str, blacklisted_extensions: list = blacklisted_e :type link: str """ path.mkdir(exist_ok=True) - page = get(link, verify=False).content + page = requests.get(link, verify=False).content tree = html.fromstring(page) # retrieve all links on page returning their content tree = tree.xpath('//a[@*]/@href') @@ -76,17 +83,25 @@ def traverse(path: Path, link: str, blacklisted_extensions: list = blacklisted_e class NbProcessor: def __init__(self, nb_path: str = notebooks_path): self.nb_path = nb_path - self.binder_data = { - "owner": repo_owner, - "repo": repo_name, - "folder": repo_directory, - "branch": repo_branch, - } - self.colab_data = { - "owner": repo_owner, - "repo": repo_name, - "folder": repo_directory, - } + + with open(openvino_notebooks_ipynb_list, 'r+', encoding='cp437') as ipynb_file: + openvino_notebooks_paths_list = ipynb_file.readlines() + + for notebook_name in [ + nb for nb in os.listdir(self.nb_path) if + verify_notebook_name(nb) + ]: + + if not os.path.exists(openvino_notebooks_ipynb_list): + raise FileNotFoundError("all_notebooks_paths.txt is not found") + else: + ipynb_list = [x for x in openvino_notebooks_paths_list if re.match("notebooks/[0-9]{3}.*\.ipynb$", x)] + notebook_with_ext = notebook_name[:-16] + ".ipynb" + matching_notebooks = [re.sub('[\n]', '', match) for match in ipynb_list if notebook_with_ext in match] + + if matching_notebooks is not None: + for n in matching_notebooks: + matching_notebooks_paths.append(n) def fetch_binder_list(self, file) -> list: """Function that fetches list of notebooks with binder buttons @@ -131,31 +146,48 @@ def add_binder(self, buttons_list: list, cbuttons_list: list, template_with_col :raises FileNotFoundError: In case of failure of adding content, error will appear """ - - for notebook in [ + for notebook_file, nb_path in zip([ nb for nb in os.listdir(self.nb_path) if verify_notebook_name(nb) - ]: - notebook_item = '-'.join(notebook.split('-')[:-2]) + ], matching_notebooks_paths): + + notebook_item = '-'.join(notebook_file.split('-')[:-2]) + + binder_data = { + "owner": repo_owner, + "repo": repo_name, + "folder": repo_directory, + "link_git": notebooks_repo + nb_path, + "link_binder": notebooks_binder + nb_path, + "link_colab ": notebooks_colab + nb_path, + } if notebook_item in buttons_list: template = template_with_colab_and_binder if notebook_item in cbuttons_list else template_with_binder else: template = template_with_colab if notebook_item in cbuttons_list else template_without_binder - button_text = create_content(template, self.binder_data, notebook) - if not add_content_below(button_text, f"{self.nb_path}/{notebook}"): + button_text = create_content(template, binder_data, notebook_file) + if not add_content_below(button_text, f"{self.nb_path}/{notebook_file}"): raise 
FileNotFoundError("Unable to modify file") -def add_glob_directive(tutorials_file): - with open(tutorials_file, 'r+', encoding='cp437') as mainfile: - readfile = mainfile.read() - if ':glob:' not in readfile: - add_glob = readfile\ - .replace(":hidden:\n", ":hidden:\n :glob:\n")\ - .replace("notebooks_installation\n", "notebooks_installation\n notebooks/*\n") - mainfile.seek(0) - mainfile.write(add_glob) - mainfile.truncate() + +def add_glob_directive(): + """This function modifies toctrees of the five node articles in tutorials + section. It adds the notebooks found in docs/notebooks directory to the menu. + """ + tutorials_path = Path('../../docs/articles_en/learn_openvino/tutorials').resolve(strict=True) + tutorials_files = [x for x in os.listdir(tutorials_path) if re.match("notebooks_section_[0-9]{1}\.md$", x)] + for tutorials_file in tutorials_files: + file_name = os.path.join(tutorials_path, tutorials_file) + with open(file_name, 'r+', encoding='cp437') as section_file: + section_number = ''.join(c for c in str(tutorials_file) if c.isdigit()) + read_file = section_file.read() + if ':glob:' not in read_file: + add_glob = read_file\ + .replace(":hidden:\n", ":hidden:\n :glob:\n :reversed:\n\n notebooks/" + section_number +"*\n") + section_file.seek(0) + section_file.write(add_glob) + section_file.truncate() def main(): parser = argparse.ArgumentParser() @@ -166,8 +198,7 @@ def main(): sourcedir = args.sourcedir outdir = args.outdir - main_tutorials_file = Path('../../docs/articles_en/learn_openvino/tutorials.md').resolve(strict=True) - add_glob_directive(main_tutorials_file) + add_glob_directive() if args.download: outdir.mkdir(parents=True, exist_ok=True) @@ -184,3 +215,4 @@ def main(): if __name__ == '__main__': main() + diff --git a/docs/notebooks/101-tensorflow-classification-to-openvino-with-output.rst b/docs/notebooks/101-tensorflow-classification-to-openvino-with-output.rst index 654f9b4a72b1c4..a3585b7972aaad 100644 --- a/docs/notebooks/101-tensorflow-classification-to-openvino-with-output.rst +++ b/docs/notebooks/101-tensorflow-classification-to-openvino-with-output.rst @@ -13,7 +13,6 @@ and do inference with a sample image. **Table of contents:** - - `Imports <#imports>`__ - `Settings <#settings>`__ - `Download model <#download-model>`__ @@ -23,7 +22,8 @@ and do inference with a sample image. - `Convert a TensorFlow Model to OpenVINO IR Format <#convert-a-tensorflow-model-to-openvino-ir-format>`__ -- `Test Inference on the Converted Model <#test-inference-on-the-converted-model>`__ +- `Test Inference on the Converted + Model <#test-inference-on-the-converted-model>`__ - `Load the Model <#load-the-model>`__ @@ -46,8 +46,10 @@ and do inference with a sample image. Note: you may need to restart the kernel to use updated packages. -Imports -------------------------------------------------- +Imports +------- + + .. code:: ipython3 @@ -72,14 +74,16 @@ Imports .. parsed-literal:: - 2023-10-30 22:29:25.672741: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. - 2023-10-30 22:29:25.706557: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. + 2023-11-14 22:30:46.626761: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. 
You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. + 2023-11-14 22:30:46.661288: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. - 2023-10-30 22:29:26.218506: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT + 2023-11-14 22:30:47.171314: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT + + +Settings +-------- -Settings --------------------------------------------------- .. code:: ipython3 @@ -91,8 +95,10 @@ Settings ir_path = Path("model/v3-small_224_1.0_float.xml") -Download model --------------------------------------------------------- +Download model +-------------- + + Load model using `tf.keras.applications api `__ @@ -111,13 +117,30 @@ and save it to the disk. .. parsed-literal:: - 2023-10-30 22:29:27.284203: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1960] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. - Skipping registering GPU devices... + 2023-11-14 22:30:50.201471: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:266] failed call to cuInit: CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE: forward compatibility was attempted on non supported HW + 2023-11-14 22:30:50.201504: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:168] retrieving CUDA diagnostic information for host: iotg-dev-workstation-07 + 2023-11-14 22:30:50.201508: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:175] hostname: iotg-dev-workstation-07 + 2023-11-14 22:30:50.201646: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:199] libcuda reported version is: 470.223.2 + 2023-11-14 22:30:50.201662: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:203] kernel reported version is: 470.182.3 + 2023-11-14 22:30:50.201665: E tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:312] kernel version 470.182.3 does not match DSO version 470.223.2 -- cannot find working devices in this configuration .. parsed-literal:: WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model. + + +.. 
parsed-literal:: + + 2023-11-14 22:30:54.370304: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs' with dtype float and shape [?,1,1,1024] + [[{{node inputs}}]] + 2023-11-14 22:30:57.509389: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs' with dtype float and shape [?,1,1,1024] + [[{{node inputs}}]] + WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 5 of 54). These functions will not be directly callable after loading. + + +.. parsed-literal:: + INFO:tensorflow:Assets written to: model/v3-small_224_1.0_float/assets @@ -126,11 +149,15 @@ and save it to the disk. INFO:tensorflow:Assets written to: model/v3-small_224_1.0_float/assets -Convert a Model to OpenVINO IR Format -------------------------------------------------------------------------------- +Convert a Model to OpenVINO IR Format +------------------------------------- + + + +Convert a TensorFlow Model to OpenVINO IR Format +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + -Convert a TensorFlow Model to OpenVINO IR Format -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Use the model conversion Python API to convert the TensorFlow model to OpenVINO IR. The ``ov.convert_model`` function accept path to saved @@ -158,19 +185,25 @@ models. Exporting TensorFlow model to IR... This may take a few minutes. -Test Inference on the Converted Model -------------------------------------------------------------------------------- +Test Inference on the Converted Model +------------------------------------- + + + +Load the Model +~~~~~~~~~~~~~~ + -Load the Model -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code:: ipython3 core = ov.Core() model = core.read_model(ir_path) -Select inference device ------------------------------------------------------------------ +Select inference device +----------------------- + + select device from dropdown list for running inference using OpenVINO @@ -200,8 +233,10 @@ select device from dropdown list for running inference using OpenVINO compiled_model = core.compile_model(model=model, device_name=device.value) -Get Model Information -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Get Model Information +~~~~~~~~~~~~~~~~~~~~~ + + .. code:: ipython3 @@ -209,8 +244,10 @@ Get Model Information output_key = compiled_model.output(0) network_input_shape = input_key.shape -Load an Image -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Load an Image +~~~~~~~~~~~~~ + + Load an image, resize it, and convert it to the input shape of the network. @@ -245,8 +282,10 @@ network. .. image:: 101-tensorflow-classification-to-openvino-with-output_files/101-tensorflow-classification-to-openvino-with-output_19_1.png -Do Inference -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Do Inference +~~~~~~~~~~~~ + + .. 
code:: ipython3 @@ -282,8 +321,10 @@ Do Inference -Timing ------------------------------------------------- +Timing +------ + + Measure the time it takes to do inference on thousand images. This gives an indication of performance. For more accurate benchmarking, use the @@ -312,5 +353,5 @@ performance. .. parsed-literal:: - IR model in OpenVINO Runtime/CPU: 0.0011 seconds per image, FPS: 928.36 + IR model in OpenVINO Runtime/CPU: 0.0010 seconds per image, FPS: 962.52 diff --git a/docs/notebooks/101-tensorflow-classification-to-openvino-with-output_files/index.html b/docs/notebooks/101-tensorflow-classification-to-openvino-with-output_files/index.html index e7a68470e62656..50f5758c278950 100644 --- a/docs/notebooks/101-tensorflow-classification-to-openvino-with-output_files/index.html +++ b/docs/notebooks/101-tensorflow-classification-to-openvino-with-output_files/index.html @@ -1,7 +1,7 @@ -Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/101-tensorflow-classification-to-openvino-with-output_files/ +Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/101-tensorflow-classification-to-openvino-with-output_files/ -

(The remainder of this index.html hunk is an auto-generated directory listing; only the build timestamp in the page title changes from 20231030220807 to 20231114220808 and the listed file date changes from 31-Oct-2023 00:35 to 15-Nov-2023 00:43; the file size, 387941 bytes, is unchanged.)
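For reference alongside the 101-tensorflow-classification notebook above, the conversion step it walks through reduces to two calls: ``ov.convert_model`` reads the TensorFlow SavedModel directory and ``ov.save_model`` writes the IR files. A brief sketch using the same paths as the notebook (not a substitute for the full tutorial):

.. code:: python

   import openvino as ov

   # SavedModel directory and IR path as used in the notebook above.
   ov_model = ov.convert_model("model/v3-small_224_1.0_float")
   ov.save_model(ov_model, "model/v3-small_224_1.0_float.xml")  # writes .xml and .bin

   # The saved IR can then be compiled directly from its path.
   compiled = ov.Core().compile_model("model/v3-small_224_1.0_float.xml", "CPU")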
 

diff --git a/docs/notebooks/109-latency-tricks-with-output.rst b/docs/notebooks/109-latency-tricks-with-output.rst index 97938ff9ebd8c6..60ba8c7a7d2b62 100644 --- a/docs/notebooks/109-latency-tricks-with-output.rst +++ b/docs/notebooks/109-latency-tricks-with-output.rst @@ -27,8 +27,8 @@ The quantization and pre-post-processing API are not included here as they change the precision (quantization) or processing graph (prepostprocessor). You can find examples of how to apply them to optimize performance on OpenVINO IR files in -`111-detection-quantization <../111-detection-quantization>`__ and -`118-optimize-preprocessing <../118-optimize-preprocessing>`__. +`111-detection-quantization <111-detection-quantization-with-output.html>`__ and +`118-optimize-preprocessing <118-optimize-preprocessing-with-output.html>`__. |image0| @@ -44,7 +44,7 @@ optimize performance on OpenVINO IR files in result in different performance. A similar notebook focused on the throughput mode is available -`here <109-throughput-tricks.ipynb>`__. +`here <109-throughput-tricks-with-output.html>`__. **Table of contents:** @@ -596,9 +596,9 @@ Other tricks There are other tricks for performance improvement, such as quantization and pre-post-processing or dedicated to throughput mode. To get even more from your model, please visit -`111-detection-quantization <../111-detection-quantization>`__, -`118-optimize-preprocessing <../118-optimize-preprocessing>`__, and -`109-throughput-tricks <109-throughput-tricks.ipynb>`__. +`111-detection-quantization <111-detection-quantization-with-output.html>`__, +`118-optimize-preprocessing <118-optimize-preprocessing-with-output.html>`__, and +`109-throughput-tricks <109-throughput-tricks-with-output.html>`__. Performance comparison ---------------------------------------------------------------- diff --git a/docs/notebooks/109-throughput-tricks-with-output.rst b/docs/notebooks/109-throughput-tricks-with-output.rst index ff70c41d7c398a..446d6beac1be5d 100644 --- a/docs/notebooks/109-throughput-tricks-with-output.rst +++ b/docs/notebooks/109-throughput-tricks-with-output.rst @@ -24,8 +24,8 @@ The quantization and pre-post-processing API are not included here as they change the precision (quantization) or processing graph (prepostprocessor). You can find examples of how to apply them to optimize performance on OpenVINO IR files in -`111-detection-quantization <../111-detection-quantization>`__ and -`118-optimize-preprocessing <../118-optimize-preprocessing>`__. +`111-detection-quantization <111-detection-quantization-with-output.html>`__ and +`118-optimize-preprocessing `__. |image0| @@ -41,7 +41,7 @@ optimize performance on OpenVINO IR files in result in different performance. A similar notebook focused on the latency mode is available -`here <109-latency-tricks.ipynb>`__. +`here <109-latency-tricks-with-output.html>`__. **Table of contents:** @@ -642,9 +642,9 @@ options, quantization and pre-post-processing or dedicated to latency mode. To get even more from your model, please visit `advanced throughput options `__, -`109-latency-tricks <109-latency-tricks.ipynb>`__, -`111-detection-quantization <../111-detection-quantization>`__, and -`118-optimize-preprocessing <../118-optimize-preprocessing>`__. +`109-latency-tricks <109-latency-tricks-with-output.html>`__, +`111-detection-quantization <111-detection-quantization-with-output.html>`__, and +`118-optimize-preprocessing <118-optimize-preprocessing-with-output.html>`__. 
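The latency and throughput notebooks above revolve around the same pair of controls: a performance hint chosen at compile time and asynchronous execution to keep every stream busy. A rough sketch of both, not taken from either notebook (the model path and input shape are placeholders):

.. code:: python

   import numpy as np
   import openvino as ov

   core = ov.Core()
   model = core.read_model("model.xml")  # placeholder IR path

   # Latency hint: a single stream, tuned for one request at a time.
   latency_model = core.compile_model(model, "CPU", {"PERFORMANCE_HINT": "LATENCY"})

   # Throughput hint: multiple streams, best fed through an async queue.
   throughput_model = core.compile_model(model, "CPU", {"PERFORMANCE_HINT": "THROUGHPUT"})

   results = []
   queue = ov.AsyncInferQueue(throughput_model)
   queue.set_callback(lambda request, userdata: results.append(request.get_output_tensor(0).data.copy()))

   frames = [np.zeros((1, 3, 224, 224), dtype=np.float32)] * 8  # placeholder inputs
   for frame in frames:
       queue.start_async({0: frame})
   queue.wait_all()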
Performance comparison ---------------------------------------------------------------- diff --git a/docs/notebooks/110-ct-scan-live-inference-with-output.rst b/docs/notebooks/110-ct-scan-live-inference-with-output.rst index 344cc3ec6509cb..e965f2fdd48be7 100644 --- a/docs/notebooks/110-ct-scan-live-inference-with-output.rst +++ b/docs/notebooks/110-ct-scan-live-inference-with-output.rst @@ -18,7 +18,7 @@ This notebook needs a quantized OpenVINO IR model and images from the `KiTS-19 `__ dataset, converted to 2D images. (To learn how the model is quantized, see the `Convert and Quantize a UNet Model and Show Live -Inference <110-ct-segmentation-quantize-nncf.ipynb>`__ tutorial.) +Inference <110-ct-segmentation-quantize-nncf-with-output.html>`__ tutorial.) This notebook provides a pre-trained model, trained for 20 epochs with the full KiTS-19 frames dataset, which has an F1 score on the validation diff --git a/docs/notebooks/110-ct-segmentation-quantize-nncf-with-output.rst b/docs/notebooks/110-ct-segmentation-quantize-nncf-with-output.rst index f189bad8482e59..4ddd722e7a1831 100644 --- a/docs/notebooks/110-ct-segmentation-quantize-nncf-with-output.rst +++ b/docs/notebooks/110-ct-segmentation-quantize-nncf-with-output.rst @@ -24,13 +24,13 @@ This third tutorial in the series shows how to: All notebooks in this series: - `Data Preparation for 2D Segmentation of 3D Medical - Data `__ + Data `__ - `Train a 2D-UNet Medical Imaging Model with PyTorch - Lightning `__ + Lightning `__ - Convert and Quantize a Segmentation Model and Show Live Inference (this notebook) - `Live Inference and Benchmark CT-scan - data <110-ct-scan-live-inference.ipynb>`__ + data <110-ct-scan-live-inference-with-output.html>`__ Instructions ------------ @@ -39,7 +39,7 @@ This notebook needs a trained UNet model. We provide a pre-trained model, trained for 20 epochs with the full `Kits-19 `__ frames dataset, which has an F1 score on the validation set of 0.9. The training code is -available in `this notebook `__. +available in `this notebook `__. NNCF for PyTorch models requires a C++ compiler. On Windows, install `Microsoft Visual Studio @@ -198,7 +198,7 @@ Settings By default, this notebook will download one CT scan from the KITS19 dataset that will be used for quantization. To use the full dataset, set ``BASEDIR`` to the path of the dataset, as prepared according to the -`Data Preparation `__ notebook. +`Data Preparation `__ notebook. .. code:: ipython3 @@ -217,7 +217,7 @@ notebook is a `BasicUNet `__ model from `MONAI `__. We provide a pre-trained checkpoint. To see how this model performs, check out the `training -notebook `__. +notebook `__. .. code:: ipython3 @@ -289,7 +289,7 @@ Dataset The ``KitsDataset`` class in the next cell expects images and masks in the *``basedir``* directory, in a folder per patient. It is a simplified version of the Dataset class in the `training -notebook `__. +notebook `__. Images are loaded with MONAI’s `LoadImage `__, diff --git a/docs/notebooks/115-async-api-with-output.rst b/docs/notebooks/115-async-api-with-output.rst index 10c37b8199e54c..51f74ce38d7b3f 100644 --- a/docs/notebooks/115-async-api-with-output.rst +++ b/docs/notebooks/115-async-api-with-output.rst @@ -13,7 +13,6 @@ requests) rather than wait for the current inference to complete first. **Table of contents:** - - `Imports <#imports>`__ - `Prepare model and data processing <#prepare-model-and-data-processing>`__ @@ -28,33 +27,27 @@ requests) rather than wait for the current inference to complete first. 
processing <#how-to-improve-the-throughput-of-video-processing>`__ - `Sync Mode (default) <#sync-mode-default>`__ - - `Test performance in Sync - Mode <#test-performance-in-sync-mode>`__ + - `Test performance in Sync Mode <#test-performance-in-sync-mode>`__ - `Async Mode <#async-mode>`__ - `Test the performance in Async Mode <#test-the-performance-in-async-mode>`__ - `Compare the performance <#compare-the-performance>`__ -- `AsyncInferQueue `__ +- `AsyncInferQueue <#asyncinferqueue>`__ - `Setting Callback <#setting-callback>`__ - `Test the performance with AsyncInferQueue <#test-the-performance-with-asyncinferqueue>`__ -Imports -------------------------------------------------- - -.. code:: ipython3 +Imports +------- - %pip install -q "openvino>=2023.1.0" - %pip install -q opencv-python matplotlib -.. parsed-literal:: - - Note: you may need to restart the kernel to use updated packages. - Note: you may need to restart the kernel to use updated packages. +.. code:: ipython3 + # %pip install -q "openvino>=2023.1.0" + # %pip install -q opencv-python matplotlib .. code:: ipython3 @@ -74,11 +67,15 @@ Imports import notebook_utils as utils -Prepare model and data processing ---------------------------------------------------------------------------- +Prepare model and data processing +--------------------------------- + + + +Download test model +~~~~~~~~~~~~~~~~~~~ + -Download test model -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We use a pre-trained model from OpenVINO’s `Open Model Zoo `__ to start the @@ -116,8 +113,10 @@ each frame of the video. -Load the model -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Load the model +~~~~~~~~~~~~~~ + + .. code:: ipython3 @@ -136,8 +135,10 @@ Load the model N, C, H, W = input_layer_ir.shape shape = (H, W) -Create functions for data processing -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Create functions for data processing +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + .. code:: ipython3 @@ -178,21 +179,27 @@ Create functions for data processing cv2.putText(image, str(round(fps, 2)) + " fps", (5, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 3) return image -Get the test video -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Get the test video +~~~~~~~~~~~~~~~~~~ + + .. code:: ipython3 video_path = 'https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/video/CEO%20Pat%20Gelsinger%20on%20Leading%20Intel.mp4' -How to improve the throughput of video processing -------------------------------------------------------------------------------------------- +How to improve the throughput of video processing +------------------------------------------------- + + Below, we compare the performance of the synchronous and async-based approaches: -Sync Mode (default) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Sync Mode (default) +~~~~~~~~~~~~~~~~~~~ + + Let us see how video processing works with the default approach. Using the synchronous approach, the frame is captured with OpenCV and then @@ -281,8 +288,10 @@ immediately processed: player.stop() return sync_fps -Test performance in Sync Mode -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Test performance in Sync Mode +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + .. code:: ipython3 @@ -297,11 +306,13 @@ Test performance in Sync Mode .. 
parsed-literal:: Source ended - average throuput in sync mode: 38.68 fps + average throuput in sync mode: 38.27 fps + + +Async Mode +~~~~~~~~~~ -Async Mode -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Let us see how the OpenVINO Async API can improve the overall frame rate of an application. The key advantage of the Async approach is as @@ -349,6 +360,7 @@ pipeline (decoding vs inference) and not by the sum of the stages. curr_request = compiled_model.create_infer_request() next_request = compiled_model.create_infer_request() player = None + async_fps = 0 try: # Create a video player player = utils.VideoPlayer(source, flip=flip, fps=fps, skip_first_frames=skip_first_frames) @@ -375,28 +387,28 @@ pipeline (decoding vs inference) and not by the sum of the stages. # Start the NEXT inference request next_request.start_async() # Waiting for CURRENT inference result - if curr_request.wait_for(-1) == 1: - res = curr_request.get_output_tensor(0).data - stop_time = time.time() - total_time = stop_time - start_time - frame_number = frame_number + 1 - async_fps = frame_number / total_time - frame = postprocess(res, frame, async_fps) - # Display the results - if use_popup: - cv2.imshow(title, frame) - key = cv2.waitKey(1) - # escape = 27 - if key == 27: - break - else: - # Encode numpy array to jpg - _, encoded_img = cv2.imencode(".jpg", frame, params=[cv2.IMWRITE_JPEG_QUALITY, 90]) - # Create IPython image - i = display.Image(data=encoded_img) - # Display the image in this notebook - display.clear_output(wait=True) - display.display(i) + curr_request.wait() + res = curr_request.get_output_tensor(0).data + stop_time = time.time() + total_time = stop_time - start_time + frame_number = frame_number + 1 + async_fps = frame_number / total_time + frame = postprocess(res, frame, async_fps) + # Display the results + if use_popup: + cv2.imshow(title, frame) + key = cv2.waitKey(1) + # escape = 27 + if key == 27: + break + else: + # Encode numpy array to jpg + _, encoded_img = cv2.imencode(".jpg", frame, params=[cv2.IMWRITE_JPEG_QUALITY, 90]) + # Create IPython image + i = display.Image(data=encoded_img) + # Display the image in this notebook + display.clear_output(wait=True) + display.display(i) # Swap CURRENT and NEXT frames frame = next_frame # Swap CURRENT and NEXT infer requests @@ -415,8 +427,10 @@ pipeline (decoding vs inference) and not by the sum of the stages. player.stop() return async_fps -Test the performance in Async Mode -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Test the performance in Async Mode +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + .. code:: ipython3 @@ -431,11 +445,13 @@ Test the performance in Async Mode .. parsed-literal:: Source ended - average throuput in async mode: 73.57 fps + average throuput in async mode: 72.15 fps + + +Compare the performance +~~~~~~~~~~~~~~~~~~~~~~~ -Compare the performance -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code:: ipython3 @@ -462,8 +478,10 @@ Compare the performance .. image:: 115-async-api-with-output_files/115-async-api-with-output_21_0.png -``AsyncInferQueue`` -------------------------------------------------------------- +``AsyncInferQueue`` +------------------- + + Asynchronous mode pipelines can be supported with the `AsyncInferQueue `__ @@ -472,8 +490,10 @@ wrapper class. This class automatically spawns the pool of synchronization mechanisms to control the flow of the pipeline. It is a simpler way to manage the infer request queue in Asynchronous mode. 
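A minimal sketch of how this wrapper is typically wired up is shown below. It assumes the ``compiled_model`` from the cells above and a hypothetical iterable of preprocessed ``frames``; it is an illustration, not the notebook's exact code:

.. code:: ipython3

    import openvino as ov

    results = {}

    def completion_callback(request, frame_id):
        # Runs as soon as the job for `frame_id` finishes; copy the output
        # because the request's memory is reused for the next job.
        results[frame_id] = request.get_output_tensor(0).data.copy()

    # Four parallel jobs is an arbitrary choice for this sketch; when omitted,
    # the queue picks an optimal number for the device.
    infer_queue = ov.AsyncInferQueue(compiled_model, 4)
    infer_queue.set_callback(completion_callback)

    for frame_id, frame in enumerate(frames):
        # start_async() only blocks when every job in the pool is busy.
        infer_queue.start_async({0: frame}, userdata=frame_id)

    infer_queue.wait_all()  # block until all submitted frames are processed

The callback receives the finished request together with the ``userdata`` passed to ``start_async``, which is the mechanism used in the next section.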
-Setting Callback -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Setting Callback +~~~~~~~~~~~~~~~~ + + When ``callback`` is set, any job that ends inference calls upon the Python function. The ``callback`` function must have two arguments: one @@ -549,8 +569,10 @@ the possibility of passing runtime values. infer_queue.wait_all() player.stop() -Test the performance with ``AsyncInferQueue`` -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Test the performance with ``AsyncInferQueue`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + .. code:: ipython3 @@ -566,5 +588,5 @@ Test the performance with ``AsyncInferQueue`` .. parsed-literal:: - average throughput in async mode with async infer queue: 107.25 fps + average throughput in async mode with async infer queue: 105.36 fps diff --git a/docs/notebooks/115-async-api-with-output_files/115-async-api-with-output_21_0.png b/docs/notebooks/115-async-api-with-output_files/115-async-api-with-output_21_0.png index 9c667ad63b47f5..dcc102dd691908 100644 --- a/docs/notebooks/115-async-api-with-output_files/115-async-api-with-output_21_0.png +++ b/docs/notebooks/115-async-api-with-output_files/115-async-api-with-output_21_0.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:0ba110d0d82c00b211370ff95ad7be6995d288abc3954e53a122acce998ea965 -size 30445 +oid sha256:e4f523a824b6e628ef48fa654a0af2dedb2661f23bf18bc31d1e9cc37540fccd +size 30440 diff --git a/docs/notebooks/115-async-api-with-output_files/index.html b/docs/notebooks/115-async-api-with-output_files/index.html index 0a4b0d3326eb60..87539a38a88371 100644 --- a/docs/notebooks/115-async-api-with-output_files/index.html +++ b/docs/notebooks/115-async-api-with-output_files/index.html @@ -1,10 +1,10 @@ -Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/115-async-api-with-output_files/ +Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/115-async-api-with-output_files/ -

Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/115-async-api-with-output_files/


../
-115-async-api-with-output_15_0.png                 31-Oct-2023 00:35                4307
-115-async-api-with-output_19_0.png                 31-Oct-2023 00:35                4307
-115-async-api-with-output_21_0.png                 31-Oct-2023 00:35               30445
-115-async-api-with-output_27_0.png                 31-Oct-2023 00:35                4307
+

Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/115-async-api-with-output_files/


../
+115-async-api-with-output_15_0.png                 15-Nov-2023 00:43                4307
+115-async-api-with-output_19_0.png                 15-Nov-2023 00:43                4307
+115-async-api-with-output_21_0.png                 15-Nov-2023 00:43               30440
+115-async-api-with-output_27_0.png                 15-Nov-2023 00:43                4307
 

diff --git a/docs/notebooks/118-optimize-preprocessing-with-output.rst b/docs/notebooks/118-optimize-preprocessing-with-output.rst index c940e59c7a89d3..0723e03832d786 100644 --- a/docs/notebooks/118-optimize-preprocessing-with-output.rst +++ b/docs/notebooks/118-optimize-preprocessing-with-output.rst @@ -25,7 +25,6 @@ This tutorial include following steps: **Table of contents:** - - `Settings <#settings>`__ - `Imports <#imports>`__ @@ -42,8 +41,7 @@ This tutorial include following steps: API <#convert-model-to-openvino-ir-with-model-conversion-api>`__ - `Create PrePostProcessor Object <#create-prepostprocessor-object>`__ - - `Declare User’s Data - Format <#declare-users-data-format>`__ + - `Declare User’s Data Format <#declare-users-data-format>`__ - `Declaring Model Layout <#declaring-model-layout>`__ - `Preprocessing Steps <#preprocessing-steps>`__ - `Integrating Steps into a @@ -61,12 +59,13 @@ This tutorial include following steps: - `Compare results <#compare-results>`__ - - `Compare results on one - image <#compare-results-on-one-image>`__ + - `Compare results on one image <#compare-results-on-one-image>`__ - `Compare performance <#compare-performance>`__ -Settings --------------------------------------------------- +Settings +-------- + + .. code:: ipython3 @@ -79,8 +78,10 @@ Settings Note: you may need to restart the kernel to use updated packages. -Imports -------------------------------------------------- +Imports +------- + + .. code:: ipython3 @@ -104,14 +105,16 @@ Imports .. parsed-literal:: - 2023-10-30 22:59:29.607370: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. - 2023-10-30 22:59:29.641564: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. + 2023-11-14 23:00:32.637266: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. + 2023-11-14 23:00:32.671311: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. - 2023-10-30 22:59:30.151509: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT + 2023-11-14 23:00:33.179278: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT + + +Setup image and device +~~~~~~~~~~~~~~~~~~~~~~ -Setup image and device -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code:: ipython3 @@ -152,8 +155,10 @@ Setup image and device -Downloading the model -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Downloading the model +~~~~~~~~~~~~~~~~~~~~~ + + This tutorial uses the `InceptionResNetV2 `__. @@ -184,13 +189,26 @@ and save it to the disk. .. parsed-literal:: - 2023-10-30 22:59:32.526472: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1960] Cannot dlopen some GPU libraries. 
Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. - Skipping registering GPU devices... + 2023-11-14 23:00:37.345835: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:266] failed call to cuInit: CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE: forward compatibility was attempted on non supported HW + 2023-11-14 23:00:37.345869: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:168] retrieving CUDA diagnostic information for host: iotg-dev-workstation-07 + 2023-11-14 23:00:37.345874: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:175] hostname: iotg-dev-workstation-07 + 2023-11-14 23:00:37.346012: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:199] libcuda reported version is: 470.223.2 + 2023-11-14 23:00:37.346027: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:203] kernel reported version is: 470.182.3 + 2023-11-14 23:00:37.346030: E tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:312] kernel version 470.182.3 does not match DSO version 470.223.2 -- cannot find working devices in this configuration .. parsed-literal:: WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model. + + +.. parsed-literal:: + + WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 5 of 94). These functions will not be directly callable after loading. + + +.. parsed-literal:: + INFO:tensorflow:Assets written to: model/InceptionResNetV2/assets @@ -199,15 +217,19 @@ and save it to the disk. INFO:tensorflow:Assets written to: model/InceptionResNetV2/assets -Create core -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Create core +~~~~~~~~~~~ + + .. code:: ipython3 core = ov.Core() -Check the original parameters of image -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Check the original parameters of image +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + .. code:: ipython3 @@ -227,8 +249,10 @@ Check the original parameters of image .. image:: 118-optimize-preprocessing-with-output_files/118-optimize-preprocessing-with-output_14_1.png -Setup preprocessing steps with Preprocessing API and perform inference ----------------------------------------------------------------------------------------------------------------- +Setup preprocessing steps with Preprocessing API and perform inference +---------------------------------------------------------------------- + + Intuitively, preprocessing API consists of the following parts: @@ -253,8 +277,10 @@ Pre-processing support following operations (please, see more details - Color Conversion - Custom Operations -Convert model to OpenVINO IR with model conversion API -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Convert model to OpenVINO IR with model conversion API +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + The options for preprocessing are not required. @@ -272,8 +298,10 @@ The options for preprocessing are not required. 
input=[1,299,299,3]) ov.save_model(ppp_model, str(ir_path)) -Create ``PrePostProcessor`` Object -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Create ``PrePostProcessor`` Object +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + The `PrePostProcessor() `__ @@ -286,8 +314,10 @@ a model. ppp = PrePostProcessor(ppp_model) -Declare User’s Data Format -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Declare User’s Data Format +~~~~~~~~~~~~~~~~~~~~~~~~~~ + + To address particular input of a model/preprocessor, use the ``PrePostProcessor.input(input_name)`` method. If the model has only one @@ -325,12 +355,14 @@ for mean/scale normalization. .. parsed-literal:: - + + +Declaring Model Layout +~~~~~~~~~~~~~~~~~~~~~~ + -Declaring Model Layout -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Model input already has information about precision and shape. Preprocessing API is not intended to modify this. The only thing that @@ -354,12 +386,14 @@ may be specified is input data .. parsed-literal:: - + + + +Preprocessing Steps +~~~~~~~~~~~~~~~~~~~ -Preprocessing Steps -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Now, the sequence of preprocessing steps can be defined. For more information about preprocessing steps, see @@ -393,12 +427,14 @@ then such conversion will be added explicitly. .. parsed-literal:: - + -Integrating Steps into a Model -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Integrating Steps into a Model +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + Once the preprocessing steps have been finished, the model can be finally built. It is possible to display ``PrePostProcessor`` @@ -423,8 +459,10 @@ configuration for debugging purposes. -Load model and perform inference --------------------------------------------------------------------------- +Load model and perform inference +-------------------------------- + + .. code:: ipython3 @@ -441,19 +479,25 @@ Load model and perform inference ppp_input_tensor = prepare_image_api_preprocess(image_path) results = compiled_model_with_preprocess_api(ppp_input_tensor)[ppp_output_layer][0] -Fit image manually and perform inference ----------------------------------------------------------------------------------- +Fit image manually and perform inference +---------------------------------------- + + + +Load the model +~~~~~~~~~~~~~~ + -Load the model -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code:: ipython3 model = core.read_model(model=ir_path) compiled_model = core.compile_model(model=model, device_name=device.value) -Load image and fit it to model input -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Load image and fit it to model input +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + .. code:: ipython3 @@ -486,8 +530,10 @@ Load image and fit it to model input The data type of the image is float32 -Perform inference -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Perform inference +~~~~~~~~~~~~~~~~~ + + .. code:: ipython3 @@ -495,11 +541,15 @@ Perform inference result = compiled_model(input_tensor)[output_layer] -Compare results ---------------------------------------------------------- +Compare results +--------------- + + + +Compare results on one image +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + -Compare results on one image -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. 
code:: ipython3 @@ -560,8 +610,10 @@ Compare results on one image n02100877 Irish setter, red setter, 0.00115 -Compare performance -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Compare performance +~~~~~~~~~~~~~~~~~~~ + + .. code:: ipython3 @@ -594,6 +646,6 @@ Compare performance .. parsed-literal:: - IR model in OpenVINO Runtime/CPU with manual image preprocessing: 0.0155 seconds per image, FPS: 64.33 - IR model in OpenVINO Runtime/CPU with preprocessing API: 0.0187 seconds per image, FPS: 53.39 + IR model in OpenVINO Runtime/CPU with manual image preprocessing: 0.0152 seconds per image, FPS: 65.58 + IR model in OpenVINO Runtime/CPU with preprocessing API: 0.0187 seconds per image, FPS: 53.52 diff --git a/docs/notebooks/118-optimize-preprocessing-with-output_files/index.html b/docs/notebooks/118-optimize-preprocessing-with-output_files/index.html index b2e11605fc82e0..77439bd3f1165a 100644 --- a/docs/notebooks/118-optimize-preprocessing-with-output_files/index.html +++ b/docs/notebooks/118-optimize-preprocessing-with-output_files/index.html @@ -1,7 +1,7 @@ -Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/118-optimize-preprocessing-with-output_files/ +Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/118-optimize-preprocessing-with-output_files/ -

Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/118-optimize-preprocessing-with-output_files/


../
-118-optimize-preprocessing-with-output_14_1.png    31-Oct-2023 00:35              387941
+

Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/118-optimize-preprocessing-with-output_files/


../
+118-optimize-preprocessing-with-output_14_1.png    15-Nov-2023 00:43              387941
 

diff --git a/docs/notebooks/119-tflite-to-openvino-with-output.rst b/docs/notebooks/119-tflite-to-openvino-with-output.rst index 3fc5cd80494c7b..843d31494aae02 100644 --- a/docs/notebooks/119-tflite-to-openvino-with-output.rst +++ b/docs/notebooks/119-tflite-to-openvino-with-output.rst @@ -133,7 +133,7 @@ Load model using OpenVINO TensorFlow Lite Frontend TensorFlow Lite models are supported via ``FrontEnd`` API. You may skip conversion to IR and read models directly by OpenVINO runtime API. For more examples supported formats reading via Frontend API, please look -this `tutorial <../002-openvino-api>`__. +this `tutorial <002-openvino-api-with-output.html>`__. .. code:: ipython3 diff --git a/docs/notebooks/120-tensorflow-object-detection-to-openvino-with-output.rst b/docs/notebooks/120-tensorflow-object-detection-to-openvino-with-output.rst index b0496634980513..393738a11682dc 100644 --- a/docs/notebooks/120-tensorflow-object-detection-to-openvino-with-output.rst +++ b/docs/notebooks/120-tensorflow-object-detection-to-openvino-with-output.rst @@ -180,7 +180,7 @@ or saved on disk using the ``save_model`` function to reduce loading time when the model is run in the future. See the `Model Converter Developer -Guide `__ +Guide `__ for more information about Model Converter and TensorFlow `models support `__. diff --git a/docs/notebooks/124-hugging-face-hub-with-output.rst b/docs/notebooks/124-hugging-face-hub-with-output.rst index 07e51a7b72dfff..ca4d046f9648ec 100644 --- a/docs/notebooks/124-hugging-face-hub-with-output.rst +++ b/docs/notebooks/124-hugging-face-hub-with-output.rst @@ -13,29 +13,44 @@ models, namely |image0| -Throughout this notebook we will learn: 1. How to load a HF pipeline -using the ``transformers`` package and then convert it to OpenVINO. 2. -How to load the same pipeline using Optimum Intel package. - -Contents: - -- `Converting a Model from the HF Transformers Package <#converting-a-model-from-the-hf-transformers-package>`__ -- `Installing Requirements <#installing-requirements>`__ -- `Imports <#imports>`__ -- `Initializing a Model Using the HF Transformers Package <#initializing-a-model-using-the-hf-transformers-package>`__ -- `Original Model inference <#original-model-inference>`__ -- `Converting the Model to OpenVINO IR format <#converting-the-model-to-openvino-ir-format>`__ -- `Converted Model Inference <#converted-model-inference>`__ -- `Converting a Model Using the Optimum Intel Package <#converting-a-model-using-the-optimum-intel-package>`__ -- `Installing Requirements <#install-requirements-for-optimum>`__ -- `Import Optimum <#import-optimum>`__ -- `Initialize and Convert the Model Automatically <#initialize-and-convert-the-model-automatically>`__ +Throughout this notebook we will learn: + +1. How to load a HF pipeline using the ``transformers`` package and then convert it to OpenVINO. +2. How to load the same pipeline using Optimum Intel package. 
+ +**Table of contents:** + +- `Converting a Model from the HF Transformers + Package <#converting-a-model-from-the-hf-transformers-package>`__ + + - `Installing Requirements <#installing-requirements>`__ + - `Imports <#imports>`__ + - `Initializing a Model Using the HF Transformers + Package <#initializing-a-model-using-the-hf-transformers-package>`__ + - `Original Model inference <#original-model-inference>`__ + - `Converting the Model to OpenVINO IR + format <#converting-the-model-to-openvino-ir-format>`__ + - `Converted Model Inference <#converted-model-inference>`__ + +- `Converting a Model Using the Optimum Intel + Package <#converting-a-model-using-the-optimum-intel-package>`__ + + - `Install Requirements for + Optimum <#install-requirements-for-optimum>`__ + - `Import Optimum <#import-optimum>`__ + - `Initialize and Convert the Model Automatically using OVModel + class <#initialize-and-convert-the-model-automatically-using-ovmodel-class>`__ + - `Convert model using Optimum CLI + interface <#convert-model-using-optimum-cli-interface>`__ + - `The Optimum Model Inference <#the-optimum-model-inference>`__ .. |image0| image:: https://github.com/huggingface/optimum-intel/raw/main/readme_logo.png Converting a Model from the HF Transformers Package --------------------------------------------------- + + Hugging Face transformers package provides API for initializing a model and loading a set of pre-trained weights using the model text handle. Discovering a desired model name is straightforward with `HF website’s @@ -46,9 +61,11 @@ by popularity and novelty. Installing Requirements ~~~~~~~~~~~~~~~~~~~~~~~ + + .. code:: ipython3 - %pip install -q --extra-index-url https://download.pytorch.org/whl/cpu transformers[torch] + %pip install -q --extra-index-url https://download.pytorch.org/whl/cpu "transformers[torch]>=4.33.0" %pip install -q ipywidgets %pip install -q "openvino>=2023.1.0" @@ -63,6 +80,8 @@ Installing Requirements Imports ~~~~~~~ + + .. code:: ipython3 from pathlib import Path @@ -76,8 +95,9 @@ Imports Initializing a Model Using the HF Transformers Package ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -We will use `roberta text sentiment -classification `__ + + +We will use `roberta text sentiment classification `__ model in our example, it is a transformer-based encoder model pretrained in a special way, please refer to the model card to learn more. @@ -100,7 +120,7 @@ tutorials `__ + We use the OpenVINO `Model conversion API `__ to convert the model (this one is implemented in PyTorch) to OpenVINO Intermediate Representation (IR). @@ -159,6 +180,8 @@ Note how we reuse our real ``encoded_input``, passing it to the Converted Model Inference ~~~~~~~~~~~~~~~~~~~~~~~~~ + + First, we pick a device to do the model inference .. code:: ipython3 @@ -212,13 +235,11 @@ original model. This is a rather simple example as the pipeline includes just one encoder model. Contemporary state of the art pipelines often consist of -several model, feel free to explore other OpenVINO tutorials: 1. `Stable -Diffusion -v2 `__ -2. `Zero-shot Image Classification with OpenAI -CLIP `__ -3. `Controllable Music Generation with -MusicGen `__ +several model, feel free to explore other OpenVINO tutorials: + +1. `Stable Diffusion v2 `__ +2. `Zero-shot Image Classification with OpenAI CLIP `__ +3. `Controllable Music Generation with MusicGen `__ The workflow for the ``diffusers`` package is exactly the same. The first example in the list above relies on the ``diffusers``. 
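Putting the manual route together, a condensed sketch of the flow described in this section could look like the following. The checkpoint matches the one used in this notebook, while the input sentence and the output file name are illustrative assumptions:

.. code:: ipython3

    import numpy as np
    import openvino as ov
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    checkpoint = "cardiffnlp/twitter-roberta-base-sentiment-latest"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    pt_model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

    # Reuse a real tokenized input as the example input for conversion.
    encoded_input = tokenizer("OpenVINO makes inference fast!", return_tensors="pt")
    ov_model = ov.convert_model(pt_model, example_input=dict(encoded_input))
    ov.save_model(ov_model, "roberta_sentiment.xml")

    # Compile and run; the converted model returns raw logits,
    # so softmax is applied manually.
    compiled = ov.Core().compile_model(ov_model, "AUTO")
    inputs = {name: tensor.numpy() for name, tensor in encoded_input.items()}
    logits = compiled(inputs)[compiled.output(0)][0]
    scores = np.exp(logits - logits.max())
    print(scores / scores.sum())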
@@ -226,6 +247,8 @@ first example in the list above relies on the ``diffusers``. Converting a Model Using the Optimum Intel Package -------------------------------------------------- + + 🤗 Optimum Intel is the interface between the 🤗 Transformers and Diffusers libraries and the different tools and libraries provided by Intel to accelerate end-to-end pipelines on Intel architectures. @@ -238,37 +261,11 @@ OpenVINO Runtime. Install Requirements for Optimum ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -.. code:: ipython3 - - %pip install -q "optimum==1.13.0" - %pip install -q "optimum-intel"@git+https://github.com/huggingface/optimum-intel.git - %pip install -q onnx -.. parsed-literal:: - - huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... - To disable this warning, you can either: - - Avoid using `tokenizers` before the fork if possible - - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) - - -.. parsed-literal:: - - Note: you may need to restart the kernel to use updated packages. - - -.. parsed-literal:: - - huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... - To disable this warning, you can either: - - Avoid using `tokenizers` before the fork if possible - - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) - - -.. parsed-literal:: +.. code:: ipython3 - Note: you may need to restart the kernel to use updated packages. + %pip install -q "optimum-intel"@git+https://github.com/huggingface/optimum-intel.git onnx .. parsed-literal:: @@ -287,15 +284,15 @@ Install Requirements for Optimum Import Optimum ~~~~~~~~~~~~~~ + + Documentation for Optimum Intel states: >You can now easily perform inference with OpenVINO Runtime on a variety of Intel processors (see the full list of supported devices). For that, just replace the ``AutoModelForXxx`` class with the corresponding ``OVModelForXxx`` class. -You can find `Optimum Intel -documentation `__ -on the Hugging Face website. +You can find more information in `Optimum Intel documentation `__. .. code:: ipython3 @@ -318,36 +315,51 @@ on the Hugging Face website. - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' - 2023-10-30 23:06:03.589130: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. - 2023-10-30 23:06:03.624230: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. + 2023-11-14 23:07:03.743874: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. + 2023-11-14 23:07:03.778576: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. 
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. - 2023-10-30 23:06:04.183799: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT - /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-534/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations - warnings.warn( + 2023-11-14 23:07:04.334607: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT -Initialize and Convert the Model Automatically -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Initialize and Convert the Model Automatically using OVModel class +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -To load a Transformers model and convert it to the OpenVINO format -on-the-fly, you can set ``export=True`` when loading your model. + + +To load a Transformers model and convert it to the OpenVINO format on +the fly, you can set ``export=True`` when loading your model. The model +can be saved in OpenVINO format using ``save_pretrained`` method and +specifying a directory for storing the model as an argument. For the +next usage, you can avoid the conversion step and load the saved early +model from disk using ``from_pretrained`` method without export +specification. We also specified ``device`` parameter for compiling the +model on the specific device, if not provided, the default device will +be used. The device can be changed later in runtime using +``model.to(device)``, please note that it may require some time for +model compilation on a newly selected device. In some cases, it can be +useful to separate model initialization and compilation, for example, if +you want to reshape the model using ``reshape`` method, you can postpone +compilation, providing the parameter ``compile=False`` into +``from_pretrained`` method, compilation can be performed manually using +``compile`` method or will be performed automatically during first +inference run. .. code:: ipython3 model = OVModelForSequenceClassification.from_pretrained(MODEL, export=True, device=device.value) # The save_pretrained() method saves the model weights to avoid conversion on the next load. - model.save_pretrained('./models') + model.save_pretrained('./models/optimum_model') .. parsed-literal:: Framework not specified. Using pt to export to ONNX. - Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias'] + Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight'] - This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). - This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). 
Using the export variant default. Available variants are: - - default: The default ONNX variant. + - default: The default ONNX variant. Using framework PyTorch: 2.1.0+cpu Overriding 1 configuration item(s) - use_cache -> False @@ -361,10 +373,158 @@ on-the-fly, you can set ``export=True`` when loading your model. .. parsed-literal:: Compiling the model to AUTO ... - Set CACHE_DIR to /tmp/tmpx5aqydhf/model_cache -Moreover, some models in the Hugging Face Models Hub are already +Convert model using Optimum CLI interface +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + + +Alternatively, you can use the Optimum CLI interface for converting +models (supported starting optimum-intel 1.12 version). General command +format: + +.. code:: bash + + optimum-cli export openvino --model --task + +where task is task to export the model for, if not specified, the task +will be auto-inferred based on the model. Available tasks depend on the +model, but are among: [‘default’, ‘fill-mask’, ‘text-generation’, +‘text2text-generation’, ‘text-classification’, ‘token-classification’, +‘multiple-choice’, ‘object-detection’, ‘question-answering’, +‘image-classification’, ‘image-segmentation’, ‘masked-im’, +‘semantic-segmentation’, ‘automatic-speech-recognition’, +‘audio-classification’, ‘audio-frame-classification’, +‘automatic-speech-recognition’, ‘audio-xvector’, ‘image-to-text’, +‘stable-diffusion’, ‘zero-shot-object-detection’]. For decoder models, +use ``xxx-with-past`` to export the model using past key values in the +decoder. + +You can find a mapping between tasks and model classes in Optimum +TaskManager +`documentation `__. + +Additionally, you can specify weights compression ``--fp16`` for the +compression model to FP16 and ``--int8`` for the compression model to +INT8. Please note, that for INT8, it is necessary to install nncf. + +Full list of supported arguments available via ``--help`` + +.. code:: ipython3 + + !optimum-cli export openvino --help + + +.. parsed-literal:: + + huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... + To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) + + +.. parsed-literal:: + + 2023-11-14 23:07:16.627580: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT + usage: optimum-cli export openvino [-h] -m MODEL [--task TASK] + [--cache_dir CACHE_DIR] + [--framework {pt,tf}] [--trust-remote-code] + [--pad-token-id PAD_TOKEN_ID] [--fp16] + [--int8] + output + + optional arguments: + -h, --help show this help message and exit + + Required arguments: + -m MODEL, --model MODEL + Model ID on huggingface.co or path on disk to load + model from. + output Path indicating the directory where to store the + generated OV model. + + Optional arguments: + --task TASK The task to export the model for. If not specified, + the task will be auto-inferred based on the model. 
+ Available tasks depend on the model, but are among: + ['stable-diffusion-xl', 'multiple-choice', 'zero-shot- + image-classification', 'audio-classification', 'image- + to-image', 'text2text-generation', 'text- + classification', 'text-to-audio', 'text-generation', + 'depth-estimation', 'question-answering', 'fill-mask', + 'zero-shot-object-detection', 'conversational', + 'audio-frame-classification', 'masked-im', 'image- + classification', 'mask-generation', 'stable- + diffusion', 'token-classification', 'image- + segmentation', 'audio-xvector', 'object-detection', + 'feature-extraction', 'semantic-segmentation', 'image- + to-text', 'automatic-speech-recognition']. For decoder + models, use `xxx-with-past` to export the model using + past key values in the decoder. + --cache_dir CACHE_DIR + Path indicating where to store cache. + --framework {pt,tf} The framework to use for the export. If not provided, + will attempt to use the local checkpoint's original + framework or what is available in the environment. + --trust-remote-code Allows to use custom code for the modeling hosted in + the model repository. This option should only be set + for repositories you trust and in which you have read + the code, as it will execute on your local machine + arbitrary code present in the model repository. + --pad-token-id PAD_TOKEN_ID + This is needed by some models, for some tasks. If not + provided, will attempt to use the tokenizer to guess + it. + --fp16 Compress weights to fp16 + --int8 Compress weights to int8 + + +The command line export for model from example above with FP16 weights +compression: + +.. code:: ipython3 + + !optimum-cli export openvino --model $MODEL --task text-classification --fp16 models/optimum_model/fp16 + + +.. parsed-literal:: + + huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... + To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) + + +.. parsed-literal:: + + 2023-11-14 23:07:20.866293: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT + Framework not specified. Using pt to export to ONNX. + Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias'] + - This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). + - This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). + Using the export variant default. Available variants are: + - default: The default ONNX variant. + Using framework PyTorch: 2.1.0+cpu + Overriding 1 configuration item(s) + - use_cache -> False + + +After export, model will be available in the specified directory and can +be loaded using the same OVModelForXXX class. + +.. code:: ipython3 + + model = OVModelForSequenceClassification.from_pretrained("models/optimum_model/fp16", device=device.value) + + +.. parsed-literal:: + + Compiling the model to AUTO ... 
+ Setting OpenVINO CACHE_DIR to models/optimum_model/fp16/model_cache + + +There are some models in the Hugging Face Models Hub, that are already converted and ready to run! You can filter those models out by library name, just type OpenVINO, or follow `this link `__. @@ -372,21 +532,23 @@ link `__. The Optimum Model Inference ~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + Model inference is exactly the same as for the original model! .. code:: ipython3 - output = model.forward(**encoded_input) + output = model(**encoded_input) scores = output[0][0] - scores = torch.softmax(scores, dim=0).detach().numpy() + scores = torch.softmax(scores, dim=0).numpy(force=True) print_prediction(scores) .. parsed-literal:: - 1) positive 0.9485 - 2) neutral 0.0484 + 1) positive 0.9483 + 2) neutral 0.0485 3) negative 0.0031 @@ -403,3 +565,7 @@ XL `__ 6. `Create LLM-powered Chatbot using OpenVINO `__ +7. `Document Visual Question Answering Using Pix2Struct and +OpenVINO `__ +8. `Automatic speech recognition using Distil-Whisper and +OpenVINO `__ diff --git a/docs/notebooks/126-tensorflow-hub-with-output.rst b/docs/notebooks/126-tensorflow-hub-with-output.rst index 2a6f3100d82de2..2a66974a13cb5c 100644 --- a/docs/notebooks/126-tensorflow-hub-with-output.rst +++ b/docs/notebooks/126-tensorflow-hub-with-output.rst @@ -1,8 +1,6 @@ Convert of TensorFlow Hub models to OpenVINO Intermediate Representation (IR) ============================================================================= -|Colab| |Binder| - This tutorial demonstrates step-by-step instructions on how to convert models loaded from TensorFlow Hub using OpenVINO Runtime. @@ -24,7 +22,6 @@ or selectively execute specific sections, as each section operates independently. **Table of contents:** ---- - `Image classification <#image-classification>`__ - `Install required packages <#install-required-packages>`__ @@ -41,13 +38,11 @@ independently. - `Select inference device <#select-inference-device>`__ - `Inference <#inference>`__ -.. |Colab| image:: https://colab.research.google.com/assets/colab-badge.svg - :target: https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/notebooks/126-tensorflow-hub/126-tensorflow-hub.ipynb -.. |Binder| image:: https://mybinder.org/badge_logo.svg - :target: https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=notebooks%2F126-tensorflow-hub%2F126-tensorflow-hub.ipynb -Image classification --------------------------------------------------------------- +Image classification +-------------------- + + We will use the `MobileNet_v2 `__ image classification model from `TensorFlow Hub `__. @@ -67,8 +62,10 @@ efficient deep learning inference on smartphones and edge devices. More information about model can be found on `Model page on TensorFlow Hub `__ -Install required packages -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Install required packages +~~~~~~~~~~~~~~~~~~~~~~~~~ + + .. code:: ipython3 @@ -79,16 +76,18 @@ Install required packages .. parsed-literal:: ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. - onnxconverter-common 1.14.0 requires protobuf==3.20.2, but you have protobuf 4.24.4 which is incompatible. - tf2onnx 1.15.1 requires protobuf~=3.20.2, but you have protobuf 4.24.4 which is incompatible. + onnxconverter-common 1.14.0 requires protobuf==3.20.2, but you have protobuf 4.25.0 which is incompatible. 
+ tf2onnx 1.15.1 requires protobuf~=3.20.2, but you have protobuf 4.25.0 which is incompatible. Note: you may need to restart the kernel to use updated packages. ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. openvino-dev 2023.1.0 requires openvino==2023.1.0, but you have openvino 2023.2.0.dev20230922 which is incompatible. Note: you may need to restart the kernel to use updated packages. -Import libraries -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Import libraries +~~~~~~~~~~~~~~~~ + + .. code:: ipython3 @@ -113,10 +112,11 @@ Import libraries IMAGE_URL, IMAGE_PATH = "https://storage.googleapis.com/download.tensorflow.org/example_images/grace_hopper.jpg", "data/grace_hopper.jpg" MODEL_URL, MODEL_PATH = "https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/classification/5", "models/mobilenet_v2_100_224.xml" -Download the classifier -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Download the classifier +~~~~~~~~~~~~~~~~~~~~~~~ -Select a MobileNetV2 pre-trained model `from TensorFlow + Select a MobileNetV2 +pre-trained model `from TensorFlow Hub `__ and wrap it as a Keras layer with ``hub.KerasLayer``. @@ -124,11 +124,19 @@ and wrap it as a Keras layer with ``hub.KerasLayer``. model = hub.KerasLayer(MODEL_URL, input_shape=IMAGE_SHAPE + (3,)) -Download a single image to try the model on -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -The input ``images`` are expected to have color values in the range -[0,1], following the `common image input +.. parsed-literal:: + + 2023-11-14 23:08:14.660883: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:266] failed call to cuInit: CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE: forward compatibility was attempted on non supported HW + 2023-11-14 23:08:14.661058: E tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:312] kernel version 470.182.3 does not match DSO version 470.223.2 -- cannot find working devices in this configuration + + +Download a single image to try the model on +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + The input ``images`` are +expected to have color values in the range [0,1], following the `common +image input conventions `__. For this model, the size of the input images is fixed to ``height`` x ``width`` = 224 x 224 pixels. @@ -163,8 +171,10 @@ Normalize the image to [0,1] range. -Convert model to OpenVINO IR -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Convert model to OpenVINO IR +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + We will convert the loaded model to OpenVINO IR using ``ov.convert_model`` function. We pass the model object to it, no @@ -177,8 +187,10 @@ additional arguments required. 
Then, we save the model to disk using converted_model = ov.convert_model(model) ov.save_model(converted_model, MODEL_PATH) -Select inference device -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Select inference device +~~~~~~~~~~~~~~~~~~~~~~~ + + select device from dropdown list for running inference using OpenVINO @@ -210,8 +222,10 @@ select device from dropdown list for running inference using OpenVINO compiled_model = core.compile_model(MODEL_PATH, device_name=device.value) -Inference -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Inference +~~~~~~~~~ + + Add a batch dimension (with ``np.newaxis``) and pass the image to the model: @@ -266,8 +280,10 @@ dataset labels to decode the predictions: .. image:: 126-tensorflow-hub-with-output_files/126-tensorflow-hub-with-output_26_0.png -Image style transfer --------------------------------------------------------------- +Image style transfer +-------------------- + + We will use `arbitrary image stylization model `__ from `TensorFlow @@ -295,8 +311,10 @@ very efficient. More model information can be found on `Model page on TensorFlow Hub `__. -Install required packages -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Install required packages +~~~~~~~~~~~~~~~~~~~~~~~~~ + + .. code:: ipython3 @@ -336,8 +354,10 @@ Install required packages MODEL_URL = "https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2" MODEL_PATH = "./models/arbitrary-image-stylization-v1-256.xml" -Load the model -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Load the model +~~~~~~~~~~~~~~ + + We load the model from TensorFlow Hub using ``hub.KerasLayer``. Since the model has multiple inputs (content image and style image), we need @@ -354,8 +374,10 @@ function. outputs = model(inputs) model = tf.keras.Model(inputs=inputs, outputs=outputs) -Convert the model to OpenVINO IR -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Convert the model to OpenVINO IR +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + We convert the loaded model to OpenVINO IR using ``ov.convert_model`` function. We pass our model to the function, no additional arguments @@ -369,8 +391,10 @@ needed. After converting, we save the model to disk using converted_model = ov.convert_model(model) ov.save_model(converted_model, MODEL_PATH) -Select inference device -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Select inference device +~~~~~~~~~~~~~~~~~~~~~~~ + + select device from dropdown list for running inference using OpenVINO @@ -402,8 +426,10 @@ select device from dropdown list for running inference using OpenVINO compiled_model = core.compile_model(MODEL_PATH, device_name=device.value) -Inference -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Inference +~~~~~~~~~ + + .. code:: ipython3 diff --git a/docs/notebooks/126-tensorflow-hub-with-output_files/index.html b/docs/notebooks/126-tensorflow-hub-with-output_files/index.html index 5bc07f4507b316..072161cf5c7b17 100644 --- a/docs/notebooks/126-tensorflow-hub-with-output_files/index.html +++ b/docs/notebooks/126-tensorflow-hub-with-output_files/index.html @@ -1,10 +1,10 @@ -Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/126-tensorflow-hub-with-output_files/ +Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/126-tensorflow-hub-with-output_files/ -

Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/126-tensorflow-hub-with-output_files/


../
-126-tensorflow-hub-with-output_11_0.jpg            31-Oct-2023 00:35               10479
-126-tensorflow-hub-with-output_11_0.png            31-Oct-2023 00:35               92843
-126-tensorflow-hub-with-output_26_0.png            31-Oct-2023 00:35              203738
-126-tensorflow-hub-with-output_45_0.png            31-Oct-2023 00:35              538743
+

Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/126-tensorflow-hub-with-output_files/


../
+126-tensorflow-hub-with-output_11_0.jpg            15-Nov-2023 00:43               10479
+126-tensorflow-hub-with-output_11_0.png            15-Nov-2023 00:43               92843
+126-tensorflow-hub-with-output_26_0.png            15-Nov-2023 00:43              203738
+126-tensorflow-hub-with-output_45_0.png            15-Nov-2023 00:43              538743
 

diff --git a/docs/notebooks/212-pyannote-speaker-diarization-with-output.rst b/docs/notebooks/212-pyannote-speaker-diarization-with-output.rst index c57a004003e4ad..bf2462f921726e 100644 --- a/docs/notebooks/212-pyannote-speaker-diarization-with-output.rst +++ b/docs/notebooks/212-pyannote-speaker-diarization-with-output.rst @@ -39,9 +39,10 @@ card `__, **Table of contents:** - - `Prerequisites <#prerequisites>`__ - `Prepare pipeline <#prepare-pipeline>`__ +- `Login to huggingfacehub to get access to pre-trained + model <#login-to-huggingfacehub-to-get-access-to-pre-trained-model>`__ - `Load test audio file <#load-test-audio-file>`__ - `Run inference pipeline <#run-inference-pipeline>`__ - `Convert model to OpenVINO Intermediate Representation @@ -52,8 +53,10 @@ card `__, - `Run speaker diarization with OpenVINO <#run-speaker-diarization-with-openvino>`__ -Prerequisites -------------------------------------------------------- +Prerequisites +------------- + + .. code:: ipython3 @@ -67,16 +70,18 @@ Prerequisites onnx 1.15.0 requires protobuf>=3.20.2, but you have protobuf 3.20.1 which is incompatible. onnxconverter-common 1.14.0 requires protobuf==3.20.2, but you have protobuf 3.20.1 which is incompatible. paddlepaddle 2.5.2 requires protobuf>=3.20.2; platform_system != "Windows", but you have protobuf 3.20.1 which is incompatible. - ppgan 2.1.0 requires imageio==2.9.0, but you have imageio 2.31.6 which is incompatible. + ppgan 2.1.0 requires imageio==2.9.0, but you have imageio 2.32.0 which is incompatible. ppgan 2.1.0 requires librosa==0.8.1, but you have librosa 0.9.2 which is incompatible. ppgan 2.1.0 requires opencv-python<=4.6.0.66, but you have opencv-python 4.8.1.78 which is incompatible. - tensorflow 2.13.1 requires protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3, but you have protobuf 3.20.1 which is incompatible. + tensorflow 2.12.0 requires protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3, but you have protobuf 3.20.1 which is incompatible. tf2onnx 1.15.1 requires protobuf~=3.20.2, but you have protobuf 3.20.1 which is incompatible. Note: you may need to restart the kernel to use updated packages. -Prepare pipeline ----------------------------------------------------------- +Prepare pipeline +---------------- + + Traditional Speaker Diarization systems can be generalized into a five-step process: @@ -133,9 +138,10 @@ hub `__. the following code: .. code:: python - + :force: ## login to huggingfacehub to get access to pre-trained model + from huggingface_hub import notebook_login, whoami try: @@ -150,8 +156,10 @@ hub `__. pipeline = Pipeline.from_pretrained("philschmid/pyannote-speaker-diarization-endpoint") -Load test audio file --------------------------------------------------------------- +Load test audio file +-------------------- + + .. code:: ipython3 @@ -206,8 +214,10 @@ Load test audio file .. 
image:: 212-pyannote-speaker-diarization-with-output_files/212-pyannote-speaker-diarization-with-output_9_1.png -Run inference pipeline ----------------------------------------------------------------- +Run inference pipeline +---------------------- + + For running inference, we should provide a path to input audio to the pipeline @@ -267,8 +277,10 @@ We can also print each time frame and corresponding speaker: start=27.8s stop=29.5s speaker_SPEAKER_02 -Convert model to OpenVINO Intermediate Representation format ------------------------------------------------------------------------------------------------------- +Convert model to OpenVINO Intermediate Representation format +------------------------------------------------------------ + + For best results with OpenVINO, it is recommended to convert the model to OpenVINO IR format. OpenVINO supports PyTorch via ONNX conversion. We @@ -306,8 +318,10 @@ with ``openvino.runtime.serialize``. Model successfully converted to IR and saved to pyannote-segmentation.xml -Select inference device ------------------------------------------------------------------ +Select inference device +----------------------- + + select device from dropdown list for running inference using OpenVINO @@ -333,8 +347,10 @@ select device from dropdown list for running inference using OpenVINO -Replace segmentation model with OpenVINO ----------------------------------------------------------------------------------- +Replace segmentation model with OpenVINO +---------------------------------------- + + .. code:: ipython3 @@ -363,8 +379,10 @@ Replace segmentation model with OpenVINO pipeline._segmentation.infer = infer_segm -Run speaker diarization with OpenVINO -------------------------------------------------------------------------------- +Run speaker diarization with OpenVINO +------------------------------------- + + .. code:: ipython3 @@ -379,7 +397,7 @@ Run speaker diarization with OpenVINO .. parsed-literal:: - Diarization pipeline took 14.49 s + Diarization pipeline took 14.54 s .. code:: ipython3 diff --git a/docs/notebooks/212-pyannote-speaker-diarization-with-output_files/index.html b/docs/notebooks/212-pyannote-speaker-diarization-with-output_files/index.html index be3ff93140e94b..d4a7c596a93151 100644 --- a/docs/notebooks/212-pyannote-speaker-diarization-with-output_files/index.html +++ b/docs/notebooks/212-pyannote-speaker-diarization-with-output_files/index.html @@ -1,9 +1,9 @@ -Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/212-pyannote-speaker-diarization-with-output_files/ +Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/212-pyannote-speaker-diarization-with-output_files/ -

Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/212-pyannote-speaker-diarization-with-output_files/


../
-212-pyannote-speaker-diarization-with-output_14..> 31-Oct-2023 00:35                7969
-212-pyannote-speaker-diarization-with-output_27..> 31-Oct-2023 00:35                7969
-212-pyannote-speaker-diarization-with-output_9_..> 31-Oct-2023 00:35               43095
+

Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/212-pyannote-speaker-diarization-with-output_files/


../
+212-pyannote-speaker-diarization-with-output_14..> 15-Nov-2023 00:43                7969
+212-pyannote-speaker-diarization-with-output_27..> 15-Nov-2023 00:43                7969
+212-pyannote-speaker-diarization-with-output_9_..> 15-Nov-2023 00:43               43095
 

diff --git a/docs/notebooks/214-grammar-correction-with-output.rst b/docs/notebooks/214-grammar-correction-with-output.rst index fb3a4b2397a951..4e251f92018ec0 100644 --- a/docs/notebooks/214-grammar-correction-with-output.rst +++ b/docs/notebooks/214-grammar-correction-with-output.rst @@ -47,28 +47,38 @@ It consists of the following steps: **Table of contents:** - -- `How does it work? <#how-does-it-work>`__ -- `Prerequisites <#prerequisites>`__ -- `Download and Convert - Models <#download-and-convert-models>`__ - - - `Select inference device <#select-inference-device>`__ - - `Grammar Checker <#grammar-checker>`__ - - `Grammar Corrector <#grammar-corrector>`__ - -- `Prepare Demo Pipeline <#prepare-demo-pipeline>`__ -- `Quantization <#quantization>`__ - - - `Run Quantization <#run-quantization>`__ - - `Compare model size, performance and - accuracy <#compare-model-size-performance-and-accuracy>`__ - -- `Interactive demo <#interactive-demo>`__ +- `How does it work? + <#how-does-it-work>`__ +- `Prerequisites + <#prerequisites>`__ +- `Download and Convert Models + <#download-and-convert-models>`__ + + - `Select inference device + <#select-inference-device>`__ + - `Grammar Checker + <#grammar-checker>`__ + - `Grammar Corrector + <#grammar-corrector>`__ + +- `Prepare Demo Pipeline + <#prepare-demo-pipeline>`__ +- `Quantization + <#quantization>`__ + + - `Run Quantization + <#run-quantization>`__ + - `Compare model size, performance and accuracy + <#compare-model-size-performance-and-accuracy>`__ + +- `Interactive demo + <#interactive-demo>`__ How does it work? ------------------------------------------------------------ + + A Grammatical Error Correction task can be thought of as a sequence-to-sequence task where a model is trained to take a grammatically incorrect sentence as input and return a grammatically @@ -119,6 +129,8 @@ Now that we know more about FLAN-T5 and RoBERTa, let us get started. 🚀 Prerequisites -------------------------------------------------------- + + First, we need to install the `Hugging Face Optimum `__ library accelerated by OpenVINO integration. The Hugging Face Optimum API is a @@ -129,7 +141,7 @@ documentation `__. .. code:: ipython3 - %pip install -q "git+https://github.com/huggingface/optimum-intel.git" "openvino>=2023.1.0" onnx onnxruntime gradio + %pip install -q "git+https://github.com/huggingface/optimum-intel.git" "openvino>=2023.1.0" onnx gradio "transformers>=4.33.0" %pip install -q "git+https://github.com/openvinotoolkit/nncf.git@9c671f0ae0a118e4bc2de8b09e66425931c0bfa4" datasets jiwer @@ -142,6 +154,8 @@ documentation `__. Download and Convert Models ---------------------------------------------------------------------- + + Optimum Intel can be used to load optimized models from the `Hugging Face Hub `__ and create pipelines to run an inference with OpenVINO Runtime using Hugging @@ -198,6 +212,8 @@ Tokenizer class and pipelines API are compatible with Optimum models. Select inference device ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + select device from dropdown list for running inference using OpenVINO .. code:: ipython3 @@ -228,6 +244,8 @@ select device from dropdown list for running inference using OpenVINO Grammar Checker ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + .. code:: ipython3 grammar_checker_model_id = "textattack/roberta-base-CoLA" @@ -291,6 +309,8 @@ Great! Looks like the model can detect errors in the sample. 
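As a quick illustration of the pipeline compatibility mentioned above, a sketch along these lines shows the checker flagging an ungrammatical sentence. The checkpoint is the one used in this notebook; the example sentence and the label interpretation are assumptions to verify against the model card, not the notebook's exact code:

.. code:: ipython3

    from optimum.intel.openvino import OVModelForSequenceClassification
    from transformers import AutoTokenizer, pipeline

    grammar_checker_model_id = "textattack/roberta-base-CoLA"
    grammar_checker_tokenizer = AutoTokenizer.from_pretrained(grammar_checker_model_id)
    grammar_checker_model = OVModelForSequenceClassification.from_pretrained(
        grammar_checker_model_id, export=True
    )

    # The OpenVINO model plugs directly into the regular Transformers pipeline API.
    grammar_checker_pipe = pipeline(
        "text-classification",
        model=grammar_checker_model,
        tokenizer=grammar_checker_tokenizer,
    )

    result = grammar_checker_pipe("They is moving here.")[0]
    # CoLA-style checkpoints use one label for acceptable sentences and one for
    # ungrammatical ones; check the model card for the exact mapping.
    print(result["label"], round(result["score"], 3))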
Grammar Corrector ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + The steps for loading the Grammar Corrector model are very similar, except for the model class that is used. Because FLAN-T5 is a sequence-to-sequence text generation model, we should use the @@ -361,6 +381,8 @@ Nice! The result looks pretty good! Prepare Demo Pipeline ---------------------------------------------------------------- + + Now let us put everything together and create the pipeline for grammar correction. The pipeline accepts input text, verifies its correctness, and generates the correct version if required. It will consist of @@ -498,6 +520,8 @@ Let us see it in action. Quantization ------------------------------------------------------- + + `NNCF `__ enables post-training quantization by adding quantization layers into model graph and then using a subset of the training dataset to initialize the @@ -543,6 +567,8 @@ improve model inference speed. Run Quantization ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + Below we retrieve the quantized model. Please see ``utils.py`` for source code. Quantization is relatively time-consuming and will take some time to complete. @@ -577,16 +603,63 @@ some time to complete. Output() + +.. raw:: html + +

+ + + .. parsed-literal:: Compiling the encoder to AUTO ... @@ -622,6 +695,8 @@ model and original FP32 model should be almost the same. Compare model size, performance and accuracy ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + First, we compare file size of ``FP32`` and ``INT8`` models. .. code:: ipython3 @@ -687,8 +762,8 @@ where WER is Word Error Rate metric. Accuracy drop :0.59%. Model footprint reduction: 3.989 -Interactive demo ------------------------------------------------------------ +Interactive demo \ +----------------------------------------------------------------------------------------------------- .. code:: ipython3 @@ -750,5 +825,5 @@ Interactive demo .. .. raw:: html -..
+..
diff --git a/docs/notebooks/215-image-inpainting-with-output.rst b/docs/notebooks/215-image-inpainting-with-output.rst index c9f63034534f66..d5d59bd81aeb3c 100644 --- a/docs/notebooks/215-image-inpainting-with-output.rst +++ b/docs/notebooks/215-image-inpainting-with-output.rst @@ -1,5 +1,5 @@ Image In-painting with OpenVINO™ -================================ +-------------------------------- This notebook demonstrates how to use an image in-painting model with OpenVINO, using `GMCNN @@ -11,7 +11,6 @@ original image. The Following pipeline will be used in this notebook. **Table of contents:** - - `Download the Model <#download-the-model>`__ - `Convert Tensorflow model to OpenVINO IR format <#convert-tensorflow-model-to-openvino-ir-format>`__ @@ -20,8 +19,7 @@ original image. The Following pipeline will be used in this notebook. model <#determine-the-input-shapes-of-the-model>`__ - `Create a square mask <#create-a-square-mask>`__ - `Load and Resize the Image <#load-and-resize-the-image>`__ -- `Generating the Masked - Image <#generating-the-masked-image>`__ +- `Generating the Masked Image <#generating-the-masked-image>`__ - `Preprocessing <#preprocessing>`__ - `Inference <#inference>`__ - `Save the Restored Image <#save-the-restored-image>`__ @@ -32,6 +30,15 @@ original image. The Following pipeline will be used in this notebook. %pip install -q "openvino>=2023.1.0" "opencv-python" "matplotlib" + +.. parsed-literal:: + + + [notice] A new release of pip is available: 23.2.1 -> 23.3.1 + [notice] To update, run: pip install --upgrade pip + Note: you may need to restart the kernel to use updated packages. + + .. code:: ipython3 import sys @@ -46,8 +53,10 @@ original image. The Following pipeline will be used in this notebook. sys.path.append("../utils") import notebook_utils as utils -Download the Model -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Download the Model +~~~~~~~~~~~~~~~~~~ + + Download ``gmcnn-places2-tf``\ model (this step will be skipped if the model is already downloaded) and then unzip it. Downloaded model stored @@ -78,8 +87,10 @@ be obtained from original model checkpoint can be found in this Already downloaded -Convert Tensorflow model to OpenVINO IR format -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Convert Tensorflow model to OpenVINO IR format +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + The pre-trained model is in TensorFlow format. To use it with OpenVINO, convert it to OpenVINO IR format with model conversion API. For more @@ -105,8 +116,10 @@ This step is also skipped if the model is already converted. model/public/ir/frozen_model.xml already exists. -Load the model -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Load the model +~~~~~~~~~~~~~~ + + Now, load the OpenVINO IR model and perform as follows: @@ -155,8 +168,10 @@ Only a few lines of code are required to run the model: input_layer = compiled_model.input(0) output_layer = compiled_model.output(0) -Determine the input shapes of the model -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Determine the input shapes of the model +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + Note that both input shapes are the same. However, the second input has 1 channel (monotone). @@ -165,8 +180,10 @@ Note that both input shapes are the same. 
However, the second input has N, H, W, C = input_layer.shape -Create a square mask -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Create a square mask +~~~~~~~~~~~~~~~~~~~~ + + Next, create a single channeled mask that will be laid on top of the original image. @@ -208,8 +225,10 @@ original image. .. image:: 215-image-inpainting-with-output_files/215-image-inpainting-with-output_15_0.png -Load and Resize the Image -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Load and Resize the Image +~~~~~~~~~~~~~~~~~~~~~~~~~ + + This image will be altered by using the mask. You can process any image you like. Just change the URL below. @@ -220,7 +239,7 @@ you like. Just change the URL below. if not img_path.exists(): # Download an image. - url = "https://www.intel.com/content/dam/www/central-libraries/us/en/images/arc-home-hero-128.png.rendition.intel.web.480.360.png" + url = "https://user-images.githubusercontent.com/29454499/281372079-fa8d84c4-8bf9-4a82-a1b9-5a74ad42ce47.png" image_file = utils.download_file( url, filename="laptop.png", directory="data", show_progress=False, silent=True, timeout=30 ) @@ -237,8 +256,10 @@ you like. Just change the URL below. .. image:: 215-image-inpainting-with-output_files/215-image-inpainting-with-output_17_0.png -Generating the Masked Image -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Generating the Masked Image +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + This multiplication of the image and the mask gives the result of the masked image layered on top of the original image. The ``masked_image`` @@ -256,8 +277,10 @@ will be the first input to the GMCNN model. .. image:: 215-image-inpainting-with-output_files/215-image-inpainting-with-output_19_0.png -Preprocessing -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Preprocessing +~~~~~~~~~~~~~ + + The model expects the input dimensions to be ``NHWC``. @@ -269,8 +292,10 @@ The model expects the input dimensions to be ``NHWC``. masked_image = masked_image[None, ...] mask = mask[None, ...] -Inference -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Inference +~~~~~~~~~ + + Do inference with the given masked image and the mask. Then, show the restored image. @@ -287,8 +312,10 @@ restored image. .. image:: 215-image-inpainting-with-output_files/215-image-inpainting-with-output_23_0.png -Save the Restored Image -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Save the Restored Image +~~~~~~~~~~~~~~~~~~~~~~~ + + Save the restored image to the data directory to download it. 
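The masking step described in the in-painting notes above (multiplying the image by the mask and adding a batch dimension for the ``NHWC`` layout) can be sketched as follows. This is only an illustration: the array names, the image size, and the convention that the mask holds ``1`` in the square region to restore are assumptions, not the notebook's exact code.

.. code:: ipython3

    import numpy as np

    # toy image and single-channel mask; the spatial size here is arbitrary
    image = np.random.randint(0, 255, (512, 680, 3)).astype(np.float32)
    mask = np.zeros((512, 680, 1), dtype=np.float32)
    mask[100:200, 100:200, :] = 1  # square region to be in-painted

    # zero out the masked square so the model has to reconstruct it
    masked_image = image * (1 - mask)

    # the model expects NHWC inputs, so prepend the batch dimension
    masked_image = masked_image[None, ...]
    mask = mask[None, ...]
    print(masked_image.shape, mask.shape)  # (1, 512, 680, 3) (1, 512, 680, 1)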
diff --git a/docs/notebooks/215-image-inpainting-with-output_files/215-image-inpainting-with-output_15_0.png b/docs/notebooks/215-image-inpainting-with-output_files/215-image-inpainting-with-output_15_0.png index 5e0d90c8fcab3e..9672f85850845f 100644 --- a/docs/notebooks/215-image-inpainting-with-output_files/215-image-inpainting-with-output_15_0.png +++ b/docs/notebooks/215-image-inpainting-with-output_files/215-image-inpainting-with-output_15_0.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:a46cd32b28865daa8ce534fd66b37766ddbb2c25806469b3681d33df0b671f18 -size 16155 +oid sha256:bcd410b6efb941d2c605b9700489d965f6556a77bd2b1b1c8e81b6611ae60e92 +size 16186 diff --git a/docs/notebooks/215-image-inpainting-with-output_files/215-image-inpainting-with-output_17_0.png b/docs/notebooks/215-image-inpainting-with-output_files/215-image-inpainting-with-output_17_0.png index 0ffd97041ffd00..630c40142d5d25 100644 --- a/docs/notebooks/215-image-inpainting-with-output_files/215-image-inpainting-with-output_17_0.png +++ b/docs/notebooks/215-image-inpainting-with-output_files/215-image-inpainting-with-output_17_0.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:9b69904bb953a31d2c89974a5ae14753aa290e0e197a76d8c160442ec995846b +oid sha256:1784e76f880bfc703c807391eb8d7960c99b2bd83230a53a09100ef0c587a263 size 544222 diff --git a/docs/notebooks/215-image-inpainting-with-output_files/215-image-inpainting-with-output_19_0.png b/docs/notebooks/215-image-inpainting-with-output_files/215-image-inpainting-with-output_19_0.png index c7242ebac0f83a..8762c383538b8d 100644 --- a/docs/notebooks/215-image-inpainting-with-output_files/215-image-inpainting-with-output_19_0.png +++ b/docs/notebooks/215-image-inpainting-with-output_files/215-image-inpainting-with-output_19_0.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:81b3eed7308c70674b16172ecf146179ee5e60b431deb8d8629d6766c7fb7e50 -size 493354 +oid sha256:47ecdc3f633406ae13fa0c2e006655c088c39fdeea5928d700362491533ce840 +size 492981 diff --git a/docs/notebooks/215-image-inpainting-with-output_files/215-image-inpainting-with-output_23_0.png b/docs/notebooks/215-image-inpainting-with-output_files/215-image-inpainting-with-output_23_0.png index d56b6882f789bb..886e64b2e43870 100644 --- a/docs/notebooks/215-image-inpainting-with-output_files/215-image-inpainting-with-output_23_0.png +++ b/docs/notebooks/215-image-inpainting-with-output_files/215-image-inpainting-with-output_23_0.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:6177d3e806c58c63147586c10aaa45312883fcb294b2f5ac64ace11c64d29673 -size 586544 +oid sha256:3d613aa493580e662df403d6269fc6ea72d87a05cd3b362db03c62f25fade2f5 +size 593391 diff --git a/docs/notebooks/215-image-inpainting-with-output_files/index.html b/docs/notebooks/215-image-inpainting-with-output_files/index.html index ea839b329370ad..72badb680668be 100644 --- a/docs/notebooks/215-image-inpainting-with-output_files/index.html +++ b/docs/notebooks/215-image-inpainting-with-output_files/index.html @@ -1,10 +1,10 @@ -Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/215-image-inpainting-with-output_files/ +Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/215-image-inpainting-with-output_files/ -

Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/215-image-inpainting-with-output_files/


../
-215-image-inpainting-with-output_15_0.png          31-Oct-2023 00:35               16155
-215-image-inpainting-with-output_17_0.png          31-Oct-2023 00:35              544222
-215-image-inpainting-with-output_19_0.png          31-Oct-2023 00:35              493354
-215-image-inpainting-with-output_23_0.png          31-Oct-2023 00:35              586544
+

Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/215-image-inpainting-with-output_files/


../
+215-image-inpainting-with-output_15_0.png          15-Nov-2023 00:43               16186
+215-image-inpainting-with-output_17_0.png          15-Nov-2023 00:43              544222
+215-image-inpainting-with-output_19_0.png          15-Nov-2023 00:43              492981
+215-image-inpainting-with-output_23_0.png          15-Nov-2023 00:43              593391
 

diff --git a/docs/notebooks/220-cross-lingual-books-alignment-with-output.rst b/docs/notebooks/220-cross-lingual-books-alignment-with-output.rst index 3bafb1d522c429..dbf2a1ba91d468 100644 --- a/docs/notebooks/220-cross-lingual-books-alignment-with-output.rst +++ b/docs/notebooks/220-cross-lingual-books-alignment-with-output.rst @@ -229,7 +229,7 @@ the last occurrence of these asterisks. Hint: There are text-cleaning libraries that clean up common flaws. If the source of the text is known, you can look for a library designed for that source, for example - ```gutenberg_cleaner`` `__. + `gutenberg_cleaner `__. These libraries can reduce manual work and even automate the process.process. diff --git a/docs/notebooks/223-text-prediction-with-output.rst b/docs/notebooks/223-text-prediction-with-output.rst new file mode 100644 index 00000000000000..53b9e25f31fbf5 --- /dev/null +++ b/docs/notebooks/223-text-prediction-with-output.rst @@ -0,0 +1,694 @@ +Text Prediction with OpenVINO™ +============================== + +This notebook shows text prediction with OpenVINO. This notebook can +work in two different modes, Text Generation and Conversation, which the +user can select via selecting the model in the Model Selection Section. +We use three models +`GPT-2 `__, +`GPT-Neo `__, and +`PersonaGPT `__, which are a part of +the Generative Pre-trained Transformer (GPT) family. GPT-2 and GPT-Neo +can be used for text generation, whereas PersonaGPT is trained for the +downstream task of conversation. + +GPT-2 and GPT-Neo are pre-trained on a large corpus of English text +using unsupervised training. They both display a broad set of +capabilities, including the ability to generate conditional synthetic +text samples of unprecedented quality, where we prime the model with an +input and have it generate a lengthy continuation. + +More details about the models are provided on their HuggingFace cards: + +- `GPT-2 `__ +- `GPT-Neo `__ + +PersonaGPT is an open-domain conversational agent that can decode +*personalized* and *controlled* responses based on user input. It is +built on the pretrained +`DialoGPT-medium `__ model, +following the `GPT-2 `__ architecture. +PersonaGPT is fine-tuned on the +`Persona-Chat `__ dataset. The model +is available from +`HuggingFace `__. PersonaGPT +displays a broad set of capabilities, including the ability to take on +personas, where we prime the model with few facts and have it generate +based upon that, it can also be used for creating a chatbot on a +knowledge base. + +The following image illustrates the complete demo pipeline used for text +generation: + +.. figure:: https://user-images.githubusercontent.com/91228207/163990722-d2713ede-921e-4594-8b00-8b5c1a4d73b5.jpeg + :alt: image2 + + image2 + +This is a demonstration in which the user can type the beginning of the +text and the network will generate a further. This procedure can be +repeated as many times as the user desires. + +For Text Generation, The model input is tokenized text, which serves as +the initial condition for text generation. Then, logits from the models’ +inference results are obtained, and the token with the highest +probability is selected using the top-k sampling strategy and joined to +the input sequence. This procedure repeats until the end of the sequence +token is received or the specified maximum length is reached. After +that, tokenized IDs are decoded to text. + +The following image illustrates the demo pipeline for conversation: + +.. 
figure:: https://user-images.githubusercontent.com/95569637/226101538-e204aebd-a34f-4c8b-b90c-5363ba41c080.jpeg + :alt: image2 + + image2 + +For Conversation, User Input is tokenized with ``eos_token`` +concatenated in the end. Then, the text gets generated as detailed +above. The Generated response is added to the history with the +``eos_token`` at the end. Additional user input is added to the history, +and the sequence is passed back into the model. + +**Table of contents:** + +- `Model Selection <#model-selection>`__ +- `Load Model <#load-model>`__ +- `Convert Pytorch Model to OpenVINO + IR <#convert-pytorch-model-to-openvino-ir>`__ + + - `Load the model <#load-the-model>`__ + + - `Select inference device <#select-inference-device>`__ + +- `Pre-Processing <#pre-processing>`__ +- `Define tokenization <#define-tokenization>`__ + + - `Define Softmax layer <#define-softmax-layer>`__ + - `Set the minimum sequence + length <#set-the-minimum-sequence-length>`__ + - `Top-K sampling <#top-k-sampling>`__ + - `Main Processing Function <#main-processing-function>`__ + +- `Inference with GPT-Neo/GPT-2 <#inference-with-gpt-neogpt->`__ +- `Conversation with PersonaGPT using + OpenVINO <#conversation-with-personagpt-using-openvino>`__ +- `Converse Function <#converse-function>`__ +- `Conversation Class <#conversation-class>`__ +- `Conversation with PersonaGPT <#conversation-with-personagpt>`__ + +Model Selection +--------------- + + + +Select the Model to be used for text generation, GPT-2 and GPT-Neo are +used for text generation whereas PersonaGPT is used for Conversation. + +.. code:: ipython3 + + %pip install -q "openvino>=2023.1.0" + %pip install -q gradio + %pip install -q --extra-index-url https://download.pytorch.org/whl/cpu transformers[torch] + + +.. parsed-literal:: + + DEPRECATION: pytorch-lightning 1.6.5 has a non-standard dependency specifier torch>=1.8.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pytorch-lightning or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063 + Note: you may need to restart the kernel to use updated packages. + DEPRECATION: pytorch-lightning 1.6.5 has a non-standard dependency specifier torch>=1.8.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pytorch-lightning or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063 + ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. + paddlepaddle 2.5.2 requires protobuf>=3.20.2; platform_system != "Windows", but you have protobuf 3.20.1 which is incompatible. + Note: you may need to restart the kernel to use updated packages. + DEPRECATION: pytorch-lightning 1.6.5 has a non-standard dependency specifier torch>=1.8.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pytorch-lightning or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063 + Note: you may need to restart the kernel to use updated packages. + + +.. 
code:: ipython3 + + import ipywidgets as widgets + + style = {'description_width': 'initial'} + model_name = widgets.Select( + options=['PersonaGPT (Converastional)', 'GPT-2', 'GPT-Neo'], + value='PersonaGPT (Converastional)', + description='Select Model:', + disabled=False + ) + + widgets.VBox([model_name]) + + + + +.. parsed-literal:: + + VBox(children=(Select(description='Select Model:', options=('PersonaGPT (Converastional)', 'GPT-2', 'GPT-Neo')… + + + +Load Model +---------- + + + +Download the Selected Model and Tokenizer from HuggingFace + +.. code:: ipython3 + + from transformers import GPTNeoForCausalLM, GPT2TokenizerFast, GPT2Tokenizer, GPT2LMHeadModel + + if model_name.value == "PersonaGPT (Converastional)": + pt_model = GPT2LMHeadModel.from_pretrained('af1tang/personaGPT') + tokenizer = GPT2Tokenizer.from_pretrained('af1tang/personaGPT') + elif model_name.value == 'GPT-2': + pt_model = GPT2LMHeadModel.from_pretrained('gpt2') + tokenizer = GPT2Tokenizer.from_pretrained('gpt2') + elif model_name.value == 'GPT-Neo': + pt_model = GPTNeoForCausalLM.from_pretrained('EleutherAI/gpt-neo-125M') + tokenizer = GPT2TokenizerFast.from_pretrained('EleutherAI/gpt-neo-125M') + +Convert Pytorch Model to OpenVINO IR +------------------------------------ + + + +For starting work with GPT-Neo model using OpenVINO, a model should be +converted to OpenVINO Intermediate Representation (IR) format. +HuggingFace provides a GPT-Neo model in PyTorch format, which is +supported in OpenVINO via Model Conversion API. The ``ov.convert_model`` +Python function of `model conversion +API `__ +can be used for converting the model. The function returns instance of +OpenVINO Model class, which is ready to use in Python interface. The +Model can also be save on device in OpenVINO IR format for future +execution using ``ov.save_model``. In our case dynamic input shapes with +a possible shape range (from 1 token to a maximum length defined in our +processing function) are specified for optimization of memory +consumption. + +.. code:: ipython3 + + from pathlib import Path + import torch + + import openvino as ov + + # define path for saving openvino model + model_path = Path("model/text_generator.xml") + + example_input = {"input_ids": torch.ones((1, 10), dtype=torch.long), "attention_mask": torch.ones((1, 10), dtype=torch.long)} + pt_model.config.torchscript = True + + # convert model to openvino + if model_name.value == "PersonaGPT (Converastional)": + ov_model = ov.convert_model(pt_model, example_input=example_input, input=[('input_ids', [1, -1], ov.Type.i64), ('attention_mask', [1,-1], ov.Type.i64)]) + else: + ov_model = ov.convert_model(pt_model, example_input=example_input, input=[('input_ids', [1, ov.Dimension(1,128)], ov.Type.i64), ('attention_mask', [1, ov.Dimension(1,128)], ov.Type.i64)]) + + # serialize openvino model + ov.save_model(ov_model, str(model_path)) + + +.. parsed-literal:: + + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py:801: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! + if batch_size <= 0: + + +Load the model +~~~~~~~~~~~~~~ + + + +We start by building an OpenVINO Core object. 
Then we read the network +architecture and model weights from the ``.xml`` and ``.bin`` files, +respectively. Finally, we compile the model for the desired device. + +Select inference device +^^^^^^^^^^^^^^^^^^^^^^^ + + + +select device from dropdown list for running inference using OpenVINO + +.. code:: ipython3 + + import ipywidgets as widgets + + # initialize openvino core + core = ov.Core() + + device = widgets.Dropdown( + options=core.available_devices + ["AUTO"], + value='AUTO', + description='Device:', + disabled=False, + ) + + device + + + + +.. parsed-literal:: + + Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO') + + + +.. code:: ipython3 + + # read the model and corresponding weights from file + model = core.read_model(model_path) + +.. code:: ipython3 + + # compile the model for CPU devices + compiled_model = core.compile_model(model=model, device_name=device.value) + + # get output tensors + output_key = compiled_model.output(0) + +Input keys are the names of the input nodes and output keys contain +names of the output nodes of the network. In the case of GPT-Neo, we +have ``batch size`` and ``sequence length`` as inputs and +``batch size``, ``sequence length`` and ``vocab size`` as outputs. + +Pre-Processing +-------------- + + + +NLP models often take a list of tokens as a standard input. A token is a +word or a part of a word mapped to an integer. To provide the proper +input, we use a vocabulary file to handle the mapping. So first let’s +load the vocabulary file. + +Define tokenization +------------------- + + + +.. code:: ipython3 + + from typing import List, Tuple + + + # this function converts text to tokens + def tokenize(text: str) -> Tuple[List[int], List[int]]: + """ + tokenize input text using GPT2 tokenizer + + Parameters: + text, str - input text + Returns: + input_ids - np.array with input token ids + attention_mask - np.array with 0 in place, where should be padding and 1 for places where original tokens are located, represents attention mask for model + """ + + inputs = tokenizer(text, return_tensors="np") + return inputs["input_ids"], inputs["attention_mask"] + +``eos_token`` is special token, which means that generation is finished. +We store the index of this token in order to use this index as padding +at later stage. + +.. code:: ipython3 + + eos_token_id = tokenizer.eos_token_id + eos_token = tokenizer.decode(eos_token_id) + + +.. parsed-literal:: + + 2023-11-14 23:32:14.663057: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. + 2023-11-14 23:32:14.696431: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. + To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2023-11-14 23:32:15.262361: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT + + +Define Softmax layer +~~~~~~~~~~~~~~~~~~~~ + + A softmax function is used to +convert top-k logits into a probability distribution. + +.. 
code:: ipython3 + + import numpy as np + + + def softmax(x : np.array) -> np.array: + e_x = np.exp(x - np.max(x, axis=-1, keepdims=True)) + summation = e_x.sum(axis=-1, keepdims=True) + return e_x / summation + +Set the minimum sequence length +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + + +If the minimum sequence length is not reached, the following code will +reduce the probability of the ``eos`` token occurring. This continues +the process of generating the next words. + +.. code:: ipython3 + + def process_logits(cur_length: int, scores: np.array, eos_token_id : int, min_length : int = 0) -> np.array: + """ + Reduce probability for padded indices. + + Parameters: + cur_length: Current length of input sequence. + scores: Model output logits. + eos_token_id: Index of end of string token in model vocab. + min_length: Minimum length for applying postprocessing. + + Returns: + Processed logits with reduced probability for padded indices. + """ + if cur_length < min_length: + scores[:, eos_token_id] = -float("inf") + return scores + +Top-K sampling +~~~~~~~~~~~~~~ + + + +In Top-K sampling, we filter the K most likely next words and +redistribute the probability mass among only those K next words. + +.. code:: ipython3 + + def get_top_k_logits(scores : np.array, top_k : int) -> np.array: + """ + Perform top-k sampling on the logits scores. + + Parameters: + scores: np.array, model output logits. + top_k: int, number of elements with the highest probability to select. + + Returns: + np.array, shape (batch_size, sequence_length, vocab_size), + filtered logits scores where only the top-k elements with the highest + probability are kept and the rest are replaced with -inf + """ + filter_value = -float("inf") + top_k = min(max(top_k, 1), scores.shape[-1]) + top_k_scores = -np.sort(-scores)[:, :top_k] + indices_to_remove = scores < np.min(top_k_scores) + filtred_scores = np.ma.array(scores, mask=indices_to_remove, + fill_value=filter_value).filled() + return filtred_scores + +Main Processing Function +~~~~~~~~~~~~~~~~~~~~~~~~ + + + +Generating the predicted sequence. + +.. code:: ipython3 + + def generate_sequence(input_ids : List[int], attention_mask : List[int], max_sequence_length : int = 128, + eos_token_id : int = eos_token_id, dynamic_shapes : bool = True) -> List[int]: + """ + Generates a sequence of tokens using a pre-trained language model. 
+ + Parameters: + input_ids: np.array, tokenized input ids for model + attention_mask: np.array, attention mask for model + max_sequence_length: int, maximum sequence length for stopping iteration + eos_token_id: int, index of the end-of-sequence token in the model's vocabulary + dynamic_shapes: bool, whether to use dynamic shapes for inference or pad model input to max_sequence_length + + Returns: + np.array, the predicted sequence of token ids + """ + while True: + cur_input_len = len(input_ids[0]) + if not dynamic_shapes: + pad_len = max_sequence_length - cur_input_len + model_input_ids = np.concatenate((input_ids, [[eos_token_id] * pad_len]), axis=-1) + model_input_attention_mask = np.concatenate((attention_mask, [[0] * pad_len]), axis=-1) + else: + model_input_ids = input_ids + model_input_attention_mask = attention_mask + outputs = compiled_model({"input_ids": model_input_ids, "attention_mask": model_input_attention_mask})[output_key] + next_token_logits = outputs[:, cur_input_len - 1, :] + # pre-process distribution + next_token_scores = process_logits(cur_input_len, + next_token_logits, eos_token_id) + top_k = 20 + next_token_scores = get_top_k_logits(next_token_scores, top_k) + # get next token id + probs = softmax(next_token_scores) + next_tokens = np.random.choice(probs.shape[-1], 1, + p=probs[0], replace=True) + # break the loop if max length or end of text token is reached + if cur_input_len == max_sequence_length or next_tokens[0] == eos_token_id: + break + else: + input_ids = np.concatenate((input_ids, [next_tokens]), axis=-1) + attention_mask = np.concatenate((attention_mask, [[1] * len(next_tokens)]), axis=-1) + return input_ids + +Inference with GPT-Neo/GPT-2 +---------------------------- + + + +The ``text`` variable below is the input used to generate a predicted +sequence. + +.. code:: ipython3 + + import time + + if not model_name.value == "PersonaGPT (Converastional)": + text = "Deep learning is a type of machine learning that uses neural networks" + input_ids, attention_mask = tokenize(text) + + start = time.perf_counter() + output_ids = generate_sequence(input_ids, attention_mask) + end = time.perf_counter() + output_text = " " + # Convert IDs to words and make the sentence from it + for i in output_ids[0]: + output_text += tokenizer.batch_decode([i])[0] + print(f"Generation took {end - start:.3f} s") + print(f"Input Text: {text}") + print() + print(f"{model_name.value}: {output_text}") + else: + print("Selected Model is PersonaGPT. Please select GPT-Neo or GPT-2 in the first cell to generate text sequences") + + +.. parsed-literal:: + + Selected Model is PersonaGPT. Please select GPT-Neo or GPT-2 in the first cell to generate text sequences + + +Conversation with PersonaGPT using OpenVINO +=========================================== + + + +User Input is tokenized with ``eos_token`` concatenated in the end. +Model input is tokenized text, which serves as initial condition for +generation, then logits from model inference result should be obtained +and token with the highest probability is selected using top-k sampling +strategy and joined to input sequence. The procedure repeats until end +of sequence token will be received or specified maximum length is +reached. After that, decoding token ids to text using tokenized should +be applied. + +The Generated response is added to the history with the ``eos_token`` at +the end. Further User Input is added to it and again passed into the +model. 
+ +Converse Function +----------------- + + + +Wrapper on generate sequence function to support conversation + +.. code:: ipython3 + + def converse(input: str, history: List[int], eos_token: str = eos_token, + eos_token_id: int = eos_token_id) -> Tuple[str, List[int]]: + """ + Converse with the Model. + + Parameters: + input: Text input given by the User + history: Chat History, ids of tokens of chat occured so far + eos_token: end of sequence string + eos_token_id: end of sequence index from vocab + Returns: + response: Text Response generated by the model + history: Chat History, Ids of the tokens of chat occured so far,including the tokens of generated response + """ + + # Get Input Ids of the User Input + new_user_input_ids, _ = tokenize(input + eos_token) + + # append the new user input tokens to the chat history, if history exists + if len(history) == 0: + bot_input_ids = new_user_input_ids + else: + bot_input_ids = np.concatenate([history, new_user_input_ids[0]]) + bot_input_ids = np.expand_dims(bot_input_ids, axis=0) + + # Create Attention Mask + bot_attention_mask = np.ones_like(bot_input_ids) + + # Generate Response from the model + history = generate_sequence(bot_input_ids, bot_attention_mask, max_sequence_length=1000) + + # Add the eos_token to mark end of sequence + history = np.append(history[0], eos_token_id) + + # convert the tokens to text, and then split the responses into lines and retrieve the response from the Model + response = ''.join(tokenizer.batch_decode(history)).split(eos_token)[-2] + return response, history + +Conversation Class +------------------ + + + +.. code:: ipython3 + + class Conversation: + def __init__(self): + # Initialize Empty History + self.history = [] + self.messages = [] + + def chat(self, input_text): + """ + Wrapper Over Converse Function. + Parameters: + input_text: Text input given by the User + Returns: + response: Text Response generated by the model + """ + response, self.history = converse(input_text, self.history) + self.messages.append(f"Person: {input_text}") + self.messages.append(f"PersonaGPT: {response}") + return response + +Conversation with PersonaGPT +---------------------------- + + + +This notebook provides two styles of inference, Plain and Interactive. +The style of inference can be selected in the next cell. + +.. code:: ipython3 + + style = {'description_width': 'initial'} + interactive_mode = widgets.Select( + options=['Plain', 'Interactive'], + value='Plain', + description='Inference Style:', + disabled=False + ) + + widgets.VBox([interactive_mode]) + + + + +.. parsed-literal:: + + VBox(children=(Select(description='Inference Style:', options=('Plain', 'Interactive'), value='Plain'),)) + + + +.. 
code:: ipython3 + + import gradio as gr + + if model_name.value == "PersonaGPT (Converastional)": + if interactive_mode.value == 'Plain': + conversation = Conversation() + user_prompt = None + pre_written_prompts = ["Hi,How are you?", "What are you doing?", "I like to dance,do you?", "Can you recommend me some books?"] + # Number of responses generated by model + n_prompts = 10 + for i in range(n_prompts): + # Uncomment for taking User Input + # user_prompt = input() + if not user_prompt: + user_prompt = pre_written_prompts[i % len(pre_written_prompts)] + conversation.chat(user_prompt) + print(conversation.messages[-2]) + print(conversation.messages[-1]) + user_prompt = None + else: + def add_text(history, text): + history = history + [(text, None)] + return history, "" + + conversation = Conversation() + + def bot(history): + conversation.chat(history[-1][0]) + response = conversation.messages[-1] + history[-1][1] = response + return history + + with gr.Blocks() as demo: + chatbot = gr.Chatbot([], elem_id="chatbot") + + with gr.Row(): + with gr.Column(): + txt = gr.Textbox( + show_label=False, + placeholder="Enter text and press enter, or upload an image", + container=False + ) + + txt.submit(add_text, [chatbot, txt], [chatbot, txt]).then( + bot, chatbot, chatbot + ) + try: + demo.launch(debug=False) + except Exception: + demo.launch(debug=False, share=True) + # if you are launching remotely, specify server_name and server_port + # demo.launch(server_name='your server name', server_port='server port in int') + # Read more in the docs: https://gradio.app/docs/ + else: + print("Selected Model is not PersonaGPT, Please select PersonaGPT in the first cell to have a conversation") + + +.. parsed-literal:: + + Person: Hi,How are you? + PersonaGPT: a bit tired, since i'm off at the weekend. i hope you are well + Person: What are you doing? + PersonaGPT: i'm taking a break from playing my xbox. how about you? + Person: I like to dance,do you? + PersonaGPT: i've danced, do you play any games? + Person: Can you recommend me some books? + PersonaGPT: probably not, do you like movies or television? + Person: Hi,How are you? + PersonaGPT: doing very well, thank you for asking. what do you do for a living? + Person: What are you doing? + PersonaGPT: i'm a stay at home mom. + Person: I like to dance,do you? + PersonaGPT: i dance, but not as a job. i play video games sometimes + Person: Can you recommend me some books? + PersonaGPT: maybe you can try playing warcraft, but i don't think i would like it + Person: Hi,How are you? + PersonaGPT: i'm fine, thanks for asking + Person: What are you doing? 
+ PersonaGPT: i'm relaxing at home since i'm off at work + diff --git a/docs/notebooks/225-stable-diffusion-text-to-image-with-output.rst b/docs/notebooks/225-stable-diffusion-text-to-image-with-output.rst index fcf724f74884b9..4a37b453ad89e2 100644 --- a/docs/notebooks/225-stable-diffusion-text-to-image-with-output.rst +++ b/docs/notebooks/225-stable-diffusion-text-to-image-with-output.rst @@ -40,10 +40,10 @@ Notebook contains the following steps: **Table of contents:** - - `Prerequisites <#prerequisites>`__ -- `Create PyTorch Models - pipeline <#create-pytorch-models-pipeline>`__ +- `login to huggingfacehub to get access to pretrained + model <#login-to-huggingfacehub-to-get-access-to-pretrained-model>`__ +- `Create PyTorch Models pipeline <#create-pytorch-models-pipeline>`__ - `Convert models to OpenVINO Intermediate representation (IR) format <#convert-models-to-openvino-intermediate-representation-ir-format>`__ @@ -52,16 +52,17 @@ Notebook contains the following steps: - `VAE <#vae>`__ - `Prepare Inference Pipeline <#prepare-inference-pipeline>`__ -- `Configure Inference - Pipeline <#configure-inference-pipeline>`__ +- `Configure Inference Pipeline <#configure-inference-pipeline>`__ - `Text-to-Image generation <#text-to-image-generation>`__ - `Image-to-Image generation <#image-to-image-generation>`__ - `Interactive demo <#interactive-demo>`__ -Prerequisites -------------------------------------------------------- +Prerequisites +------------- + + **The following is needed only if you want to use the original model. If not, you do not have to do anything. Just run the notebook.** @@ -79,9 +80,10 @@ not, you do not have to do anything. Just run the notebook.** following code: .. code:: python - + :force: ## login to huggingfacehub to get access to pretrained model + from huggingface_hub import notebook_login, whoami try: @@ -105,8 +107,10 @@ solutions based on Stable Diffusion. %pip install -q gradio %pip install -q transformers -Create PyTorch Models pipeline ------------------------------------------------------------------------- +Create PyTorch Models pipeline +------------------------------ + + ``StableDiffusionPipeline`` is an end-to-end inference pipeline that you can use to generate images from text with just a few lines of code. @@ -261,8 +265,10 @@ First, load the pre-trained weights of all components of the model. -Convert models to OpenVINO Intermediate representation (IR) format ------------------------------------------------------------------------------------------------------------- +Convert models to OpenVINO Intermediate representation (IR) format +------------------------------------------------------------------ + + Staring from 2023.0 release, OpenVINO supports direct conversion PyTorch models to OpenVINO IR format. You need to provide a model object and @@ -283,8 +289,10 @@ The model consists of three important parts: Let us convert each part. -Text Encoder -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Text Encoder +~~~~~~~~~~~~ + + The text-encoder is responsible for transforming the input prompt, for example, “a photo of an astronaut riding a horse” into an embedding @@ -379,8 +387,10 @@ hidden states. -U-net -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +U-net +~~~~~ + + Unet model has three inputs: @@ -470,8 +480,10 @@ Model predicts the ``sample`` state for the next step. -VAE -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +VAE +~~~ + + The VAE model has two parts, an encoder and a decoder. 
The encoder is used to convert the image into a low dimensional latent representation, @@ -599,8 +611,10 @@ of the pipeline, it will be better to convert them to separate models. -Prepare Inference Pipeline --------------------------------------------------------------------- +Prepare Inference Pipeline +-------------------------- + + Putting it all together, let us now take a closer look at how the model works in inference by illustrating the logical flow. @@ -1001,8 +1015,10 @@ of the variational auto encoder. return timesteps, num_inference_steps - t_start -Configure Inference Pipeline ----------------------------------------------------------------------- +Configure Inference Pipeline +---------------------------- + + First, you should create instances of OpenVINO Model. @@ -1074,8 +1090,10 @@ Let us define them and put all components together scheduler=lms ) -Text-to-Image generation -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Text-to-Image generation +~~~~~~~~~~~~~~~~~~~~~~~~ + + Now, you can define a text prompt for image generation and run inference pipeline. Optionally, you can also change the random generator seed for @@ -1176,8 +1194,10 @@ Now is show time! Nice. As you can see, the picture has quite a high definition 🔥. -Image-to-Image generation -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Image-to-Image generation +~~~~~~~~~~~~~~~~~~~~~~~~~ + + Image-to-Image generation, additionally to text prompt, requires providing initial image. Optionally, you can also change ``strength`` @@ -1287,8 +1307,10 @@ semantically consistent with the input. .. image:: 225-stable-diffusion-text-to-image-with-output_files/225-stable-diffusion-text-to-image-with-output_40_1.png -Interactive demo ----------------------------------------------------------- +Interactive demo +---------------- + + .. code:: ipython3 @@ -1355,5 +1377,5 @@ Interactive demo .. .. raw:: html -..
+..
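The conversion flow referred to in the Stable Diffusion section above (``ov.convert_model`` on a PyTorch module, ``ov.save_model`` to IR, then compiling for a device) follows the same pattern used elsewhere in this patch. A minimal, self-contained sketch is given below; the ``torch.nn.Linear`` stand-in and the file name are placeholders, not the notebook's text encoder, U-Net, or VAE.

.. code:: ipython3

    from pathlib import Path

    import torch
    import openvino as ov

    # stand-in module; in the notebook this would be one of the pipeline components
    toy_model = torch.nn.Linear(16, 4)
    example_input = torch.zeros((1, 16))

    # convert and serialize once, then reuse the IR file
    ir_path = Path("toy_model.xml")
    if not ir_path.exists():
        ov_model = ov.convert_model(toy_model, example_input=example_input)
        ov.save_model(ov_model, ir_path)

    # read the IR back, compile it, and run a single inference
    core = ov.Core()
    compiled = core.compile_model(core.read_model(ir_path), "AUTO")
    result = compiled([example_input.numpy()])[compiled.output(0)]
    print(result.shape)  # (1, 4)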
diff --git a/docs/notebooks/225-stable-diffusion-text-to-image-with-output_files/index.html b/docs/notebooks/225-stable-diffusion-text-to-image-with-output_files/index.html index 49f3a05b4767c0..ae58c34fee8c09 100644 --- a/docs/notebooks/225-stable-diffusion-text-to-image-with-output_files/index.html +++ b/docs/notebooks/225-stable-diffusion-text-to-image-with-output_files/index.html @@ -1,9 +1,9 @@ -Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/225-stable-diffusion-text-to-image-with-output_files/ +Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/225-stable-diffusion-text-to-image-with-output_files/ -

Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/225-stable-diffusion-text-to-image-with-output_files/


../
-225-stable-diffusion-text-to-image-with-output_..> 31-Oct-2023 00:35              372482
-225-stable-diffusion-text-to-image-with-output_..> 31-Oct-2023 00:35              928958
-225-stable-diffusion-text-to-image-with-output_..> 31-Oct-2023 00:35              726871
+

Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/225-stable-diffusion-text-to-image-with-output_files/


../
+225-stable-diffusion-text-to-image-with-output_..> 15-Nov-2023 00:43              372482
+225-stable-diffusion-text-to-image-with-output_..> 15-Nov-2023 00:43              928958
+225-stable-diffusion-text-to-image-with-output_..> 15-Nov-2023 00:43              726871
 

diff --git a/docs/notebooks/227-whisper-convert-with-output.rst b/docs/notebooks/227-whisper-convert-with-output.rst index a8288655d9a3f7..d420c4b6be06fc 100644 --- a/docs/notebooks/227-whisper-convert-with-output.rst +++ b/docs/notebooks/227-whisper-convert-with-output.rst @@ -26,7 +26,6 @@ Whisper pipeline with OpenVINO models. **Table of contents:** - - `Prerequisites <#prerequisites>`__ - `Instantiate model <#instantiate-model>`__ @@ -45,21 +44,25 @@ Whisper pipeline with OpenVINO models. pipeline <#run-video-transcription-pipeline>`__ - `Interactive demo <#interactive-demo>`__ -Prerequisites -------------------------------------------------------- +Prerequisites +------------- + + Install dependencies. .. code:: ipython3 %pip install -q "openvino>=2023.1.0" - %pip install -q "python-ffmpeg<=1.0.16" moviepy transformers onnx + %pip install -q "python-ffmpeg<=1.0.16" moviepy transformers %pip install -q -I "git+https://github.com/garywu007/pytube.git" %pip install -q -U gradio - %pip install -q -I "git+https://github.com/openai/whisper.git@e8622f9afc4eba139bf796c210f5c01081000472" + %pip install -q -I "git+https://github.com/openai/whisper.git@fcfeaf1b61994c071bba62da47d7846933576ac9" + +Instantiate model +----------------- + -Instantiate model ------------------------------------------------------------ Whisper is a Transformer based encoder-decoder model, also referred to as a sequence-to-sequence model. It maps a sequence of audio spectrogram @@ -82,18 +85,41 @@ the authors of the model. In this tutorial, we will use the ``base`` model, but the same actions are also applicable to other models from Whisper family. +.. code:: ipython3 + + from whisper import _MODELS + import ipywidgets as widgets + + model_id = widgets.Dropdown( + options=list(_MODELS), + value='large-v2', + description='Model:', + disabled=False, + ) + + model_id + + + + +.. parsed-literal:: + + Dropdown(description='Model:', index=9, options=('tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'm… + + + .. code:: ipython3 import whisper - model_id = "base" - model = whisper.load_model("base") - model.to("cpu") + model = whisper.load_model(model_id.value, "cpu") model.eval() pass -Convert model to OpenVINO Intermediate Representation (IR) format. -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Convert model to OpenVINO Intermediate Representation (IR) format. +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + For best results with OpenVINO, it is recommended to convert the model to OpenVINO IR format. We need to provide initialized model object and @@ -103,35 +129,33 @@ function returns an OpenVINO model ready to load on device and start making predictions. We can save it on disk for next usage with ``ov.save_model``. -Convert Whisper Encoder to OpenVINO IR -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Convert Whisper Encoder to OpenVINO IR +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + .. code:: ipython3 from pathlib import Path - WHISPER_ENCODER_OV = Path("whisper_encoder.xml") - WHISPER_DECODER_OV = Path("whisper_decoder.xml") + WHISPER_ENCODER_OV = Path(f"whisper_{model_id.value}_encoder.xml") + WHISPER_DECODER_OV = Path(f"whisper_{model_id.value}_decoder.xml") .. 
code:: ipython3 import torch import openvino as ov - mel = torch.zeros((1, 80, 3000)) + mel = torch.zeros((1, 80 if 'v3' not in model_id.value else 128, 3000)) audio_features = model.encoder(mel) - encoder_model = ov.convert_model(model.encoder, example_input=mel) - ov.save_model(encoder_model, WHISPER_ENCODER_OV) - - -.. parsed-literal:: + if not WHISPER_ENCODER_OV.exists(): + encoder_model = ov.convert_model(model.encoder, example_input=mel) + ov.save_model(encoder_model, WHISPER_ENCODER_OV) - /home/ea/work/ov_venv/lib/python3.8/site-packages/whisper/model.py:166: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! - assert x.shape[1:] == self.positional_embedding.shape, "incorrect audio shape" +Convert Whisper decoder to OpenVINO IR +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Convert Whisper decoder to OpenVINO IR -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ To reduce computational complexity, the decoder uses cached key/value projections in attention modules from the previous steps. We need to @@ -265,16 +289,10 @@ modify this process for correct tracing. logits, kv_cache = model.decoder(tokens, audio_features, kv_cache=None) tokens = torch.ones((5, 1), dtype=torch.int64) - decoder_model = ov.convert_model(model.decoder, example_input=(tokens, audio_features, kv_cache)) - ov.save_model(decoder_model, WHISPER_DECODER_OV) - - -.. parsed-literal:: - - /home/ea/work/ov_venv/lib/python3.8/site-packages/torch/jit/_trace.py:154: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at aten/src/ATen/core/TensorBody.h:486.) - if a.grad is not None: - + if not WHISPER_DECODER_OV.exists(): + decoder_model = ov.convert_model(model.decoder, example_input=(tokens, audio_features, kv_cache)) + ov.save_model(decoder_model, WHISPER_DECODER_OV) The decoder model autoregressively predicts the next token guided by encoder hidden states and previously predicted sequence. This means that @@ -283,8 +301,10 @@ tokens and attention hidden states from previous step) are dynamic. For efficient utilization of memory, you define an upper bound for dynamic input shapes. -Prepare inference pipeline --------------------------------------------------------------------- +Prepare inference pipeline +-------------------------- + + The image below illustrates the pipeline of video transcribing using the Whisper model. @@ -338,8 +358,10 @@ select device from dropdown list for running inference using OpenVINO model.encoder = OpenVINOAudioEncoder(core, WHISPER_ENCODER_OV, device=device.value) model.decoder = OpenVINOTextDecoder(core, WHISPER_DECODER_OV, device=device.value) -Run video transcription pipeline --------------------------------------------------------------------------- +Run video transcription pipeline +-------------------------------- + + Now, we are ready to start transcription. We select a video from YouTube that we want to transcribe. 
Be patient, as downloading the video may @@ -389,7 +411,7 @@ take some time. from utils import get_audio - audio = get_audio(output_file) + audio, duration = get_audio(output_file) Select the task for the model: @@ -431,7 +453,7 @@ into video files using ``ffmpeg``. from utils import prepare_srt - srt_lines = prepare_srt(transcription) + srt_lines = prepare_srt(transcription, filter_duration=duration) # save transcription with output_file.with_suffix(".srt").open("w") as f: f.writelines(srt_lines) @@ -464,45 +486,39 @@ Now let us see the results. 2 00:00:05,000 --> 00:00:07,000 - Oh wow. + Wow. 3 - 00:00:07,000 --> 00:00:09,000 - Excuse me. + 00:00:07,000 --> 00:00:10,000 + Hello, humans. 4 - 00:00:09,000 --> 00:00:11,000 - Hello humans. - - 5 - 00:00:13,000 --> 00:00:15,000 + 00:00:10,000 --> 00:00:15,000 Focus on me. - 6 - 00:00:15,000 --> 00:00:17,000 + 5 + 00:00:15,000 --> 00:00:16,000 Focus on the guard. - 7 - 00:00:17,000 --> 00:00:20,000 + 6 + 00:00:16,000 --> 00:00:20,000 Don't tell anyone what you've seen in here. - 8 - 00:00:22,000 --> 00:00:24,000 + 7 + 00:00:20,000 --> 00:00:24,000 Have you seen what's in there? - 9 - 00:00:24,000 --> 00:00:25,000 - They have. - - 10 - 00:00:25,000 --> 00:00:27,000 + 8 + 00:00:24,000 --> 00:00:30,000 Intel. This is where it all changes. -Interactive demo ----------------------------------------------------------- +Interactive demo +---------------- + + .. code:: ipython3 @@ -513,9 +529,9 @@ Interactive demo output_file = Path("downloaded_video.mp4") yt = YouTube(url) yt.streams.get_highest_resolution().download(filename=output_file) - audio = get_audio(output_file) + audio, duration = get_audio(output_file) transcription = model.transcribe(audio, task=task.lower()) - srt_lines = prepare_srt(transcription) + srt_lines = prepare_srt(transcription, duration) with output_file.with_suffix(".srt").open("w") as f: f.writelines(srt_lines) return [str(output_file), str(output_file.with_suffix(".srt"))] @@ -535,3 +551,22 @@ Interactive demo # if you are launching remotely, specify server_name and server_port # demo.launch(server_name='your server name', server_port='server port in int') # Read more in the docs: https://gradio.app/docs/ + + +.. parsed-literal:: + + Running on local URL: http://127.0.0.1:7862 + + To create a public link, set `share=True` in `launch()`. + + + +.. .. raw:: html + +..
+ + +.. parsed-literal:: + + Keyboard interruption in main thread... closing server. + diff --git a/docs/notebooks/227-whisper-nncf-quantize-with-output.rst b/docs/notebooks/227-whisper-nncf-quantize-with-output.rst index b98bf0d3c4b9c7..26b1eb42feddcd 100644 --- a/docs/notebooks/227-whisper-nncf-quantize-with-output.rst +++ b/docs/notebooks/227-whisper-nncf-quantize-with-output.rst @@ -8,7 +8,7 @@ Compression Framework) and infer quantized model via OpenVINO™ Toolkit. The optimization process contains the following steps: 1. Quantize the converted OpenVINO model from `227-whisper-convert - notebook <227-whisper-convert.ipynb>`__ with NNCF. + notebook <227-whisper-convert-with-output-with-output.html>`__ with NNCF. 2. Check model result for the demo video. 3. Compare model size, performance and accuracy of FP32 and quantized INT8 models. @@ -16,20 +16,27 @@ The optimization process contains the following steps: .. **NOTE**: you should run - `227-whisper-convert <227-whisper-convert.ipynb>`__ notebook first to + `227-whisper-convert <227-whisper-convert-with-output.html>`__ notebook first to generate OpenVINO IR model that is used for quantization. **Table of contents:** -- `Prerequisites <#prerequisites>`__ -- `Create and initialize quantization <#create-and-initialize-quantization>`__ -- `Prepare calibration datasets <#prepare-calibration-datasets>`__ -- `Quantize Whisper encoder and decoder models <#quantize-whisper-encoder-and-decoder-models>`__ -- `Transcribe video with quantized OpenVINO model <#transcribe-video-with-quantized-openvino-model>`__ -- `Compare performance and accuracy of the FP32 and INT8 IRs <#compare-performance-and-accuracy-of-the-fp-and-int-irs>`__ +- `Prerequisites <#prerequisites>`__ +- `Create and initialize quantization <#create-and-initialize-quantization>`__ + + - `Prepare calibration datasets <#prepare-calibration-datasets>`__ + - `Quantize Whisper encoder and decoder + models <#quantize-whisper-encoder-and-decoder-models>`__ + +- `Transcribe video with quantized OpenVINO + model <#transcribe-video-with-quantized-openvino-model>`__ +- `Compare performance and accuracy of the FP32 and INT8 + IRs <#compare-performance-and-accuracy-of-the-fp-and-int-irs>`__ + +Prerequisites +------------- + -Prerequisites -------------------------------------------------------- Install dependencies. @@ -40,6 +47,40 @@ Install dependencies. %pip install -q datasets librosa soundfile %pip install -q evaluate jiwer +Select model for quantization + +.. code:: ipython3 + + from pathlib import Path + import ipywidgets as widgets + + def get_model_id(model_path): + return model_path.name.replace("whisper_", "").replace("encoder.xml", "").replace("_", "") + + model_list = [get_model_id(model_path) for model_path in Path('.').glob("whisper_*encoder.xml")] + model_list = [model_name for model_name in model_list if model_name] + + if not model_list: + raise RuntimeError("Please run conversion notebook first") + + model_id = widgets.Dropdown( + options=model_list, + value=model_list[0], + description='Model:', + disabled=False, + ) + + model_id + + + + +.. parsed-literal:: + + Dropdown(description='Model:', options=('large-v2', 'large-v3'), value='large-v2') + + + Select device from dropdown list for running inference using OpenVINO. .. code:: ipython3 @@ -63,7 +104,7 @@ Select device from dropdown list for running inference using OpenVINO. .. 
parsed-literal:: - Dropdown(description='Device:', index=4, options=('CPU', 'GPU.0', 'GPU.1', 'GPU.2', 'AUTO'), value='AUTO') + Dropdown(description='Device:', index=2, options=('CPU', 'GPU', 'AUTO'), value='AUTO') @@ -94,7 +135,7 @@ Select the task for the model: Create and initialize quantization ------------------------------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ `NNCF `__ enables post-training quantization by adding the quantization layers into the @@ -112,18 +153,18 @@ The optimization process contains the following steps: function. Set paths to the model converted in -`227-whisper-convert <227-whisper-convert.ipynb>`__ notebook and the +`227-whisper-convert <227-whisper-convert-with-output.html>`__ notebook and the paths where quantized models will be saved. .. code:: ipython3 from pathlib import Path - WHISPER_ENCODER_OV = Path("whisper_encoder.xml") - WHISPER_DECODER_OV = Path("whisper_decoder.xml") + WHISPER_ENCODER_OV = Path(f"whisper_{model_id.value}_encoder.xml") + WHISPER_DECODER_OV = Path(f"whisper_{model_id.value}_decoder.xml") - WHISPER_ENCODER_OV_INT8 = Path("whisper_encoder_int8.xml") - WHISPER_DECODER_OV_INT8 = Path("whisper_decoder_int8.xml") + WHISPER_ENCODER_OV_INT8 = Path(f"whisper_{model_id.value}_encoder_int8.xml") + WHISPER_DECODER_OV_INT8 = Path(f"whisper_{model_id.value}_decoder_int8.xml") Load FP32 model IR. @@ -132,15 +173,16 @@ Load FP32 model IR. import whisper from utils import patch_whisper_for_ov_inference, OpenVINOAudioEncoder, OpenVINOTextDecoder - model_id = "base" - model_fp32 = whisper.load_model(model_id).to("cpu").eval() + model_fp32 = whisper.load_model(model_id.value, "cpu").eval() patch_whisper_for_ov_inference(model_fp32) model_fp32.encoder = OpenVINOAudioEncoder(core, WHISPER_ENCODER_OV, device=device.value) model_fp32.decoder = OpenVINOTextDecoder(core, WHISPER_DECODER_OV, device=device.value) -Prepare calibration datasets -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Prepare calibration datasets +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + Whisper consists of an encoder and a decoder models. We need to collect calibration data for both of them. @@ -210,8 +252,10 @@ dataset from Hugging Face as calibration data. Collecting calibration data: 0%| | 0/30 [00:00`__ notebook. +`227-whisper-convert <227-whisper-convert-with-output.html>`__ notebook. .. code:: ipython3 @@ -361,7 +402,7 @@ Select a video for transcription as in from utils import get_audio - audio = get_audio(output_file) + audio, duration = get_audio(output_file) Run transcription by the quantized model. @@ -373,7 +414,7 @@ Run transcription by the quantized model. from utils import prepare_srt - srt_lines = prepare_srt(transcription) + srt_lines = prepare_srt(transcription, duration) # save transcription with output_file.with_suffix(".srt").open("w") as f: f.writelines(srt_lines) @@ -389,7 +430,7 @@ Now let us see the results. .. parsed-literal:: - Video(value=b'\x00\x00\x00\x18ftypmp42\x00\x00\x00\x00isommp42\x00\x00Aimoov\x00\x00\x00lmvhd...', height='800… + Video(value=b"\x00\x00\x00\x18ftypmp42\x00\x00\x00\x00isommp42\x00\x00:'moov\x00\x00\x00lmvhd...", height='800… @@ -401,44 +442,50 @@ Now let us see the results. .. parsed-literal:: 1 - 00:00:00,000 --> 00:00:07,000 - What's that? Oh, wow. + 00:00:00,000 --> 00:00:05,000 + What's that? 2 - 00:00:09,000 --> 00:00:11,000 - Hello humans. + 00:00:05,000 --> 00:00:07,000 + Oh, wow. 3 - 00:00:14,000 --> 00:00:15,000 - Focus on me. + 00:00:09,000 --> 00:00:11,000 + Hello, humans. 
4 - 00:00:15,000 --> 00:00:16,000 - Focus on the guard. + 00:00:13,000 --> 00:00:15,000 + Focus on me. 5 - 00:00:18,000 --> 00:00:20,000 - Don't tell anyone what you've seen in here. + 00:00:15,000 --> 00:00:17,000 + Focus on the guard. 6 + 00:00:17,000 --> 00:00:20,000 + Don't tell anyone what you see in here. + + 7 00:00:22,000 --> 00:00:24,000 Have you seen what's in there? - 7 + 8 00:00:24,000 --> 00:00:25,000 - They have intel. + They have... - 8 + 9 00:00:25,000 --> 00:00:27,000 - This is where it all changes. + Intel. This is where it all changes. As you can see the result is almost the same. -Compare performance and accuracy of the FP32 and INT8 IRs ---------------------------------------------------------------------------------------------------- +Compare performance and accuracy of the FP32 and INT8 IRs +--------------------------------------------------------- + + Compare model file size. @@ -458,14 +505,14 @@ Compare model file size. .. parsed-literal:: - Model: whisper_encoder - * FP32 IR model size: 40216.07 KB - * INT8 IR model size: 21092.37 KB - * Model compression rate: 1.907 - Model: whisper_decoder - * FP32 IR model size: 101961.09 KB - * INT8 IR model size: 78058.77 KB - * Model compression rate: 1.306 + Model: whisper_large-v2_encoder + * FP32 IR model size: 1244080.07 KB + * INT8 IR model size: 626971.58 KB + * Model compression rate: 1.984 + Model: whisper_large-v2_decoder + * FP32 IR model size: 1900607.09 KB + * INT8 IR model size: 955679.81 KB + * Model compression rate: 1.989 To measure the inference performance of the ``FP32`` and ``INT8`` @@ -516,7 +563,7 @@ quantized models. .. parsed-literal:: - Encoder performance speedup: 1.325 + Encoder performance speedup: 1.763 @@ -533,7 +580,7 @@ quantized models. .. parsed-literal:: - Decoder performance speedup: 1.609 + Decoder performance speedup: 2.022 We measure the whole transcription performance separately, because a @@ -589,12 +636,22 @@ accuracy as ``(1 - WER)``. print(f"Whisper transcription word accuracy. FP32: {accuracy_fp32:.2f}%. INT8: {accuracy_int8:.2f}%. Accuracy drop :{accuracy_fp32 - accuracy_int8:.2f}%.") +.. parsed-literal:: + + Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. + + .. parsed-literal:: Measuring performance and accuracy: 0%| | 0/100 [00:00`__ +`228-clip-zero-shot-quantize <228-clip-zero-shot-quantize-with-output.html>`__ notebook to quantize the IR model with the Post-training Quantization API of NNCF and compare ``FP16`` and ``INT8`` models. diff --git a/docs/notebooks/228-clip-zero-shot-quantize-with-output.rst b/docs/notebooks/228-clip-zero-shot-quantize-with-output.rst index dc8519624b51b2..88115c96b0960b 100644 --- a/docs/notebooks/228-clip-zero-shot-quantize-with-output.rst +++ b/docs/notebooks/228-clip-zero-shot-quantize-with-output.rst @@ -8,16 +8,16 @@ Compression Framework) and infer quantized model via OpenVINO™ Toolkit. The optimization process contains the following steps: 1. Quantize the converted OpenVINO model from - `notebook <228-clip-zero-shot-convert.ipynb>`__ with NNCF. + `notebook <228-clip-zero-shot-convert-with-output.html>`__ with NNCF. 2. Check the model result using the same input data from the - `notebook <228-clip-zero-shot-convert.ipynb>`__. + `notebook <228-clip-zero-shot-convert-with-output.html>`__. 3. Compare model size of converted and quantized models. 4. Compare performance of converted and quantized models. .. 
**NOTE**: you should run - `228-clip-zero-shot-convert <228-clip-zero-shot-convert.ipynb>`__ + `228-clip-zero-shot-convert <228-clip-zero-shot-convert-with-output.html>`__ notebook first to generate OpenVINO IR model that is used for quantization. @@ -260,7 +260,7 @@ Run quantized OpenVINO model The steps for making predictions with the quantized OpenVINO CLIP model are similar to the PyTorch model. Let us check the model result using the same input data from the `1st -notebook <228-clip-zero-shot-image-classification.ipynb>`__. +notebook <228-clip-zero-shot-image-classification-with-output.html>`__. .. code:: ipython3 diff --git a/docs/notebooks/230-yolov8-instance-segmentation-with-output.rst b/docs/notebooks/230-yolov8-instance-segmentation-with-output.rst index 8ee70305894350..9ea9a1fb5d9b82 100644 --- a/docs/notebooks/230-yolov8-instance-segmentation-with-output.rst +++ b/docs/notebooks/230-yolov8-instance-segmentation-with-output.rst @@ -1230,7 +1230,7 @@ utilization. For more information, refer to the overview of tutorial <118-optimize-preprocessing-with-output.html>`__. To see, how it could be used with YOLOV8 object detection model , please, see `Convert and Optimize YOLOv8 real-time object detection with -OpenVINO tutorial <./230-yolov8-object-detection.ipynb>`__ +OpenVINO tutorial <230-yolov8-object-detection-with-output.html>`__ Live demo --------------------------------------------------- diff --git a/docs/notebooks/230-yolov8-keypoint-detection-with-output.rst b/docs/notebooks/230-yolov8-keypoint-detection-with-output.rst index 67e4cff60db1d8..b2064339cd612c 100644 --- a/docs/notebooks/230-yolov8-keypoint-detection-with-output.rst +++ b/docs/notebooks/230-yolov8-keypoint-detection-with-output.rst @@ -1217,7 +1217,7 @@ utilization. For more information, refer to the overview of tutorial <118-optimize-preprocessing-with-output.html>`__. To see, how it could be used with YOLOV8 object detection model , please, see `Convert and Optimize YOLOv8 real-time object detection with -OpenVINO tutorial <./230-yolov8-object-detection.ipynb>`__ +OpenVINO tutorial <230-yolov8-object-detection-with-output.html>`__ Live demo --------------------------------------------------- diff --git a/docs/notebooks/231-instruct-pix2pix-image-editing-with-output.rst b/docs/notebooks/231-instruct-pix2pix-image-editing-with-output.rst index 9412662ee2384a..f57eaa0c65bd5f 100644 --- a/docs/notebooks/231-instruct-pix2pix-image-editing-with-output.rst +++ b/docs/notebooks/231-instruct-pix2pix-image-editing-with-output.rst @@ -99,7 +99,7 @@ Convert Models to OpenVINO IR ----------------------------------------------------------------------- OpenVINO supports PyTorch models using `Model Conversion -API `__ +API `__ to convert the model to IR format. ``ov.convert_model`` function accepts PyTorch model object and example input and then converts it to ``ov.Model`` class instance that ready to use for loading on device or diff --git a/docs/notebooks/232-clip-language-saliency-map-with-output.rst b/docs/notebooks/232-clip-language-saliency-map-with-output.rst index 75c61fbd0e19d6..69e7528c500bc5 100644 --- a/docs/notebooks/232-clip-language-saliency-map-with-output.rst +++ b/docs/notebooks/232-clip-language-saliency-map-with-output.rst @@ -868,4 +868,4 @@ can explore the CLIP capabilities further. For example: `NNCF `__ to get further acceleration. 
You can find example how to quantize CLIP model in `this - notebook <../228-clip-zero-shot-image-classification>`__ + notebook <228-clip-zero-shot-image-classification-with-output.html>`__ diff --git a/docs/notebooks/233-blip-convert-with-output.rst b/docs/notebooks/233-blip-convert-with-output.rst index 2061f204b64aec..a2265cfed797c2 100644 --- a/docs/notebooks/233-blip-convert-with-output.rst +++ b/docs/notebooks/233-blip-convert-with-output.rst @@ -674,7 +674,7 @@ Interactive demo Next steps ---------------------------------------------------- -Open the `233-blip-optimize <233-blip-optimize.ipynb>`__ notebook to +Open the `233-blip-optimize <233-blip-optimize-with-output.html>`__ notebook to quantize vision and text encoder models with the Post-training Quantization API of NNCF and compress weights of the text decoder. Then compare the converted and optimized OpenVINO models. diff --git a/docs/notebooks/233-blip-optimize-with-output.rst b/docs/notebooks/233-blip-optimize-with-output.rst index 1f38fe4a212663..9e20fbe293478a 100644 --- a/docs/notebooks/233-blip-optimize-with-output.rst +++ b/docs/notebooks/233-blip-optimize-with-output.rst @@ -10,18 +10,18 @@ contains the following steps: 1. Download and preprocess dataset for quantization. 2. Quantize the converted vision and text encoder OpenVINO models from - `notebook <233-blip-convert.ipynb>`__ with NNCF. + `notebook <233-blip-convert-with-output.html>`__ with NNCF. 3. Compress weights of the OpenVINO text decoder model from - `notebook <233-blip-convert.ipynb>`__ with NNCF. + `notebook <233-blip-convert-with-output.html>`__ with NNCF. 4. Check the model result using the same input data from the - `notebook <233-blip-convert.ipynb>`__. + `notebook <233-blip-convert-with-output.html>`__. 5. Compare model size of converted and optimized models. 6. Compare performance of converted and optimized models. .. **NOTE**: you should run - `233-blip-convert <233-blip-convert.ipynb>`__ notebook first to + `233-blip-convert <233-blip-convert-with-output.html>`__ notebook first to generate OpenVINO IR models that are used for optimization. **Table of contents:** @@ -290,7 +290,7 @@ Run optimized OpenVINO model The steps for making predictions with the optimized OpenVINO BLIP model are similar to the PyTorch model. Let us check the model result using the same input data from the `first -notebook <233-blip-convert.ipynb>`__. +notebook <233-blip-convert-with-output.html>`__. .. code:: ipython3 diff --git a/docs/notebooks/235-controlnet-stable-diffusion-with-output.rst b/docs/notebooks/235-controlnet-stable-diffusion-with-output.rst index 949b6258bbc7cd..55443ec56c13e4 100644 --- a/docs/notebooks/235-controlnet-stable-diffusion-with-output.rst +++ b/docs/notebooks/235-controlnet-stable-diffusion-with-output.rst @@ -141,7 +141,6 @@ discussed steps are also applicable to other annotation modes. **Table of contents:** - - `Prerequisites <#prerequisites>`__ - `Instantiating Generation Pipeline <#instantiating-generation-pipeline>`__ @@ -168,8 +167,10 @@ discussed steps are also applicable to other annotation modes. - `Select inference device for Stable Diffusion pipeline <#select-inference-device-for-stable-diffusion-pipeline>`__ -Prerequisites -------------------------------------------------------- +Prerequisites +------------- + + .. 
code:: ipython3 @@ -177,11 +178,15 @@ Prerequisites %pip install -q "diffusers>=0.14.0" "transformers>=4.30.2" "controlnet-aux>=0.0.6" "gradio>=3.36" %pip install -q "openvino>=2023.1.0" -Instantiating Generation Pipeline ---------------------------------------------------------------------------- +Instantiating Generation Pipeline +--------------------------------- + + + +ControlNet in Diffusers library +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + -ControlNet in Diffusers library -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For working with Stable Diffusion and ControlNet models, we will use Hugging Face `Diffusers `__ @@ -208,18 +213,10 @@ controlnet model and ``stable-diffusion-v1-5``: ) -.. parsed-literal:: - - 2023-08-29 19:05:09.752880: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. - 2023-08-29 19:05:09.791513: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. - To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. - 2023-08-29 19:05:10.519110: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT - - .. parsed-literal:: - Fetching 15 files: 0%| | 0/15 [00:00`__ @@ -258,7 +257,7 @@ The code below demonstrates how to instantiate the OpenPose model. .. parsed-literal:: - /home/ea/work/ov_venv/lib/python3.8/site-packages/controlnet_aux/mediapipe_face/mediapipe_face_common.py:7: UserWarning: The module 'mediapipe' is not installed. The package will have limited functionality. Please install it using the command: pip install 'mediapipe' + /home/ea/work/openvino_notebooks/test_env/lib/python3.8/site-packages/controlnet_aux/mediapipe_face/mediapipe_face_common.py:7: UserWarning: The module 'mediapipe' is not installed. The package will have limited functionality. Please install it using the command: pip install 'mediapipe' warnings.warn( @@ -318,8 +317,10 @@ Now, let us check its result on example image: .. image:: 235-controlnet-stable-diffusion-with-output_files/235-controlnet-stable-diffusion-with-output_8_0.png -Convert models to OpenVINO Intermediate representation (IR) format ------------------------------------------------------------------------------------------------------------- +Convert models to OpenVINO Intermediate representation (IR) format +------------------------------------------------------------------ + + Starting from 2023.0 release, OpenVINO supports PyTorch models conversion directly. We need to provide a model object, input data for @@ -338,8 +339,10 @@ The pipeline consists of five important parts: Let us convert each part: -OpenPose conversion -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +OpenPose conversion +~~~~~~~~~~~~~~~~~~~ + + OpenPose model is represented in the pipeline as a wrapper on the PyTorch model which not only detects poses on an input image but is also @@ -377,7 +380,7 @@ estimation part, which is located inside the wrapper .. 
parsed-literal:: - OpenPose will be loaded from openpose.xml + OpenPose successfully converted to IR To reuse the original drawing procedure, we replace the PyTorch OpenPose @@ -431,8 +434,10 @@ model with the OpenVINO model, using the following code: core = ov.Core() -Select inference device ------------------------------------------------------------------ +Select inference device +----------------------- + + select device from dropdown list for running inference using OpenVINO @@ -454,7 +459,7 @@ select device from dropdown list for running inference using OpenVINO .. parsed-literal:: - Dropdown(description='Device:', index=2, options=('CPU', 'GNA', 'AUTO'), value='AUTO') + Dropdown(description='Device:', index=2, options=('CPU', 'GPU', 'AUTO'), value='AUTO') @@ -475,8 +480,10 @@ select device from dropdown list for running inference using OpenVINO Great! As we can see, it works perfectly. -ControlNet conversion -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +ControlNet conversion +~~~~~~~~~~~~~~~~~~~~~ + + The ControlNet model accepts the same inputs like UNet in Stable Diffusion pipeline and additional condition sample - skeleton key points @@ -534,12 +541,14 @@ blocks, which serves additional context for the UNet model. .. parsed-literal:: - 5531 + 9962 + + +UNet conversion +~~~~~~~~~~~~~~~ -UNet conversion -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The process of UNet model conversion remains the same, like for original Stable Diffusion model, but with respect to the new inputs generated by @@ -547,6 +556,8 @@ ControlNet. .. code:: ipython3 + from typing import Tuple + UNET_OV_PATH = Path('unet_controlnet.xml') dtype_mapping = { @@ -556,6 +567,47 @@ ControlNet. torch.int64: ov.Type.i64 } + class UnetWrapper(torch.nn.Module): + def __init__( + self, + unet, + sample_dtype=torch.float32, + timestep_dtype=torch.int64, + encoder_hidden_states=torch.float32, + down_block_additional_residuals=torch.float32, + mid_block_additional_residual=torch.float32 + ): + super().__init__() + self.unet = unet + self.sample_dtype = sample_dtype + self.timestep_dtype = timestep_dtype + self.encoder_hidden_states_dtype = encoder_hidden_states + self.down_block_additional_residuals_dtype = down_block_additional_residuals + self.mid_block_additional_residual_dtype = mid_block_additional_residual + + def forward( + self, + sample:torch.Tensor, + timestep:torch.Tensor, + encoder_hidden_states:torch.Tensor, + down_block_additional_residuals:Tuple[torch.Tensor], + mid_block_additional_residual:torch.Tensor + ): + sample.to(self.sample_dtype) + timestep.to(self.timestep_dtype) + encoder_hidden_states.to(self.encoder_hidden_states_dtype) + down_block_additional_residuals = [res.to(self.down_block_additional_residuals_dtype) for res in down_block_additional_residuals] + mid_block_additional_residual.to(self.mid_block_additional_residual_dtype) + return self.unet( + sample, + timestep, + encoder_hidden_states, + down_block_additional_residuals=down_block_additional_residuals, + mid_block_additional_residual=mid_block_additional_residual + ) + + + def flattenize_inputs(inputs): flatten_inputs = [] for input_data in inputs: @@ -572,7 +624,7 @@ ControlNet. inputs["down_block_additional_residuals"] = down_block_res_samples inputs["mid_block_additional_residual"] = mid_block_res_sample - unet = pipe.unet + unet = UnetWrapper(pipe.unet) unet.eval() with torch.no_grad(): @@ -598,39 +650,21 @@ ControlNet. .. parsed-literal:: - WARNING:tensorflow:Please fix your imports. 
Module tensorflow.python.training.tracking.base has been moved to tensorflow.python.trackable.base. The old module will be deleted in version 2.11. - + Unet will be loaded from unet_controlnet.xml -.. parsed-literal:: - [ WARNING ] Please fix your imports. Module %s has been moved to %s. The old module will be deleted in version %s. - /home/ea/work/ov_venv/lib/python3.8/site-packages/diffusers/models/unet_2d_condition.py:526: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! - if any(s % default_overall_up_factor != 0 for s in sample.shape[-2:]): - /home/ea/work/ov_venv/lib/python3.8/site-packages/diffusers/models/resnet.py:185: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! - assert hidden_states.shape[1] == self.channels - /home/ea/work/ov_venv/lib/python3.8/site-packages/diffusers/models/resnet.py:190: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! - assert hidden_states.shape[1] == self.channels - /home/ea/work/ov_venv/lib/python3.8/site-packages/diffusers/models/resnet.py:112: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! - assert hidden_states.shape[1] == self.channels - /home/ea/work/ov_venv/lib/python3.8/site-packages/diffusers/models/resnet.py:125: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! - if hidden_states.shape[0] >= 64: .. parsed-literal:: - Unet successfully converted to IR - - + 0 -.. parsed-literal:: - - 0 +Text Encoder +~~~~~~~~~~~~ -Text Encoder -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The text-encoder is responsible for transforming the input prompt, for example, “a photo of an astronaut riding a horse” into an embedding @@ -688,31 +722,21 @@ hidden states. .. parsed-literal:: - /home/ea/work/ov_venv/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py:286: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! - if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len): - /home/ea/work/ov_venv/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py:294: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! 
- if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len): - /home/ea/work/ov_venv/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py:326: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! - if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim): - /home/ea/work/ov_venv/lib/python3.8/site-packages/torch/jit/annotations.py:310: UserWarning: TorchScript will treat type annotations of Tensor dtype-specific subtypes as if they are normal Tensors. dtype constraints are not enforced in compilation either. - warnings.warn("TorchScript will treat type annotations of Tensor " + Text encoder will be loaded from text_encoder.xml -.. parsed-literal:: - - Text Encoder successfully converted to IR +.. parsed-literal:: + 0 -.. parsed-literal:: - 4202 +VAE Decoder conversion +~~~~~~~~~~~~~~~~~~~~~~ -VAE Decoder conversion -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The VAE model has two parts, an encoder, and a decoder. The encoder is used to convert the image into a low-dimensional latent representation, @@ -774,11 +798,13 @@ diffusion .. parsed-literal:: - VAE decoder successfully converted to IR + VAE decoder will be loaded from vae_decoder.xml + + +Prepare Inference pipeline +-------------------------- -Prepare Inference pipeline --------------------------------------------------------------------- Putting it all together, let us now take a closer look at how the model works in inference by illustrating the logical flow. |detailed workflow| @@ -839,7 +865,7 @@ on OpenVINO. .. code:: ipython3 - from diffusers.pipeline_utils import DiffusionPipeline + from diffusers import DiffusionPipeline from transformers import CLIPTokenizer from typing import Union, List, Optional, Tuple import cv2 @@ -1225,8 +1251,10 @@ on OpenVINO. fig.savefig("result.png", bbox_inches='tight') return fig -Running Text-to-Image Generation with ControlNet Conditioning and OpenVINO --------------------------------------------------------------------------------------------------------------------- +Running Text-to-Image Generation with ControlNet Conditioning and OpenVINO +-------------------------------------------------------------------------- + + Now, we are ready to start generation. For improving the generation process, we also introduce an opportunity to provide a @@ -1238,8 +1266,10 @@ this We can keep this field empty if we want to generate image without negative prompting. -Select inference device for Stable Diffusion pipeline ------------------------------------------------------------------------------------------------ +Select inference device for Stable Diffusion pipeline +----------------------------------------------------- + + select device from dropdown list for running inference using OpenVINO @@ -1263,7 +1293,7 @@ select device from dropdown list for running inference using OpenVINO .. parsed-literal:: - Dropdown(description='Device:', options=('CPU', 'GNA', 'AUTO'), value='CPU') + Dropdown(description='Device:', options=('CPU', 'GPU', 'AUTO'), value='CPU') @@ -1271,6 +1301,25 @@ select device from dropdown list for running inference using OpenVINO ov_pipe = OVContrlNetStableDiffusionPipeline(tokenizer, scheduler, core, CONTROLNET_OV_PATH, TEXT_ENCODER_OV_PATH, UNET_OV_PATH, VAE_DECODER_OV_PATH, device=device.value) +.. 
code:: ipython3 + + np.random.seed(42) + + pose = pose_estimator(img) + + prompt = "Dancing Darth Vader, best quality, extremely detailed" + negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality" + result = ov_pipe(prompt, pose, 20, negative_prompt=negative_prompt) + + result[0] + + + + +.. image:: 235-controlnet-stable-diffusion-with-output_files/235-controlnet-stable-diffusion-with-output_34_0.png + + + .. code:: ipython3 import gradio as gr @@ -1312,10 +1361,11 @@ select device from dropdown list for running inference using OpenVINO pose_btn.click(extract_pose, inp_img, [out_pose, step1, step2]) btn.click(generate, [out_pose, inp_prompt, inp_neg_prompt, inp_seed, inp_steps], out_result) - demo.queue().launch(share=True) - - -.. parsed-literal:: - - Running on local URL: http://127.0.0.1:7860 - + + try: + demo.queue().launch(debug=False) + except Exception: + demo.queue().launch(share=True, debug=False) + # if you are launching remotely, specify server_name and server_port + # demo.launch(server_name='your server name', server_port='server port in int') + # Read more in the docs: https://gradio.app/docs/ diff --git a/docs/notebooks/235-controlnet-stable-diffusion-with-output_files/235-controlnet-stable-diffusion-with-output_17_0.png b/docs/notebooks/235-controlnet-stable-diffusion-with-output_files/235-controlnet-stable-diffusion-with-output_17_0.png index 1847a6c402bef0..8bad9ffaf3c115 100644 --- a/docs/notebooks/235-controlnet-stable-diffusion-with-output_files/235-controlnet-stable-diffusion-with-output_17_0.png +++ b/docs/notebooks/235-controlnet-stable-diffusion-with-output_files/235-controlnet-stable-diffusion-with-output_17_0.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:93a101bee4378a0dfdea04df02be54e4fd01634bf190a5ec38a8ee1dbe9a046d -size 491302 +oid sha256:d015d97ab8bc78fd1836ec81e32f791e9c9d76f2519a3aaecf8dcfe3e9299605 +size 498463 diff --git a/docs/notebooks/235-controlnet-stable-diffusion-with-output_files/235-controlnet-stable-diffusion-with-output_34_0.jpg b/docs/notebooks/235-controlnet-stable-diffusion-with-output_files/235-controlnet-stable-diffusion-with-output_34_0.jpg new file mode 100644 index 00000000000000..f7429b584e39f0 --- /dev/null +++ b/docs/notebooks/235-controlnet-stable-diffusion-with-output_files/235-controlnet-stable-diffusion-with-output_34_0.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5aafc1dfb3864c10d2931ef640caf4152e4ce150ab00d0f50b68c056e3fa3c65 +size 30487 diff --git a/docs/notebooks/235-controlnet-stable-diffusion-with-output_files/235-controlnet-stable-diffusion-with-output_34_0.png b/docs/notebooks/235-controlnet-stable-diffusion-with-output_files/235-controlnet-stable-diffusion-with-output_34_0.png new file mode 100644 index 00000000000000..b39df602b4daf6 --- /dev/null +++ b/docs/notebooks/235-controlnet-stable-diffusion-with-output_files/235-controlnet-stable-diffusion-with-output_34_0.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e544771bccc6bd6a817090f9e1cce06e6cec33b1e9e2680fa526502a70a5447 +size 464375 diff --git a/docs/notebooks/235-controlnet-stable-diffusion-with-output_files/235-controlnet-stable-diffusion-with-output_8_0.png b/docs/notebooks/235-controlnet-stable-diffusion-with-output_files/235-controlnet-stable-diffusion-with-output_8_0.png index 1847a6c402bef0..8bad9ffaf3c115 100644 --- a/docs/notebooks/235-controlnet-stable-diffusion-with-output_files/235-controlnet-stable-diffusion-with-output_8_0.png +++ 
b/docs/notebooks/235-controlnet-stable-diffusion-with-output_files/235-controlnet-stable-diffusion-with-output_8_0.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:93a101bee4378a0dfdea04df02be54e4fd01634bf190a5ec38a8ee1dbe9a046d -size 491302 +oid sha256:d015d97ab8bc78fd1836ec81e32f791e9c9d76f2519a3aaecf8dcfe3e9299605 +size 498463 diff --git a/docs/notebooks/235-controlnet-stable-diffusion-with-output_files/index.html b/docs/notebooks/235-controlnet-stable-diffusion-with-output_files/index.html index 117a81b5b1d4cd..413af7ce9ee1f0 100644 --- a/docs/notebooks/235-controlnet-stable-diffusion-with-output_files/index.html +++ b/docs/notebooks/235-controlnet-stable-diffusion-with-output_files/index.html @@ -1,8 +1,10 @@ -Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/235-controlnet-stable-diffusion-with-output_files/ +Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/235-controlnet-stable-diffusion-with-output_files/ -

-Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/235-controlnet-stable-diffusion-with-output_files/
-../
-235-controlnet-stable-diffusion-with-output_17_..> 31-Oct-2023 00:35              491302
-235-controlnet-stable-diffusion-with-output_8_0..> 31-Oct-2023 00:35              491302
+Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/235-controlnet-stable-diffusion-with-output_files/
+../
+235-controlnet-stable-diffusion-with-output_17_..> 15-Nov-2023 00:43              498463
+235-controlnet-stable-diffusion-with-output_34_..> 15-Nov-2023 00:43               30487
+235-controlnet-stable-diffusion-with-output_34_..> 15-Nov-2023 00:43              464375
+235-controlnet-stable-diffusion-with-output_8_0..> 15-Nov-2023 00:43              498463
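The conversion steps in the ControlNet notebook above all rely on the same direct
PyTorch-to-IR flow available since OpenVINO 2023.0: pass a ``torch.nn.Module``
together with an ``example_input`` to ``ov.convert_model``, save the result with
``ov.save_model``, and compile the IR for the selected device. The self-contained
sketch below illustrates only that general pattern; the toy module, file name, and
input shape are placeholders, not parts of the notebook.

.. code:: ipython3

    import torch
    import openvino as ov

    class TinyNet(torch.nn.Module):
        """Toy module standing in for any of the pipeline parts converted above."""
        def forward(self, x):
            return torch.relu(x).mean(dim=-1)

    example_input = torch.zeros((1, 3, 64, 64))

    # Trace the PyTorch module and obtain an ov.Model instance.
    ov_model = ov.convert_model(TinyNet(), example_input=example_input)

    # Serialize the IR (.xml + .bin) so it can be reused without re-tracing.
    ov.save_model(ov_model, "tiny_net.xml")

    # Compile for a device and run a single inference request.
    core = ov.Core()
    compiled = core.compile_model("tiny_net.xml", "CPU")
    result = compiled([example_input.numpy()])[compiled.output(0)]
    print(result.shape)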
 

diff --git a/docs/notebooks/236-stable-diffusion-v2-infinite-zoom-with-output.rst b/docs/notebooks/236-stable-diffusion-v2-infinite-zoom-with-output.rst index 7b9c471a6341ea..30edc7d2c08fe4 100644 --- a/docs/notebooks/236-stable-diffusion-v2-infinite-zoom-with-output.rst +++ b/docs/notebooks/236-stable-diffusion-v2-infinite-zoom-with-output.rst @@ -22,7 +22,7 @@ In previous notebooks, we already discussed how to run `Text-to-Image generation and Image-to-Image generation using Stable Diffusion v1 <225-stable-diffusion-text-to-image-with-output.html>`__ and `controlling its generation process using -ControlNet <./235-controlnet-stable-diffusion/235-controlnet-stable-diffusion.ipynb>`__. +ControlNet <235-controlnet-stable-diffusion/235-controlnet-stable-diffusion-with-output.html>`__. Now is turn of Stable Diffusion v2. Stable Diffusion v2: What’s new? @@ -203,7 +203,7 @@ Convert models to OpenVINO Intermediate representation (IR) format ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Conversion part of model stayed remain as in `Text-to-Image generation -notebook <./236-stable-diffusion-v2-text-to-image.ipynb>`__. Except +notebook <236-stable-diffusion-v2-text-to-image-with-output.html>`__. Except U-Net now has 9 channels, which now calculated like 4 for U-Net generated latents channels + 4 for latent representation of masked image + 1 channel resized mask. diff --git a/docs/notebooks/236-stable-diffusion-v2-optimum-demo-comparison-with-output.rst b/docs/notebooks/236-stable-diffusion-v2-optimum-demo-comparison-with-output.rst index 36bbae3ae3cfd5..89953a86d9508a 100644 --- a/docs/notebooks/236-stable-diffusion-v2-optimum-demo-comparison-with-output.rst +++ b/docs/notebooks/236-stable-diffusion-v2-optimum-demo-comparison-with-output.rst @@ -13,12 +13,9 @@ running multiple times. - `Showing Info Available Devices <#showing-info-available-devices>`__ -- `Using full precision model in CPU with - ``StableDiffusionPipeline`` <#using-full-precision-model-in-cpu-with-stablediffusionpipeline>`__ -- `Using full precision model in CPU with - ``OVStableDiffusionPipeline`` <#using-full-precision-model-in-cpu-with-ovstablediffusionpipeline>`__ -- `Using full precision model in dGPU with - ``OVStableDiffusionPipeline`` <#using-full-precision-model-in-dgpu-with-ovstablediffusionpipeline>`__ +- `Using full precision model in CPU with StableDiffusionPipeline <#using-full-precision-model-in-cpu-with-stablediffusionpipeline>`__ +- `Using full precision model in CPU with OVStableDiffusionPipeline <#using-full-precision-model-in-cpu-with-ovstablediffusionpipeline>`__ +- `Using full precision model in dGPU with OVStableDiffusionPipeline <#using-full-precision-model-in-dgpu-with-ovstablediffusionpipeline>`__ .. 
|image0| image:: https://github.com/openvinotoolkit/openvino_notebooks/assets/10940214/1858dae4-72fd-401e-b055-66d503d82446 diff --git a/docs/notebooks/236-stable-diffusion-v2-text-to-image-with-output.rst b/docs/notebooks/236-stable-diffusion-v2-text-to-image-with-output.rst index f7c2ce5b701141..882d1586f300f5 100644 --- a/docs/notebooks/236-stable-diffusion-v2-text-to-image-with-output.rst +++ b/docs/notebooks/236-stable-diffusion-v2-text-to-image-with-output.rst @@ -22,7 +22,7 @@ In previous notebooks, we already discussed how to run `Text-to-Image generation and Image-to-Image generation using Stable Diffusion v1 <225-stable-diffusion-text-to-image-with-output.html>`__ and `controlling its generation process using -ControlNet <./235-controlnet-stable-diffusion/235-controlnet-stable-diffusion.ipynb>`__. +ControlNet <235-controlnet-stable-diffusion-with-output.html>`__. Now is turn of Stable Diffusion v2. Stable Diffusion v2: What’s new? diff --git a/docs/notebooks/238-deep-floyd-if-optimize-with-output.rst b/docs/notebooks/238-deep-floyd-if-optimize-with-output.rst index 2d704dbf1c2431..6d8f92e26dabd6 100644 --- a/docs/notebooks/238-deep-floyd-if-optimize-with-output.rst +++ b/docs/notebooks/238-deep-floyd-if-optimize-with-output.rst @@ -7,17 +7,17 @@ applying 8-bit post-training quantization and weights compression from Compression Framework) and infer optimized model via OpenVINO™ Toolkit. **NOTE**: you should run - `238-deep-floyd-if-convert <238-deep-floyd-if-convert.ipynb>`__ + `238-deep-floyd-if-convert <238-deep-floyd-if-convert-with-output.html>`__ notebook first to generate OpenVINO IR model that is used for optimization. The optimization process contains the following steps: 1. Compress weights of the converted OpenVINO text encoder from -`notebook <238-deep-floyd-if-convert.ipynb>`__ with NNCF. 2. Quantize +`notebook <238-deep-floyd-if-convert-with-output.html>`__ with NNCF. 2. Quantize the converted stage_1 and stage_2 U-Nets from -`notebook <238-deep-floyd-if-convert.ipynb>`__ with NNCF. 2. Check the +`notebook <238-deep-floyd-if-convert-with-output.html>`__ with NNCF. 2. Check the model result using the same input data from the -`notebook <238-deep-floyd-if-convert.ipynb>`__. 3. Compare model size of +`notebook <238-deep-floyd-if-convert-with-output.html>`__. 3. Compare model size of converted and optimized models. 4. Compare performance of converted and optimized models. @@ -470,7 +470,7 @@ Run optimized OpenVINO model Let us check predictions with the optimized OpenVINO DeepFloyd IF model result using the same input data from the `1st -notebook <238-deep-floyd-if.ipynb>`__. +notebook <238-deep-floyd-if-with-output.html>`__. .. code:: ipython3 diff --git a/docs/notebooks/239-image-bind-convert-with-output.rst b/docs/notebooks/239-image-bind-convert-with-output.rst index 4c8123cbf8306c..0fee6dbdeb7c15 100644 --- a/docs/notebooks/239-image-bind-convert-with-output.rst +++ b/docs/notebooks/239-image-bind-convert-with-output.rst @@ -494,6 +494,6 @@ Putting all together, we can match text, image, and sound for our data. Next Steps ---------------------------------------------------- -Open the `239-image-bind-quantize <239-image-bind-quantize.ipynb>`__ +Open the `239-image-bind-quantize <239-image-bind-quantize-with-output.html>`__ notebook to quantize the IR model with the Post-training Quantization API of NNCF and compare ``FP16`` and ``INT8`` models. 
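Several of the optimization notebooks in this diff (the Whisper, DeepFloyd IF, and
ImageBind ones above) lean on two NNCF entry points: ``nncf.compress_weights`` for
weight-only compression and ``nncf.quantize`` for 8-bit post-training quantization
driven by a small calibration set. A minimal sketch of how these calls are usually
wired together for an OpenVINO IR model is shown below; the model path, input shape,
and random calibration data are placeholders rather than anything taken from the
notebooks.

.. code:: ipython3

    import numpy as np
    import nncf
    import openvino as ov

    core = ov.Core()
    model = core.read_model("model.xml")  # placeholder IR produced by a conversion notebook

    # Weight-only compression: no calibration data required.
    compressed_model = nncf.compress_weights(model)

    # 8-bit post-training quantization: needs a representative calibration dataset.
    calibration_items = [np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(30)]

    def transform_fn(item):
        # Map one dataset item to the model input; identity here because the
        # placeholder data is already in the expected layout.
        return item

    calibration_dataset = nncf.Dataset(calibration_items, transform_fn)
    quantized_model = nncf.quantize(model, calibration_dataset, subset_size=len(calibration_items))

    ov.save_model(quantized_model, "model_int8.xml", compress_to_fp16=False)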
diff --git a/docs/notebooks/240-dolly-2-instruction-following-with-output.rst b/docs/notebooks/240-dolly-2-instruction-following-with-output.rst index c8f4ccd7a71d73..3dca6e68489c62 100644 --- a/docs/notebooks/240-dolly-2-instruction-following-with-output.rst +++ b/docs/notebooks/240-dolly-2-instruction-following-with-output.rst @@ -83,32 +83,32 @@ and `repo `__ **Table of contents:** - - `Prerequisites <#prerequisites>`__ - `Select inference device <#select-inference-device>`__ -- `Download and Convert - Model <#download-and-convert-model>`__ -- `NNCF model weights - compression <#nncf-model-weights-compression>`__ +- `Download and Convert Model <#download-and-convert-model>`__ + + - `NNCF model weights + compression <#nncf-model-weights-compression>`__ + - `Create an instruction-following inference pipeline <#create-an-instruction-following-inference-pipeline>`__ - `Setup imports <#setup-imports>`__ - `Prepare template for user prompt <#prepare-template-for-user-prompt>`__ - - `Helpers for output - parsing <#helpers-for-output-parsing>`__ - - `Main generation - function <#main-generation-function>`__ + - `Helpers for output parsing <#helpers-for-output-parsing>`__ + - `Main generation function <#main-generation-function>`__ - `Helpers for application <#helpers-for-application>`__ - `Run instruction-following pipeline <#run-instruction-following-pipeline>`__ -Prerequisites --------------------------------------------------------- +Prerequisites +------------- + + First, we should install the `Hugging Face Optimum `__ library @@ -120,11 +120,13 @@ documentation `__. .. code:: ipython3 - %pip install -q "diffusers>=0.16.1" "transformers>=4.28.0" "openvino==2023.2.0.dev20230922" "nncf>=2.6.0" datasets onnx onnxruntime gradio + %pip install -q "diffusers>=0.16.1" "transformers>=4.33.0" "openvino==2023.2.0.dev20230922" "nncf>=2.6.0" datasets onnx gradio --extra-index-url https://download.pytorch.org/whl/cpu %pip install -q --upgrade "git+https://github.com/huggingface/optimum-intel.git" -Select inference device -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Select inference device +~~~~~~~~~~~~~~~~~~~~~~~ + + select device from dropdown list for running inference using OpenVINO @@ -153,8 +155,10 @@ select device from dropdown list for running inference using OpenVINO -Download and Convert Model ---------------------------------------------------------------------- +Download and Convert Model +-------------------------- + + Optimum Intel can be used to load optimized models from the `Hugging Face Hub `__ and @@ -221,8 +225,10 @@ compatible with Optimum models. Compiling the model to CPU ... -NNCF model weights compression -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +NNCF model weights compression +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + NNCF `Weights Compression algorithm `__ @@ -298,8 +304,10 @@ accuracy drop. Compiling the model to CPU ... -Create an instruction-following inference pipeline ---------------------------------------------------------------------------------------------- +Create an instruction-following inference pipeline +-------------------------------------------------- + + The ``run_generation`` function accepts user-provided text input, tokenizes it, and runs the generation process. Text generation is an @@ -402,8 +410,10 @@ generated tokens without waiting until when the whole generation is finished using Streaming API, it adds a new token to the output queue and then prints them when they are ready. 
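The streaming behaviour described at the end of the section above is exactly what
``TextIteratorStreamer`` from ``transformers`` provides: ``generate`` runs in a
background thread and pushes tokens into a queue, while the main thread iterates
over the streamer and prints text as soon as it arrives. The generic sketch below
shows that pattern with a plain ``transformers`` model; the notebook itself routes
generation through an Optimum Intel model wrapper with the same ``generate`` API,
and the checkpoint name here is only an illustration.

.. code:: ipython3

    from threading import Thread

    from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

    model_name = "databricks/dolly-v2-3b"  # illustrative checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    prompt = "Explain what OpenVINO is in one sentence."
    inputs = tokenizer(prompt, return_tensors="pt")

    # The streamer receives tokens from generate() and exposes them as an iterator.
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    generation_kwargs = dict(**inputs, streamer=streamer, max_new_tokens=64)

    # Run generation in a worker thread so tokens can be printed while generation continues.
    thread = Thread(target=model.generate, kwargs=generation_kwargs)
    thread.start()

    for new_text in streamer:
        print(new_text, end="", flush=True)
    thread.join()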
-Setup imports -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Setup imports +~~~~~~~~~~~~~ + + .. code:: ipython3 @@ -414,8 +424,10 @@ Setup imports from transformers import AutoTokenizer, TextIteratorStreamer import numpy as np -Prepare template for user prompt -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Prepare template for user prompt +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + For effective generation, model expects to have input in specific format. The code below prepare template for passing user instruction @@ -445,8 +457,10 @@ into model with providing additional context. response_key=RESPONSE_KEY, ) -Helpers for output parsing -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Helpers for output parsing +~~~~~~~~~~~~~~~~~~~~~~~~~~ + + Model was retrained to finish generation using special token ``### End`` the code below find its id for using it as generation stop-criteria. @@ -485,8 +499,10 @@ the code below find its id for using it as generation stop-criteria. except ValueError: pass -Main generation function -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Main generation function +~~~~~~~~~~~~~~~~~~~~~~~~ + + As it was discussed above, ``run_generation`` function is the entry point for starting generation. It gets provided input instruction as @@ -545,8 +561,10 @@ parameter and returns model response. start = perf_counter() return model_output, perf_text -Helpers for application -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Helpers for application +~~~~~~~~~~~~~~~~~~~~~~~ + + For making interactive user interface we will use Gradio library. The code bellow provides useful functions used for communication with UI @@ -611,8 +629,10 @@ elements. ov_model.compile() return current_text -Run instruction-following pipeline ------------------------------------------------------------------------------ +Run instruction-following pipeline +---------------------------------- + + Now, we are ready to explore model capabilities. This demo provides a simple interface that allows communication with a model using text @@ -690,9 +710,13 @@ generation parameters: if __name__ == "__main__": try: - demo.launch(enable_queue=True, share=False, height=800) + demo.queue().launch(debug=False, height=800) except Exception: - demo.launch(enable_queue=True, share=True, height=800) + demo.queue().launch(debug=False, share=True, height=800) + + # If you are launching remotely, specify server_name and server_port + # EXAMPLE: `demo.launch(server_name='your server name', server_port='server port in int')` + # To learn more please refer to the Gradio docs: https://gradio.app/docs/ .. parsed-literal:: diff --git a/docs/notebooks/241-riffusion-text-to-music-with-output.rst b/docs/notebooks/241-riffusion-text-to-music-with-output.rst index 27952900e7624d..81a23d62411021 100644 --- a/docs/notebooks/241-riffusion-text-to-music-with-output.rst +++ b/docs/notebooks/241-riffusion-text-to-music-with-output.rst @@ -76,7 +76,6 @@ audio generation. **Table of contents:** - - `Prerequisites <#prerequisites>`__ - `Stable Diffusion pipeline in Optimum Intel <#stable-diffusion-pipeline-in-optimum-intel>`__ @@ -88,17 +87,20 @@ audio generation. - `Run Inference pipeline <#run-inference-pipeline>`__ - `Interactive demo <#interactive-demo>`__ -Prerequisites -------------------------------------------------------- +Prerequisites +------------- + + .. 
code:: ipython3 - %pip install -q "diffusers>=0.16.1" "transformers>=4.28.0" - %pip install -q --extra-index-url https://download.pytorch.org/whl/cpu torch torchaudio - %pip install -q "git+https://github.com/huggingface/optimum-intel.git" onnx onnxruntime "gradio>=3.34.0" "openvino>=2023.1.0" + %pip install -q --extra-index-url https://download.pytorch.org/whl/cpu "torch<2.1" "torchaudio<2.1" "diffusers>=0.16.1" "transformers>=4.33.0" + %pip install -q "git+https://github.com/huggingface/optimum-intel.git" onnx "gradio>=3.34.0" "openvino>=2023.1.0" + +Stable Diffusion pipeline in Optimum Intel +------------------------------------------ + -Stable Diffusion pipeline in Optimum Intel ------------------------------------------------------------------------------------- As the riffusion model architecture is the same as Stable Diffusion, we can use it with the Stable Diffusion pipeline for text-to-image @@ -134,8 +136,10 @@ running. MODEL_ID = "riffusion/riffusion-model-v1" MODEL_DIR = Path("riffusion_pipeline") -Select inference device -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Select inference device +~~~~~~~~~~~~~~~~~~~~~~~ + + select device from dropdown list for running inference using OpenVINO @@ -194,8 +198,10 @@ select device from dropdown list for running inference using OpenVINO warnings.warn( -Prepare postprocessing for reconstruction audio from spectrogram image ----------------------------------------------------------------------------------------------------------------- +Prepare postprocessing for reconstruction audio from spectrogram image +---------------------------------------------------------------------- + + The riffusion model generates an audio spectrogram image, which can be used to reconstruct audio. However, the spectrogram images from the @@ -361,8 +367,10 @@ from a spectrogram image using Griffin-Lim Algorithm. return waveform -Run Inference pipeline ----------------------------------------------------------------- +Run Inference pipeline +---------------------- + + The diagram below briefly describes the workflow of our pipeline @@ -488,8 +496,10 @@ without the other. More explanation of how it works can be found in this -Interactive demo ----------------------------------------------------------- +Interactive demo +---------------- + + .. code:: ipython3 @@ -547,17 +557,20 @@ Interactive demo with gr.Column(): sound_output = gr.Audio(type='filepath', label="spectrogram sound") - spectrogram_output = gr.Image(label="spectrogram image result") - spectrogram_output.style(height=256) + spectrogram_output = gr.Image(label="spectrogram image result", height=256) send_btn.click(generate, inputs=[prompt_input, negative_prompt], outputs=[spectrogram_output, sound_output]) device.change(select_device, [device, prompt_input], [prompt_input]) if __name__ == "__main__": try: - demo.launch(enable_queue=True, height=800) + demo.queue().launch(debug=False, height=800) except Exception: - demo.launch(enable_queue=True, share=True, height=800) + demo.queue().launch(debug=False, share=True, height=800) + + # If you are launching remotely, specify server_name and server_port + # EXAMPLE: `demo.launch(server_name='your server name', server_port='server port in int')` + # To learn more please refer to the Gradio docs: https://gradio.app/docs/ .. 
parsed-literal:: diff --git a/docs/notebooks/241-riffusion-text-to-music-with-output_files/index.html b/docs/notebooks/241-riffusion-text-to-music-with-output_files/index.html index 357065094ecfa4..1e137131dbcb60 100644 --- a/docs/notebooks/241-riffusion-text-to-music-with-output_files/index.html +++ b/docs/notebooks/241-riffusion-text-to-music-with-output_files/index.html @@ -1,8 +1,8 @@ -Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/241-riffusion-text-to-music-with-output_files/ +Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/241-riffusion-text-to-music-with-output_files/ -

-Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/241-riffusion-text-to-music-with-output_files/
-../
-241-riffusion-text-to-music-with-output_14_0.jpg   31-Oct-2023 00:35               61095
-241-riffusion-text-to-music-with-output_14_0.png   31-Oct-2023 00:35              524399
+Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/241-riffusion-text-to-music-with-output_files/
+../
+241-riffusion-text-to-music-with-output_14_0.jpg   15-Nov-2023 00:43               61095
+241-riffusion-text-to-music-with-output_14_0.png   15-Nov-2023 00:43              524399
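For context on the riffusion diff above: because the checkpoint shares the Stable
Diffusion architecture, the notebook loads it through Optimum Intel's Stable
Diffusion pipeline and only afterwards turns the generated spectrogram image into
audio. The condensed sketch below shows that loading pattern (export to IR on first
load, then compile for the chosen device); it is an outline under those assumptions,
not the notebook's exact code, and the prompt is arbitrary.

.. code:: ipython3

    from optimum.intel.openvino import OVStableDiffusionPipeline

    MODEL_ID = "riffusion/riffusion-model-v1"

    # export=True converts the PyTorch checkpoint to OpenVINO IR on the fly;
    # compile=False defers compilation until a device has been selected.
    pipe = OVStableDiffusionPipeline.from_pretrained(MODEL_ID, export=True, compile=False)
    pipe.to("CPU")   # or the device chosen from the dropdown, e.g. "AUTO"
    pipe.compile()

    # The generated "image" is a spectrogram; Griffin-Lim postprocessing
    # reconstructs the waveform from it in a later step of the notebook.
    spectrogram = pipe("jazzy rap beats", num_inference_steps=20).images[0]
    spectrogram.save("spectrogram.png")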
 

diff --git a/docs/notebooks/242-freevc-voice-conversion-with-output.rst b/docs/notebooks/242-freevc-voice-conversion-with-output.rst index 2fc050361e0c88..52818de95578ec 100644 --- a/docs/notebooks/242-freevc-voice-conversion-with-output.rst +++ b/docs/notebooks/242-freevc-voice-conversion-with-output.rst @@ -30,32 +30,35 @@ devices. It consists of the following steps: **Table of contents:** - - `Pre-requisites <#pre-requisites>`__ - `Imports and settings <#imports-and-settings>`__ - `Convert Modes to OpenVINO Intermediate Representation <#convert-modes-to-openvino-intermediate-representation>`__ - `Convert Prior Encoder. <#convert-prior-encoder>`__ - - `Convert SpeakerEncoder <#convert-speakerencoder>`__ + - `Convert + SpeakerEncoder <#convert-speakerencoder>`__ - `Convert Decoder <#convert-decoder>`__ Pre-requisites -------------------------------------------------------- This steps can be done manually or will be performed automatically -during the execution of the notebook, but in minimum necessary scope. 1. -Clone this repo: git clone https://github.com/OlaWod/FreeVC.git. 2. -Download -`WavLM-Large `__ -and put it under directory ``FreeVC/wavlm/``. 3. You can download the -`VCTK `__ dataset. For -this example we download only two of them from `Hugging Face FreeVC -example `__. 4. -Download `pretrained -models `__ -and put it under directory ‘checkpoints’ (for current example only -``freevc.pth`` are required). +during the execution of the notebook, but in minimum necessary scope. + +1. Clone this repo: + + .. code:: + + git clone https://github.com/OlaWod/FreeVC.git + +2. Download `WavLM-Large `__ and put it under directory ``FreeVC/wavlm/``. +3. You can download the `VCTK `__ dataset. + + For this example we download only two of them from `Hugging Face FreeVC example `__. + +4. Download `pretrained models `__ + and put it under directory ‘checkpoints’ (for current example only ``freevc.pth`` are required). Install extra requirements diff --git a/docs/notebooks/244-named-entity-recognition-with-output.rst b/docs/notebooks/244-named-entity-recognition-with-output.rst index 320bac5661a9cf..adc57e37189049 100644 --- a/docs/notebooks/244-named-entity-recognition-with-output.rst +++ b/docs/notebooks/244-named-entity-recognition-with-output.rst @@ -26,31 +26,34 @@ Optimum `__ library is used to convert the model to OpenVINO™ IR format and quantize it. **Table of contents:** ---- -- `Prerequisites <#prerequisites>`__ +- `Prerequisites <#prerequisites>`__ - `Download the NER model <#download-the-ner-model>`__ - `Quantize the model, using Hugging Face Optimum API <#quantize-the-model-using-hugging-face-optimum-api>`__ -- `Prepare demo for Named Entity Recognition OpenVINO - Runtime <#prepare-demo-for-named-entity-recognition-openvino-runtime>`__ - `Compare the Original and Quantized Models <#compare-the-original-and-quantized-models>`__ - `Compare performance <#compare-performance>`__ - - `Compare size of the - models <#compare-size-of-the-models>`__ + - `Compare size of the models <#compare-size-of-the-models>`__ + +- `Prepare demo for Named Entity Recognition OpenVINO + Runtime <#prepare-demo-for-named-entity-recognition-openvino-runtime>`__ + +Prerequisites +------------- + -Prerequisites -------------------------------------------------------- .. 
code:: ipython3 - %pip install -q "diffusers>=0.17.1" "openvino>=2023.1.0" "nncf>=2.5.0" "gradio" "onnx>=1.11.0" "onnxruntime>=1.14.0" "transformers>=4.31.0" + %pip install -q "diffusers>=0.17.1" "openvino>=2023.1.0" "nncf>=2.5.0" "gradio" "onnx>=1.11.0" "transformers>=4.33.0" --extra-index-url https://download.pytorch.org/whl/cpu %pip install -q "git+https://github.com/huggingface/optimum-intel.git" -Download the NER model ----------------------------------------------------------------- +Download the NER model +---------------------- + + We load the `distilbert-base-cased-finetuned-conll03-english `__ @@ -74,17 +77,10 @@ method. tokenizer = AutoTokenizer.from_pretrained(model_id) - -.. parsed-literal:: - - 2023-09-19 19:03:57.913343: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. - 2023-09-19 19:03:57.950536: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. - To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. - 2023-09-19 19:03:58.511125: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT +Quantize the model, using Hugging Face Optimum API +-------------------------------------------------- -Quantize the model, using Hugging Face Optimum API --------------------------------------------------------------------------------------------- Post-training static quantization introduces an additional calibration step where data is fed through the network in order to compute the @@ -143,26 +139,15 @@ corresponding ``OVModelForXxx`` class. So we use .. parsed-literal:: - INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, tensorflow, onnx, openvino + INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino .. parsed-literal:: No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' - /home/ea/work/ov_venv/lib/python3.8/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations + /home/ea/work/openvino_notebooks/test_env/lib/python3.8/site-packages/datasets/load.py:2089: FutureWarning: 'use_auth_token' was deprecated in favor of 'token' in version 2.14.0 and will be removed in 3.0.0. + You can remove this warning by passing 'token=False' instead. warnings.warn( - Found cached dataset conll2003 (/home/ea/.cache/huggingface/datasets/conll2003/conll2003/1.0.0/9a4d16a94f8674ba3466315300359b0acd891b68b6c8743ddf60b9c702adce98) - Loading cached shuffled indices for dataset at /home/ea/.cache/huggingface/datasets/conll2003/conll2003/1.0.0/9a4d16a94f8674ba3466315300359b0acd891b68b6c8743ddf60b9c702adce98/cache-2fe5320fac60946d.arrow - - - -.. parsed-literal:: - - 0%| | 0/1 [00:00`__ +model with quantized and converted to OpenVINO IR format models to see +the difference. - [ WARNING ] Please fix your imports. Module %s has been moved to %s. The old module will be deleted in version %s. 
- /home/ea/work/ov_venv/lib/python3.8/site-packages/nncf/torch/dynamic_graph/wrappers.py:81: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect. - result = operator(\*args, \*\*kwargs) - Configuration saved in quantized_ner_model/openvino_config.json - Compiling the model... - Set CACHE_DIR to quantized_ner_model/model_cache +Compare performance +~~~~~~~~~~~~~~~~~~~ -Prepare demo for Named Entity Recognition OpenVINO Runtime ----------------------------------------------------------------------------------------------------- As the Optimum Inference models are API compatible with Hugging Face -Transformers models, we can just use ``pipleine()`` from `Hugging Face +Transformers models, we can just use ``pipeline()`` from `Hugging Face Transformers API `__ for inference. @@ -265,70 +236,7 @@ inference. from transformers import pipeline ner_pipeline_optimized = pipeline("token-classification", model=optimized_model, tokenizer=tokenizer) - -Now, you can try NER model on own text. Put your sentence to input text -box, click Submit button, the model label the recognized entities in the -text. - -.. code:: ipython3 - - import gradio as gr - - examples = [ - "My name is Wolfgang and I live in Berlin.", - ] - - def run_ner(text): - output = ner_pipeline_optimized(text) - return {"text": text, "entities": output} - - demo = gr.Interface(run_ner, - gr.Textbox(placeholder="Enter sentence here...", label="Input Text"), - gr.HighlightedText(label="Output Text"), - examples=examples, - allow_flagging="never") - if __name__ == "__main__": - try: - demo.launch(debug=False) - except Exception: - demo.launch(share=True, debug=False) - # if you are launching remotely, specify server_name and server_port - # demo.launch(server_name='your server name', server_port='server port in int') - # Read more in the docs: https://gradio.app/docs/ - - -.. parsed-literal:: - - Running on local URL: http://127.0.0.1:7860 - - To create a public link, set `share=True` in `launch()`. - - - -.. .. raw:: html - -..
- - -.. parsed-literal:: - - Keyboard interruption in main thread... closing server. - - -Compare the Original and Quantized Models ------------------------------------------------------------------------------------ - -Compare the original -`distilbert-base-cased-finetuned-conll03-english `__ -model with quantized and converted to OpenVINO IR format models to see -the difference. - -Compare performance -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. code:: ipython3 - ner_pipeline_original = pipeline("token-classification", model=model, tokenizer=tokenizer) .. code:: ipython3 @@ -360,23 +268,64 @@ Compare performance .. parsed-literal:: - Median inference time of quantized model: 0.008145123501890339 - Median inference time of original model: 0.09339697850373341 + Median inference time of quantized model: 0.008135671014315449 + Median inference time of original model: 0.108725632991991 + + +Compare size of the models +~~~~~~~~~~~~~~~~~~~~~~~~~~ -Compare size of the models -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code:: ipython3 from pathlib import Path - print(f'Size of original model in Bytes is {Path(original_ner_model_dir, "pytorch_model.bin").stat().st_size}') + pytorch_model_file = Path(original_ner_model_dir) / "pytorch_model.bin" + if not pytorch_model_file.exists(): + pytorch_model_file = pytorch_model_file.parent / "model.safetensors" + print(f'Size of original model in Bytes is {pytorch_model_file.stat().st_size}') print(f'Size of quantized model in Bytes is {Path(quantized_ner_model_dir, "openvino_model.bin").stat().st_size}') .. parsed-literal:: - Size of original model in Bytes is 260824741 + Size of original model in Bytes is 260803668 Size of quantized model in Bytes is 133539000 + +Prepare demo for Named Entity Recognition OpenVINO Runtime +---------------------------------------------------------- + + + +Now, you can try NER model on own text. Put your sentence to input text +box, click Submit button, the model label the recognized entities in the +text. + +.. code:: ipython3 + + import gradio as gr + + examples = [ + "My name is Wolfgang and I live in Berlin.", + ] + + def run_ner(text): + output = ner_pipeline_optimized(text) + return {"text": text, "entities": output} + + demo = gr.Interface(run_ner, + gr.Textbox(placeholder="Enter sentence here...", label="Input Text"), + gr.HighlightedText(label="Output Text"), + examples=examples, + allow_flagging="never") + + if __name__ == "__main__": + try: + demo.launch(debug=False) + except Exception: + demo.launch(share=True, debug=False) + # if you are launching remotely, specify server_name and server_port + # demo.launch(server_name='your server name', server_port='server port in int') + # Read more in the docs: https://gradio.app/docs/ diff --git a/docs/notebooks/246-depth-estimation-videpth-with-output.rst b/docs/notebooks/246-depth-estimation-videpth-with-output.rst index bf81d6d2110846..765e97e11b7264 100644 --- a/docs/notebooks/246-depth-estimation-videpth-with-output.rst +++ b/docs/notebooks/246-depth-estimation-videpth-with-output.rst @@ -70,48 +70,46 @@ IR model representation *via* another format. 
**Table of contents:** - - `Imports <#imports>`__ -- `Loading models and - checkpoints <#loading-models-and-checkpoints>`__ +- `Loading models and checkpoints <#loading-models-and-checkpoints>`__ - `Cleaning up the model directory <#cleaning-up-the-model-directory>`__ + - `Transformation of models <#transformation-of-models>`__ + + - `Dummy input creation <#dummy-input-creation>`__ + - `Conversion of depth model to OpenVINO IR + format <#conversion-of-depth-model-to-openvino-ir-format>`__ -- `Transformation of models <#transformation-of-models>`__ + - `Select inference device <#select-inference-device>`__ + - `Compilation of depth model <#compilation-of-depth-model>`__ + - `Computation of scale and shift + parameters <#computation-of-scale-and-shift-parameters>`__ - - `Dummy input creation <#dummy-input-creation>`__ - - `Conversion of depth model to OpenVINO™ IR - format <#conversion-of-depth-model-to-openvino-ir-format>`__ + - `Conversion of Scale Map Learner model to OpenVINO IR + format <#conversion-of-scale-map-learner-model-to-openvino-ir-format>`__ - - `Select inference device <#select-inference-device>`__ - - `Compilation of depth - model <#compilation-of-depth-model>`__ - - `Computation of scale and shift - parameters <#computation-of-scale-and-shift-parameters>`__ + - `Select inference device <#select-inference-device>`__ + - `Compilation of the ScaleMapLearner(SML) + model <#compilation-of-the-scalemaplearnersml-model>`__ - - `Conversion of Scale Map Learner model to OpenVINO™ IR - format <#conversion-of-scale-map-learner-model-to-openvino-ir-format>`__ + - `Storing and visualizing dummy results + obtained <#storing-and-visualizing-dummy-results-obtained>`__ - - `Select inference device <#select-inference-device>`__ - - `Compilation of the ScaleMapLearner(SML) - model <#compilation-of-the-scalemaplearnersml-model>`__ + - `Running inference on a test + image <#running-inference-on-a-test-image>`__ + - `Store and visualize Inference + results <#store-and-visualize-inference-results>`__ - - `Storing and visualizing dummy results - obtained <#storing-and-visualizing-dummy-results-obtained>`__ + - `Cleaning up the data + directory <#cleaning-up-the-data-directory>`__ -- `Running inference on a test - image <#running-inference-on-a-test-image>`__ -- `Store and visualize Inference - results <#store-and-visualize-inference-results>`__ + - `Concluding notes <#concluding-notes>`__ - - `Cleaning up the data - directory <#cleaning-up-the-data-directory>`__ +Imports +~~~~~~~ -- `Concluding notes <#concluding-notes>`__ -Imports -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code:: ipython3 @@ -133,8 +131,7 @@ Imports onnx 1.15.0 requires protobuf>=3.20.2, but you have protobuf 3.20.1 which is incompatible. onnxconverter-common 1.14.0 requires protobuf==3.20.2, but you have protobuf 3.20.1 which is incompatible. paddlepaddle 2.5.2 requires protobuf>=3.20.2; platform_system != "Windows", but you have protobuf 3.20.1 which is incompatible. - tensorflow 2.13.1 requires protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3, but you have protobuf 3.20.1 which is incompatible. - tensorflow 2.13.1 requires typing-extensions<4.6.0,>=3.6.6, but you have typing-extensions 4.8.0 which is incompatible. + tensorflow 2.12.0 requires protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3, but you have protobuf 3.20.1 which is incompatible. tf2onnx 1.15.1 requires protobuf~=3.20.2, but you have protobuf 3.20.1 which is incompatible. 
Note: you may need to restart the kernel to use updated packages. @@ -167,8 +164,10 @@ Imports # Ability to display images inline %matplotlib inline -Loading models and checkpoints -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Loading models and checkpoints +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + The complete pipeline here requires only two models: one for depth estimation and a ScaleMapLearner model which is responsible for @@ -185,7 +184,6 @@ link address”. We shall use this link in the next cell to download the ScaleMapLearner model. *Interestingly*, the ScaleMapLearner decides the depth prediction model as you will see. - +------------------+---------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+ | Depth Predictor | SML on VOID 150 | SML on VOID 500 | SML on VOID 1500 | +==================+=================================================================================================================================+=================================================================================================================================+==================================================================================================================================+ @@ -204,7 +202,6 @@ depth prediction model as you will see. | MiDaS-small | `model `__ | `model `__ | `model `__ | +------------------+---------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+ - \*Also available with pre-training on TartanAir: `model `__ @@ -284,7 +281,7 @@ depth prediction model as you will see. .. parsed-literal:: - /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-534/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/torch/hub.py:267: UserWarning: You are about to download and run code from an untrusted repository. In a future release, this won't be allowed. To add the repository to your trusted list, change the command to {calling_fn}(..., trust_repo=False) and a command prompt will appear asking for an explicit confirmation of trust, or load(..., trust_repo=True), which will assume that the prompt is to be answered with 'yes'. You can also use load(..., trust_repo='check') which will only prompt for confirmation if the repo is not already trusted. This will eventually be the default behaviour + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/torch/hub.py:267: UserWarning: You are about to download and run code from an untrusted repository. In a future release, this won't be allowed. To add the repository to your trusted list, change the command to {calling_fn}(..., trust_repo=False) and a command prompt will appear asking for an explicit confirmation of trust, or load(..., trust_repo=True), which will assume that the prompt is to be answered with 'yes'. 
You can also use load(..., trust_repo='check') which will only prompt for confirmation if the repo is not already trusted. This will eventually be the default behaviour warnings.warn( Downloading: "https://github.com/rwightman/gen-efficientnet-pytorch/zipball/master" to model/master.zip Downloading: "https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_lite3-b733e338.pth" to model/checkpoints/tf_efficientnet_lite3-b733e338.pth @@ -297,8 +294,10 @@ depth prediction model as you will see. 0%| | 0.00/81.8M [00:00`__ @@ -318,8 +317,10 @@ process. if list_file.is_file(): list_file.unlink() -Transformation of models -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Transformation of models +~~~~~~~~~~~~~~~~~~~~~~~~ + + Each of the models need an appropriate transformation which can be invoked by the ``get_model_transforms`` function. It needs only the @@ -353,8 +354,10 @@ model are always in direct correspondence with each other. depth_model_transform, scale_map_learner_transform = get_model_transforms(depth_predictor='midas_small', nsamples=NSAMPLES) -Dummy input creation -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Dummy input creation +^^^^^^^^^^^^^^^^^^^^ + + Dummy inputs are necessary for `PyTorch to ONNX `__ @@ -437,8 +440,10 @@ dataset # Transform the dummy input image for the depth model transformed_dummy_image = transform_image_for_depth(input_image=dummy_input, depth_model_transform=depth_model_transform) -Conversion of depth model to OpenVINO IR format -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Conversion of depth model to OpenVINO IR format +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + The OpenVINO™ toolkit doesn’t provide any direct method of converting PyTorch models to the intermediate representation format. To have a @@ -468,21 +473,23 @@ we shall follow the following steps: .. parsed-literal:: - /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-534/.workspace/scm/ov-notebook/notebooks/246-depth-estimation-videpth/model/rwightman_gen-efficientnet-pytorch_master/geffnet/conv2d_layers.py:47: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! - /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-534/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/torch/onnx/_internal/jit_utils.py:258: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at ../torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1884.) + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/notebooks/246-depth-estimation-videpth/model/rwightman_gen-efficientnet-pytorch_master/geffnet/conv2d_layers.py:47: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! 
+ /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/torch/onnx/_internal/jit_utils.py:258: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at ../torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1884.) _C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version) - /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-534/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/torch/onnx/utils.py:687: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ../torch/csrc/jit/passes/onnx/constant_fold.cpp:179.) + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/torch/onnx/utils.py:687: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ../torch/csrc/jit/passes/onnx/constant_fold.cpp:179.) _C._jit_pass_onnx_graph_shape_type_inference( - /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-534/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/torch/onnx/utils.py:687: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at ../torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1884.) + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/torch/onnx/utils.py:687: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at ../torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1884.) _C._jit_pass_onnx_graph_shape_type_inference( - /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-534/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/torch/onnx/utils.py:1178: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ../torch/csrc/jit/passes/onnx/constant_fold.cpp:179.) + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/torch/onnx/utils.py:1178: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ../torch/csrc/jit/passes/onnx/constant_fold.cpp:179.) _C._jit_pass_onnx_graph_shape_type_inference( - /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-534/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/torch/onnx/utils.py:1178: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at ../torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1884.) 
+ /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/torch/onnx/utils.py:1178: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at ../torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1884.) _C._jit_pass_onnx_graph_shape_type_inference( -Select inference device -''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' +Select inference device +''''''''''''''''''''''' + + select device from dropdown list for running inference using OpenVINO @@ -510,8 +517,10 @@ select device from dropdown list for running inference using OpenVINO -Compilation of depth model -'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' +Compilation of depth model +'''''''''''''''''''''''''' + + Now we can go ahead and compile our depth models from the ``.onnx`` file path. We will not perform serialization because we don’t plan to re-read @@ -569,8 +578,10 @@ depth estimation model as it is. depth_pred_dummy = run_depth_model(input_image_h=IMAGE_H, input_image_w=IMAGE_W, transformed_image=transformed_dummy_image, compiled_depth_model=compiled_depth_model) -Computation of scale and shift parameters -''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' +Computation of scale and shift parameters +''''''''''''''''''''''''''''''''''''''''' + + Computation of these parameters required the depth estimation model output from the previous step. These are the regression based parameters @@ -677,8 +688,10 @@ purpose has already been created. scale_map_learner_transform=scale_map_learner_transform, int_depth=d_depth, int_scales=d_scales) -Conversion of Scale Map Learner model to OpenVINO IR format -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Conversion of Scale Map Learner model to OpenVINO IR format +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + The OpenVINO™ toolkit doesn’t provide any direct method of converting PyTorch models to the intermediate representation format. To have the @@ -754,21 +767,23 @@ common format of all checkpoint files from the model releases. .. parsed-literal:: - /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-534/.workspace/scm/ov-notebook/notebooks/246-depth-estimation-videpth/model/rwightman_gen-efficientnet-pytorch_master/geffnet/conv2d_layers.py:47: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! - /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-534/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/torch/onnx/_internal/jit_utils.py:258: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at ../torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1884.) 
+ /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/notebooks/246-depth-estimation-videpth/model/rwightman_gen-efficientnet-pytorch_master/geffnet/conv2d_layers.py:47: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/torch/onnx/_internal/jit_utils.py:258: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at ../torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1884.) _C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version) - /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-534/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/torch/onnx/utils.py:687: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ../torch/csrc/jit/passes/onnx/constant_fold.cpp:179.) + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/torch/onnx/utils.py:687: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ../torch/csrc/jit/passes/onnx/constant_fold.cpp:179.) _C._jit_pass_onnx_graph_shape_type_inference( - /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-534/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/torch/onnx/utils.py:687: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at ../torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1884.) + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/torch/onnx/utils.py:687: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at ../torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1884.) _C._jit_pass_onnx_graph_shape_type_inference( - /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-534/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/torch/onnx/utils.py:1178: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ../torch/csrc/jit/passes/onnx/constant_fold.cpp:179.) + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/torch/onnx/utils.py:1178: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ../torch/csrc/jit/passes/onnx/constant_fold.cpp:179.) 
_C._jit_pass_onnx_graph_shape_type_inference( - /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-534/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/torch/onnx/utils.py:1178: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at ../torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1884.) + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/torch/onnx/utils.py:1178: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at ../torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1884.) _C._jit_pass_onnx_graph_shape_type_inference( -Select inference device -''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' +Select inference device +''''''''''''''''''''''' + + select device from dropdown list for running inference using OpenVINO @@ -785,8 +800,10 @@ select device from dropdown list for running inference using OpenVINO -Compilation of the ScaleMapLearner(SML) model -''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' +Compilation of the ScaleMapLearner(SML) model +''''''''''''''''''''''''''''''''''''''''''''' + + Now we can go ahead and compile our SML model from the ``.onnx`` file path. We will not perform serialization because we don’t plan to re-read @@ -847,8 +864,10 @@ SML model as it is. transformed_image_for_depth_scale=transformed_dummy_image_scale, compiled_scale_map_learner=compiled_scale_map_learner) -Storing and visualizing dummy results obtained -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Storing and visualizing dummy results obtained +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + .. code:: ipython3 @@ -902,8 +921,10 @@ Storing and visualizing dummy results obtained .. image:: 246-depth-estimation-videpth-with-output_files/246-depth-estimation-videpth-with-output_48_2.png -Running inference on a test image -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Running inference on a test image +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + Now role of both the dummy inputs i.e. the dummy image as well as its associated depth map is now over. Since we have access to the compiled @@ -993,8 +1014,10 @@ present*\ `here `__. -2. Users may choose to download the original and raw datasets from - the `VOID - dataset `__. -3. The `isl-org/VI-Depth `__ - works on a slightly older version of released model assets from - its `MiDaS sibling - repository `__. However, the new - releases beginning from - `v3.1 `__ - directly have OpenVINO™ ``.xml`` and ``.bin`` model files as their - assets thereby rendering the **major pre-processing and model - compilation step irrelevant**. +Concluding notes +~~~~~~~~~~~~~~~~ + + + + 1. The code for this tutorial is adapted from the `VI-Depth + repository `__. + 2. Users may choose to download the original and raw datasets from + the `VOID + dataset `__. + 3. The `isl-org/VI-Depth `__ + works on a slightly older version of released model assets from + its `MiDaS sibling + repository `__. 
However, the new + releases beginning from + `v3.1 `__ + directly have OpenVINO™ ``.xml`` and ``.bin`` model files as their + assets thereby rendering the **major pre-processing and model + compilation step irrelevant**. diff --git a/docs/notebooks/246-depth-estimation-videpth-with-output_files/index.html b/docs/notebooks/246-depth-estimation-videpth-with-output_files/index.html index 599ce48df64586..32036d2181fc9a 100644 --- a/docs/notebooks/246-depth-estimation-videpth-with-output_files/index.html +++ b/docs/notebooks/246-depth-estimation-videpth-with-output_files/index.html @@ -1,8 +1,8 @@ -Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/246-depth-estimation-videpth-with-output_files/ +Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/246-depth-estimation-videpth-with-output_files/ -

Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/246-depth-estimation-videpth-with-output_files/


../
-246-depth-estimation-videpth-with-output_48_2.png  31-Oct-2023 00:35              215788
-246-depth-estimation-videpth-with-output_53_2.png  31-Oct-2023 00:35              190117
+

Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/246-depth-estimation-videpth-with-output_files/


../
+246-depth-estimation-videpth-with-output_48_2.png  15-Nov-2023 00:43              215788
+246-depth-estimation-videpth-with-output_53_2.png  15-Nov-2023 00:43              190117
 

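
Editor's note on the depth-estimation tutorial above: as its concluding notes point out, MiDaS releases from v3.1 onward ship OpenVINO ``.xml``/``.bin`` files directly, so the ONNX export and conversion steps can in principle be skipped. A minimal sketch of consuming such a pre-exported IR is shown below; the file name is a placeholder, not an actual release asset name.

.. code:: ipython3

    # Editor's sketch (not part of the notebook): load a pre-exported OpenVINO IR
    # for the depth predictor instead of exporting through ONNX. The .xml path is
    # hypothetical - use whatever .xml/.bin pair ships with the release you download.
    import openvino as ov

    core = ov.Core()
    depth_model = core.read_model("models/midas_v31_small.xml")      # placeholder path
    compiled_depth_model = core.compile_model(depth_model, "AUTO")   # or a specific device

    # The compiled model is then called on a preprocessed image tensor, e.g.:
    # result = compiled_depth_model([transformed_image])[compiled_depth_model.output(0)]

Everything downstream of model compilation in that tutorial (scale/shift computation, ScaleMapLearner inference) would remain unchanged.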
diff --git a/docs/notebooks/247-code-language-id-with-output.rst b/docs/notebooks/247-code-language-id-with-output.rst index 22c7854ab78af0..5b982bd26ce7e8 100644 --- a/docs/notebooks/247-code-language-id-with-output.rst +++ b/docs/notebooks/247-code-language-id-with-output.rst @@ -16,7 +16,6 @@ navigation. **Table of contents:** - - `Introduction <#introduction>`__ - `Task <#task>`__ @@ -27,8 +26,7 @@ navigation. - `Install prerequisites <#install-prerequisites>`__ - `Imports <#imports>`__ - - `Setting up HuggingFace - cache <#setting-up-huggingface-cache>`__ + - `Setting up HuggingFace cache <#setting-up-huggingface-cache>`__ - `Select inference device <#select-inference-device>`__ - `Download resources <#download-resources>`__ - `Create inference pipeline <#create-inference-pipeline>`__ @@ -51,11 +49,15 @@ navigation. - `Additional resources <#additional-resources>`__ - `Clean up <#clean-up>`__ -Introduction ------------------------------------------------------- +Introduction +------------ + + + +Task +~~~~ + -Task -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **Programming language classification** is the task of identifying which programming language is used in an arbitrary code snippet. This can be @@ -80,8 +82,10 @@ formal, their symbols, syntax, and grammar can be revised and updated. For example, the walrus operator (``:=``) was a symbol distinctively used in Golang, but was later introduced in Python 3.8. -Model -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Model +~~~~~ + + The classification model that will be used in this notebook is `CodeBERTa-language-id `__ @@ -95,8 +99,10 @@ dataset (Husain, 2019). It supports 6 programming languages: - Go - Java - JavaScript - PHP - Python - Ruby -Part 1: Inference pipeline with OpenVINO ----------------------------------------------------------------------------------- +Part 1: Inference pipeline with OpenVINO +---------------------------------------- + + For this section, we will use the `HuggingFace Optimum `__ library, which @@ -105,8 +111,10 @@ OpenVINO toolkit. The code will be very similar to the `HuggingFace Transformers `__, but will allow to automatically convert models to the OpenVINO™ IR format. -Install prerequisites -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Install prerequisites +~~~~~~~~~~~~~~~~~~~~~ + + First, complete the `repository installation steps <../notebooks_installation.html>`__. @@ -115,7 +123,7 @@ OpenVINO support - HuggingFace Evaluate to benchmark results .. code:: ipython3 - %pip install -q "diffusers>=0.17.1" "openvino>=2023.1.0" "nncf>=2.5.0" "gradio" "onnx>=1.11.0" "onnxruntime>=1.14.0" "transformers>=4.31.0" "evaluate" + %pip install -q "diffusers>=0.17.1" "openvino>=2023.1.0" "nncf>=2.5.0" "gradio" "onnx>=1.11.0" "transformers>=4.33.0" "evaluate" %pip install -q "git+https://github.com/huggingface/optimum-intel.git" @@ -123,17 +131,18 @@ OpenVINO support - HuggingFace Evaluate to benchmark results DEPRECATION: pytorch-lightning 1.6.5 has a non-standard dependency specifier torch>=1.8.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pytorch-lightning or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063 ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. 
- onnxconverter-common 1.14.0 requires protobuf==3.20.2, but you have protobuf 4.24.4 which is incompatible. - pytorch-lightning 1.6.5 requires protobuf<=3.20.1, but you have protobuf 4.24.4 which is incompatible. - tensorflow 2.13.1 requires typing-extensions<4.6.0,>=3.6.6, but you have typing-extensions 4.8.0 which is incompatible. - tf2onnx 1.15.1 requires protobuf~=3.20.2, but you have protobuf 4.24.4 which is incompatible. + onnxconverter-common 1.14.0 requires protobuf==3.20.2, but you have protobuf 4.25.0 which is incompatible. + pytorch-lightning 1.6.5 requires protobuf<=3.20.1, but you have protobuf 4.25.0 which is incompatible. + tf2onnx 1.15.1 requires protobuf~=3.20.2, but you have protobuf 4.25.0 which is incompatible. Note: you may need to restart the kernel to use updated packages. DEPRECATION: pytorch-lightning 1.6.5 has a non-standard dependency specifier torch>=1.8.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pytorch-lightning or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063 Note: you may need to restart the kernel to use updated packages. -Imports -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Imports +~~~~~~~ + + The import ``OVModelForSequenceClassification`` from Optimum is equivalent to ``AutoModelForSequenceClassification`` from Transformers @@ -154,10 +163,10 @@ equivalent to ``AutoModelForSequenceClassification`` from Transformers .. parsed-literal:: - 2023-10-31 00:04:18.151817: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. - 2023-10-31 00:04:18.186093: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. + 2023-11-15 00:06:22.342451: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. + 2023-11-15 00:06:22.376717: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. - 2023-10-31 00:04:18.771332: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT + 2023-11-15 00:06:22.962059: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT .. parsed-literal:: @@ -168,12 +177,12 @@ equivalent to ``AutoModelForSequenceClassification`` from Transformers .. parsed-literal:: No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' - /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-534/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. 
Please import deepspeed modules directly from transformers.integrations - warnings.warn( -Setting up HuggingFace cache -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Setting up HuggingFace cache +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + Resources from HuggingFace will be downloaded in the local folder ``./model`` (next to this notebook) instead of the device global cache @@ -186,8 +195,10 @@ for easy cleanup. Learn more MODEL_ID = f"huggingface/{MODEL_NAME}" MODEL_LOCAL_PATH = Path("./model").joinpath(MODEL_NAME) -Select inference device -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Select inference device +~~~~~~~~~~~~~~~~~~~~~~~ + + select device from dropdown list for running inference using OpenVINO @@ -216,8 +227,10 @@ select device from dropdown list for running inference using OpenVINO -Download resources -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Download resources +~~~~~~~~~~~~~~~~~~ + + .. code:: ipython3 @@ -251,7 +264,7 @@ Download resources - This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). - This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). Using the export variant default. Available variants are: - - default: The default ONNX variant. + - default: The default ONNX variant. Using framework PyTorch: 1.13.1+cpu Overriding 1 configuration item(s) - use_cache -> False @@ -266,23 +279,26 @@ Download resources [ WARNING ] Please fix your imports. Module %s has been moved to %s. The old module will be deleted in version %s. Compiling the model to AUTO ... - Set CACHE_DIR to /tmp/tmpbwk74vw4/model_cache .. parsed-literal:: - Ressources cached locally at: /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-534/.workspace/scm/ov-notebook/notebooks/247-code-language-id/model/CodeBERTa-language-id + Ressources cached locally at: /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/notebooks/247-code-language-id/model/CodeBERTa-language-id + + +Create inference pipeline +~~~~~~~~~~~~~~~~~~~~~~~~~ -Create inference pipeline -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code:: ipython3 code_classification_pipe = pipeline("text-classification", model=model, tokenizer=tokenizer) -Inference on new input -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Inference on new input +~~~~~~~~~~~~~~~~~~~~~~ + + .. code:: ipython3 @@ -304,8 +320,10 @@ Inference on new input Predicted score: 0.81 -Part 2: OpenVINO post-training quantization with HuggingFace Optimum --------------------------------------------------------------------------------------------------------------- +Part 2: OpenVINO post-training quantization with HuggingFace Optimum +-------------------------------------------------------------------- + + In this section, we will quantize a trained model. At a high-level, this process consists of using lower precision numbers in the model, which @@ -317,8 +335,10 @@ The HuggingFace Optimum library supports post-training quantization for OpenVINO. `Learn more `__. 
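
To make the quantization flow described above concrete before the notebook's own helper functions appear, here is a short, self-contained sketch. The calibration set, preprocessing function, and save directory are illustrative stand-ins, not the notebook's actual code.

.. code:: ipython3

    # Editor's sketch of post-training static quantization with Optimum Intel.
    # A tiny hand-made dataset stands in for the CodeSearchNet sample used later.
    from functools import partial

    from datasets import Dataset
    from optimum.intel import OVQuantizer
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_id = "huggingface/CodeBERTa-language-id"
    base_model = AutoModelForSequenceClassification.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    calibration_sample = Dataset.from_list(
        [
            {"code": "def add(a, b):\n    return a + b"},
            {"code": "console.log('hello');"},
        ]
    )

    def preprocess_fn(example, tokenizer):
        # Tokenize each snippet to a fixed length so the calibration batches align.
        return tokenizer(example["code"], padding="max_length", truncation=True, max_length=128)

    calibration_sample = calibration_sample.map(partial(preprocess_fn, tokenizer=tokenizer))

    # Feed the calibration data through the network and save the quantized IR.
    quantizer = OVQuantizer.from_pretrained(base_model)
    quantizer.quantize(calibration_dataset=calibration_sample, save_directory="./model/quantized")

The resulting directory can then be reloaded with ``OVModelForSequenceClassification.from_pretrained()`` in the same way as the non-quantized export.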
-Define constants and functions -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Define constants and functions +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + .. code:: ipython3 @@ -357,8 +377,10 @@ Define constants and functions return Dataset.from_list(examples) -Load resources -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Load resources +~~~~~~~~~~~~~~ + + NOTE: the base model is loaded using ``AutoModelForSequenceClassification`` from ``Transformers`` @@ -379,8 +401,10 @@ NOTE: the base model is loaded using - This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). -Load calibration dataset -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Load calibration dataset +~~~~~~~~~~~~~~~~~~~~~~~~ + + The ``get_dataset_sample()`` function will sample up to ``num_samples``, with an equal number of examples across the 6 programming languages. @@ -402,22 +426,16 @@ NOTE: Uncomment the method below to download and use the full dataset # ) -.. parsed-literal:: - - huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... - To disable this warning, you can either: - - Avoid using `tokenizers` before the fork if possible - - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) - - .. parsed-literal:: Map: 0%| | 0/120 [00:00 base 1.0 - 2.246393 - 53.418981 - 0.018720 + 2.340578 + 51.269396 + 0.019505 quantized 1.0 - 3.090061 - 38.834182 - 0.025751 + 3.334829 + 35.983857 + 0.027790 @@ -690,16 +716,16 @@ displayed. -Additional resources --------------------------------------------------------------- +Additional resources +-------------------- + + - `Grammatical Error Correction with OpenVINO `__ +- `Quantize a Hugging Face Question-Answering Model with OpenVINO `__ \ \*\* + +Clean up +-------- -- `Grammatical Error Correction with - OpenVINO `__ -- `Quantize a Hugging Face Question-Answering Model with - OpenVINO `__\ \*\* -Clean up --------------------------------------------------- Uncomment and run cell below to delete all resources cached locally in ./model diff --git a/docs/notebooks/250-music-generation-with-output.rst b/docs/notebooks/250-music-generation-with-output.rst index 52bc565acaa909..181a0111215d28 100644 --- a/docs/notebooks/250-music-generation-with-output.rst +++ b/docs/notebooks/250-music-generation-with-output.rst @@ -31,28 +31,49 @@ Transformers `__ library. 
**Table of contents:** ---- -- `Requirements and Imports <#prerequisites>`__ -- `Original Pipeline Inference <#musicgen-in-hf-transformers>`__ -- `Converting the Models to OpenVINO IR <#convert-models-to-openvino-intermediate-representation-ir-format>`__ -- `Convert Text Encoder <#convert-text-encoder>`__ -- `Convert MusicGen Language Model <#convert-musicgen-language-model>`__ -- `Convert Audio Decoder <#convert-audio-decoder>`__ -- `Embedding the Converted Models into the Pipeline <#embedding-the-converted-models-into-the-original-pipeline>`__ -- `Run Gradio App <#try-out-the-converted-pipeline>`__ +- `Prerequisites <#prerequisites>`__ + + - `Install requirements <#install-requirements>`__ + - `Imports <#imports>`__ + +- `MusicGen in HF Transformers <#musicgen-in-hf-transformers>`__ + + - `Original Pipeline Inference <#original-pipeline-inference>`__ + +- `Convert models to OpenVINO Intermediate representation (IR) + format <#convert-models-to-openvino-intermediate-representation-ir-format>`__ + + - `0. Set Up Variables <#-set-up-variables>`__ + - `1. Convert Text Encoder <#-convert-text-encoder>`__ + - `2. Convert MusicGen Language + Model <#-convert-musicgen-language-model>`__ + - `3. Convert Audio Decoder <#-convert-audio-decoder>`__ + +- `Embedding the converted models into the original + pipeline <#embedding-the-converted-models-into-the-original-pipeline>`__ + + - `Select inference device <#select-inference-device>`__ + - `Adapt OpenVINO models to the original + pipeline <#adapt-openvino-models-to-the-original-pipeline>`__ + +- `Try out the converted pipeline <#try-out-the-converted-pipeline>`__ Prerequisites ------------- + + Install requirements ~~~~~~~~~~~~~~~~~~~~ + + .. code:: ipython3 %pip install -q "openvino>=2023.1.0" - %pip install -q --extra-index-url https://download.pytorch.org/whl/cpu torch onnx gradio - %pip install -q transformers + %pip install -q --extra-index-url https://download.pytorch.org/whl/cpu torch onnx gradio ipywidgets + %pip install -q "transformers" .. parsed-literal:: @@ -68,9 +89,12 @@ Install requirements Imports ~~~~~~~ + + .. code:: ipython3 from collections import namedtuple + from functools import partial import gc from pathlib import Path from typing import Optional, Tuple @@ -90,15 +114,17 @@ Imports .. parsed-literal:: - 2023-10-31 00:07:14.058999: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. - 2023-10-31 00:07:14.092895: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. + 2023-11-15 00:09:21.886779: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. + 2023-11-15 00:09:21.920564: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. 
- 2023-10-31 00:07:14.669694: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT + 2023-11-15 00:09:22.467173: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT MusicGen in HF Transformers --------------------------- + + To work with `MusicGen `__ by Meta AI, we will use `Hugging Face Transformers @@ -137,6 +163,8 @@ and the desired music sample length. Original Pipeline Inference ~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + Text Preprocessing prepares the text prompt to be fed into the model, the ``processor`` object abstracts this step for us. Text tokenization is performed under the hood, it assigning tokens or IDs to the words; in @@ -163,7 +191,7 @@ vocabulary. It helps the model understand the context of a sentence. @@ -173,6 +201,8 @@ vocabulary. It helps the model understand the context of a sentence. Convert models to OpenVINO Intermediate representation (IR) format ------------------------------------------------------------------ + + Model conversion API enables direct conversion of PyTorch models. We will utilize the ``openvino.convert_model`` method to acquire OpenVINO IR versions of the models. The method requires a model object and @@ -197,6 +227,8 @@ Let us convert each model step by step. 0. Set Up Variables ~~~~~~~~~~~~~~~~~~~ + + .. code:: ipython3 models_dir = Path("./models") @@ -209,6 +241,8 @@ Let us convert each model step by step. 1. Convert Text Encoder ~~~~~~~~~~~~~~~~~~~~~~~ + + The text encoder is responsible for converting the input prompt, such as “90s rock song with loud guitars and heavy drums” into an embedding space that can be fed to the next model. Typically, it is a @@ -248,6 +282,8 @@ runtime `__ @@ -402,6 +442,8 @@ used to compile the model. Select inference device ^^^^^^^^^^^^^^^^^^^^^^^ + + Select device that will be used to do models inference using OpenVINO from the dropdown list: @@ -430,23 +472,22 @@ from the dropdown list: Adapt OpenVINO models to the original pipeline ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + Here we create wrapper classes for all three OpenVINO models that we want to embed in the original inference pipeline. Here are some of the -things to consider when adapting an OV model: - -- Make sure that parameters passed by the original pipeline are forwarded to the compiled - OV model properly; sometimes the OV model uses only a portion of the - input arguments and some are ignored, sometimes you need to convert the - argument to another data type or unwrap some data structures such as - tuples or dictionaries. -- Guarantee that the wrapper class returns - results to the pipeline in an expected format. In the example below you - can see how we pack OV model outputs into special classes declared in - the HF repo. -- Pay attention to the model method used in the original - pipeline for calling the model - it may be not the ``forward`` method! - Refer to the ``AudioDecoderWrapper`` to see how we wrap OV model - inference into the ``decode`` method. +things to consider when adapting an OV model: - Make sure that +parameters passed by the original pipeline are forwarded to the compiled +OV model properly; sometimes the OV model uses only a portion of the +input arguments and some are ignored, sometimes you need to convert the +argument to another data type or unwrap some data structures such as +tuples or dictionaries. - Guarantee that the wrapper class returns +results to the pipeline in an expected format. 
In the example below you +can see how we pack OV model outputs into special classes declared in +the HF repo. - Pay attention to the model method used in the original +pipeline for calling the model - it may be not the ``forward`` method! +Refer to the ``AudioDecoderWrapper`` to see how we wrap OV model +inference into the ``decode`` method. .. code:: ipython3 @@ -528,6 +569,57 @@ Now we initialize the wrapper objects and load them to the HF pipeline model.text_encoder = text_encode_ov model.decoder = musicgen_decoder_ov model.audio_encoder = audio_encoder_ov + + def prepare_inputs_for_generation( + self, + decoder_input_ids, + past_key_values=None, + attention_mask=None, + head_mask=None, + decoder_attention_mask=None, + decoder_head_mask=None, + cross_attn_head_mask=None, + use_cache=None, + encoder_outputs=None, + decoder_delay_pattern_mask=None, + guidance_scale=None, + **kwargs, + ): + if decoder_delay_pattern_mask is None: + decoder_input_ids, decoder_delay_pattern_mask = self.decoder.build_delay_pattern_mask( + decoder_input_ids, + self.generation_config.pad_token_id, + max_length=self.generation_config.max_length, + ) + + # apply the delay pattern mask + decoder_input_ids = self.decoder.apply_delay_pattern_mask(decoder_input_ids, decoder_delay_pattern_mask) + + if guidance_scale is not None and guidance_scale > 1: + # for classifier free guidance we need to replicate the decoder args across the batch dim (we'll split these + # before sampling) + decoder_input_ids = decoder_input_ids.repeat((2, 1)) + if decoder_attention_mask is not None: + decoder_attention_mask = decoder_attention_mask.repeat((2, 1)) + + if past_key_values is not None: + # cut decoder_input_ids if past is used + decoder_input_ids = decoder_input_ids[:, -1:] + + return { + "input_ids": None, # encoder_outputs is defined. input_ids not needed + "encoder_outputs": encoder_outputs, + "past_key_values": past_key_values, + "decoder_input_ids": decoder_input_ids, + "attention_mask": attention_mask, + "decoder_attention_mask": decoder_attention_mask, + "head_mask": head_mask, + "decoder_head_mask": decoder_head_mask, + "cross_attn_head_mask": cross_attn_head_mask, + "use_cache": use_cache, + } + + model.prepare_inputs_for_generation = partial(prepare_inputs_for_generation, model) We can now infer the pipeline backed by OpenVINO models. @@ -551,7 +643,7 @@ We can now infer the pipeline backed by OpenVINO models. @@ -561,6 +653,8 @@ We can now infer the pipeline backed by OpenVINO models. Try out the converted pipeline ------------------------------ + + The demo app below is created using `Gradio package `__ diff --git a/docs/notebooks/252-fastcomposer-image-generation-with-output.rst b/docs/notebooks/252-fastcomposer-image-generation-with-output.rst index 4b98381bd103a1..7535b2cad44538 100644 --- a/docs/notebooks/252-fastcomposer-image-generation-with-output.rst +++ b/docs/notebooks/252-fastcomposer-image-generation-with-output.rst @@ -27,9 +27,8 @@ different styles, actions, and contexts. transformers >= 4.30.1 (due to security vulnerability) **Table of contents:** ---- -- `Install Prerequisites <#install-prerequisites>`__ +- `Install Prerequisites <#install-prerequisites>`__ - `Convert models to OpenVINO Intermediate representation (IR) format <#convert-models-to-openvino-intermediate-representation-ir-format>`__ @@ -49,10 +48,10 @@ different styles, actions, and contexts. This tutorial requires about 25-28GB of free memory to generate one image. Each extra image requires ~11GB of free memory. 
-Install Prerequisites ---------------------------------------------------------------- +Install Prerequisites +--------------------- -Install required packages. + Install required packages. .. code:: ipython3 @@ -83,8 +82,10 @@ Download pretrained model. model_path = hf_hub_download(repo_id='mit-han-lab/fastcomposer', filename='pytorch_model.bin') -Convert models to OpenVINO Intermediate representation (IR) format ------------------------------------------------------------------------------------------------------------- +Convert models to OpenVINO Intermediate representation (IR) format +------------------------------------------------------------------ + + Define a configuration and make instance of ``FastComposerModel``. @@ -126,8 +127,10 @@ Pipeline consist of next models: ``Unet``, ``TextEncoder``, So, convert the models into OpenVINO IR format. -Convert text_encoder -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Convert text_encoder +~~~~~~~~~~~~~~~~~~~~ + + Model components are PyTorch modules, that can be converted with openvino.convert_model function directly. We also use @@ -173,8 +176,10 @@ padded to the maximum length accepted by the model. del model.text_encoder gc.collect(); -The Object Transform -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The Object Transform +~~~~~~~~~~~~~~~~~~~~ + + It pads an incoming user image to square and resize it. An input is a tensor of size [3, height, width]. @@ -212,8 +217,10 @@ tensor of size [3, height, width]. del object_transforms gc.collect(); -The Image Encoder -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The Image Encoder +~~~~~~~~~~~~~~~~~ + + The image encoder is a CLIP (Contrastive Language-Image Pretraining) Image Encoder. It takes a transformed image from the previous step as @@ -230,8 +237,10 @@ input and transforms it into a high-dimensional vector or embeddings. del model.image_encoder gc.collect(); -Postfuse module -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Postfuse module +~~~~~~~~~~~~~~~ + + On this step it is employed a multilayer perceptron (MLP) to augment the text embeddings with visual features extracted from the reference @@ -256,8 +265,10 @@ MLP. del model.postfuse_module gc.collect(); -Convert Unet -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Convert Unet +~~~~~~~~~~~~ + + U-Net model gradually denoises latent image representation guided by text encoder hidden state. @@ -280,8 +291,10 @@ text encoder hidden state. gc.collect() -Rebuild pipeline ----------------------------------------------------------- +Rebuild pipeline +---------------- + + Also, it needs to modify some internal FastComposer entities, to use OpenVINO models. First of all, how to get results. For example, to @@ -900,8 +913,10 @@ And replace all model in the pipeline by converted models. ) ) -Inference ---------------------------------------------------- +Inference +--------- + + And now it is possible to make inference. You can provide 1 or 2 images (``image1`` and ``image2``). If you want to provide only one image pass @@ -941,8 +956,10 @@ to display them. 
display(result[0][0]) -Run Gradio ----------------------------------------------------- +Run Gradio +---------- + + Also, it is possible to run with Gradio @@ -966,7 +983,7 @@ Also, it is possible to run with Gradio gr.Markdown(DESCRIPTION) with gr.Row(): with gr.Column(): - with gr.Box(): + with gr.Group(): image1 = gr.Image(label="Image 1", type="pil") gr.Examples( examples=["fastcomposer/data/newton.jpeg"], @@ -1031,9 +1048,7 @@ Also, it is possible to run with Gradio value=50, ) with gr.Column(): - result = gr.Gallery(label="Generated Images").style( - grid=[2], height="auto" - ) + result = gr.Gallery(label="Generated Images", columns=[2]) error_message = gr.Text(label="Job Status") inputs = [ diff --git a/docs/notebooks/254-llm-chatbot-with-output.rst b/docs/notebooks/254-llm-chatbot-with-output.rst index 18a150369530cd..78d6200e20da66 100644 --- a/docs/notebooks/254-llm-chatbot-with-output.rst +++ b/docs/notebooks/254-llm-chatbot-with-output.rst @@ -18,7 +18,7 @@ accuracy. Previously, we already discussed how to build an instruction-following pipeline using OpenVINO and Optimum Intel, please check out `Dolly -example <../240-dolly-2-instruction-following>`__ for reference. In this +example <240-dolly-2-instruction-following-with-output.html>`__ for reference. In this tutorial, we consider how to use the power of OpenVINO for running Large Language Models for chat. We will use a pre-trained model from the `Hugging Face @@ -33,38 +33,46 @@ The tutorial consists of the following steps: - Download and convert the model from a public source using the `OpenVINO integration with Hugging Face Optimum `__. -- Compress model weights to INT8 precision using +- Compress model weights to 4-bit or 8-bit data types using `NNCF `__ - Create a chat inference pipeline - Run chat pipeline **Table of contents:** -- `Prerequisites <#prerequisites>`__ -- `Select model for inference <#select-model-for-inference>`__ -- `Instantiate Model using Optimum Intel <#instantiate-model-using-optimum-intel>`__ -- `Compress model weights <#compress-model-weights>`__ -- `Weights Compression using Optimum Intel <#weights-compression-using-optimum-intel>`__ -- `Weights Compression using NNCF <#weights-compression-using-nncf->`__ -- `Select device for inference and model variant <#select-device-for-inference-and-model-variant->`__ -- `Run Chatbot <#run-chatbot>`__ +- `Prerequisites <#prerequisites>`__ +- `Select model for inference <#select-model-for-inference>`__ +- `login to huggingfacehub to get access to pretrained model <#login-to-huggingfacehub-to-get-access-to-pretrained-model>`__ +- `Instantiate Model using Optimum Intel <#instantiate-model-using-optimum-intel>`__ +- `Compress model weights <#compress-model-weights>`__ + + - `Weights Compression using Optimum Intel <#weights-compression-using-optimum-intel>`__ + - `Weights Compression using NNCF <#weights-compression-using-nncf>`__ + +- `Select device for inference and model variant <#select-device-for-inference-and-model-variant>`__ +- `Run Chatbot <#run-chatbot>`__ + +Prerequisites +------------- + -Prerequisites -------------------------------------------------------- Install required dependencies .. 
code:: ipython3 + %pip uninstall -q -y openvino-dev openvino openvino-nightly + %pip install -q openvino-nightly %pip install -q --extra-index-url https://download.pytorch.org/whl/cpu\ "git+https://github.com/huggingface/optimum-intel.git"\ - "nncf>=2.6.0"\ + "git+https://github.com/openvinotoolkit/nncf.git@release_v270"\ "gradio"\ - "onnx" "onnxruntime" "einops" "transformers>=4.31.0"\ - "openvino==2023.2.0.dev20230922" + "onnx" "einops" "transformers>=4.34.0"\ + +Select model for inference +-------------------------- + -Select model for inference --------------------------------------------------------------------- The tutorial supports different models, you can select one from the provided options to compare the quality of open source LLM solutions. @@ -101,8 +109,10 @@ The available options are: following code: .. code:: python + :force: ## login to huggingfacehub to get access to pretrained model + from huggingface_hub import notebook_login, whoami try: @@ -132,6 +142,16 @@ The available options are: `repository `__ and `HuggingFace model card `__. +- **zephyr-7b-beta** - Zephyr is a series of language models that are + trained to act as helpful assistants. Zephyr-7B-beta is the second + model in the series, and is a fine-tuned version of + `mistralai/Mistral-7B-v0.1 `__ + that was trained on on a mix of publicly available, synthetic + datasets using `Direct Preference Optimization + (DPO) `__. You can find more + details about model in `technical + report `__ and `HuggingFace model + card `__. .. code:: ipython3 @@ -144,7 +164,7 @@ The available options are: model_id = widgets.Dropdown( options=model_ids, - value=model_ids[0], + value=model_ids[-1], description='Model:', disabled=False, ) @@ -156,7 +176,7 @@ The available options are: .. parsed-literal:: - Dropdown(description='Model:', options=('red-pajama-3b-chat', 'llama-2-chat-7b', 'mpt-7b-chat'), value='red-pa… + Dropdown(description='Model:', index=3, options=('red-pajama-3b-chat', 'llama-2-chat-7b', 'mpt-7b-chat', 'zeph… @@ -168,11 +188,13 @@ The available options are: .. parsed-literal:: - Selected model red-pajama-3b-chat + Selected model zephyr-7b-beta + + +Instantiate Model using Optimum Intel +------------------------------------- -Instantiate Model using Optimum Intel -------------------------------------------------------------------------------- Optimum Intel can be used to load optimized models from the `Hugging Face Hub `__ and @@ -208,8 +230,9 @@ every time you want to generate a new token seems wasteful. With the cache, the model saves the hidden state once it has been computed. The model only computes the one for the most recently generated output token at each time step, re-using the saved ones for hidden tokens. This -reduces the generation complexity from O(n^3) to O(n^2) for a -transformer model. More details about how it works can be found in this +reduces the generation complexity from :math:`O(n^3)` to :math:`O(n^2)` +for a transformer model. More details about how it works can be found in +this `article `__. With this option, the model gets the previous step’s hidden states (cached attention keys and values) as input and additionally provides @@ -221,95 +244,7 @@ In our case, MPT model currently is not covered by Optimum Intel, we will convert it manually and create wrapper compatible with Optimum Intel. 
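For models that are covered by Optimum Intel, the loading path described above takes only a few
lines. The snippet below is a minimal illustrative sketch rather than the exact notebook code;
the checkpoint ID and the prompt are placeholders:

.. code:: python

    from optimum.intel.openvino import OVModelForCausalLM
    from transformers import AutoTokenizer

    # Placeholder checkpoint; any causal LM supported by Optimum Intel follows the same pattern
    hf_model_id = "togethercomputer/RedPajama-INCITE-Chat-3B-v1"

    # export=True converts the PyTorch checkpoint to OpenVINO IR on the fly,
    # use_cache=True exposes the KV-cache inputs/outputs discussed above
    ov_model = OVModelForCausalLM.from_pretrained(hf_model_id, export=True, use_cache=True)
    tokenizer = AutoTokenizer.from_pretrained(hf_model_id)

    inputs = tokenizer("2 + 2 =", return_tensors="pt")
    output_ids = ov_model.generate(**inputs, max_new_tokens=8)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))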
-Compress model weights ----------------------------------------------------------------- - -The Weights Compression algorithm is aimed at compressing the weights of -the models and can be used to optimize the model footprint and -performance of large models where the size of weights is relatively -larger than the size of activations, for example, Large Language Models -(LLM). - -Weights Compression using Optimum Intel -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -To enable weights compression via NNCF for models supported by Optimum -Intel ``OVQuantizer`` class should be used instantiated by PyTorch model -using ``from_pretrained`` method. -``OVQuantizer.quantize(save_directory=save_dir, weights_only=True)`` -enables weights compression and model conversion to OpenVINO -Intermediate Representation format. We will consider how to do it on -RedPajama and LLAMA examples. - - **Note**: This tutorial involves conversion model for both FP16 and - INT8 weights compression scenarios. It maybe memory and - time-consuming in first run. You can manually disable FP16 conversion - using CONVERT_FP16 variable below, CONVERT_INT8 variable can be used - for disabling conversion model with weights compression respectively. - -.. code:: ipython3 - - CONVERT_FP16 = True - CONVERT_INT8 = True - -.. code:: ipython3 - - from pathlib import Path - from optimum.intel import OVQuantizer - from transformers import AutoModelForCausalLM - from optimum.intel.openvino import OVModelForCausalLM - import logging - import nncf - import gc - - nncf.set_log_level(logging.ERROR) - - compressed_model_dir = Path(model_id.value) / "INT8_compressed_weights" - model_dir = Path(model_id.value) / "FP16" - pt_model_id = model_configuration["model_id"] - - if "mpt" not in model_id.value: - if CONVERT_INT8 and not compressed_model_dir.exists(): - pt_model = AutoModelForCausalLM.from_pretrained(pt_model_id) - quantizer = OVQuantizer.from_pretrained(pt_model) - quantizer.quantize(save_directory=compressed_model_dir, weights_only=True) - del quantizer - del pt_model - gc.collect() - - if CONVERT_FP16 and not model_dir.exists(): - ov_model = OVModelForCausalLM.from_pretrained(pt_model_id, export=True, compile=False) - ov_model.half() - ov_model.save_pretrained(model_dir) - del ov_model - gc.collect(); - - -.. parsed-literal:: - - INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, tensorflow, onnx, openvino - - -.. parsed-literal:: - - No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' - 2023-09-19 19:06:00.934297: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. - 2023-09-19 19:06:00.971948: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. - To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. - 2023-09-19 19:06:01.591238: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT - /home/ea/work/ov_venv/lib/python3.8/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. 
Please import deepspeed modules directly from transformers.integrations - warnings.warn( - - -Weights Compression using NNCF -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -You also can perform weights compression for PyTorch models using NNCF -directly. ``nncf.compress_weights`` function accept PyTorch model -instance and compress its weights for Linear and Embedding layers. We -will consider this variant based on MPT model. - -To begin compression, we should define model conversion first. +Below is some code required for MPT conversion. .. code:: ipython3 @@ -318,6 +253,7 @@ To begin compression, we should define model conversion first. from transformers import AutoModelForCausalLM from nncf import compress_weights import openvino as ov + from pathlib import Path from typing import Optional, Union, Dict, Tuple, List def flattenize_inputs(inputs): @@ -390,7 +326,7 @@ To begin compression, we should define model conversion first. m_input.get_tensor().set_names({inp_name}) for out, out_name in zip(ov_model.outputs, outputs): - out.get_tensor().set_names({out_name}) + out.get_tensor().set_names({out_name}) ov_model.validate_nodes_and_infer_types() ov.save_model(ov_model, ov_out_path) @@ -398,47 +334,317 @@ To begin compression, we should define model conversion first. cleanup_torchscript_cache() del pt_model -Now, we know how to convert model to OpenVINO format, we can save -floating point and compressed model variants + +.. parsed-literal:: + + INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino + + +Compress model weights +---------------------------------------------------------------- + + + +The Weights Compression algorithm is aimed at compressing the weights of +the models and can be used to optimize the model footprint and +performance of large models where the size of weights is relatively +larger than the size of activations, for example, Large Language Models +(LLM). Compared to INT8 compression, INT4 compression improves +performance even more, but introduces a minor drop in prediction +quality. + +Weights Compression using Optimum Intel +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + + +To enable weights compression via NNCF for models supported by Optimum +Intel ``OVQuantizer`` class should be used for ``OVModelForCausalLM`` +model. +``OVQuantizer.quantize(save_directory=save_dir, weights_only=True)`` +enables weights compression. We will consider how to do it on RedPajama, +LLAMA and Zephyr examples. + + **Note**: Weights Compression using Optimum Intel currently supports + only INT8 compression. We will apply INT4 compression for these model + using NNCF API described below. + +.. + + **Note**: There may be no speedup for INT4/INT8 compressed models on + dGPU. + +Weights Compression using NNCF +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + + +You also can perform weights compression for OpenVINO models using NNCF +directly. ``nncf.compress_weights`` function accepts OpenVINO model +instance and compresses its weights for Linear and Embedding layers. We +will consider this variant based on MPT model. + + **Note**: This tutorial involves conversion model for FP16 and + INT4/INT8 weights compression scenarios. It may be memory and + time-consuming in the first run. You can manually control the + compression precision below. .. 
code:: ipython3 - compressed_model_dir = Path(model_id.value) / "INT8_compressed_weights" - model_dir = Path(model_id.value) / "FP16" + from IPython.display import display + + # TODO: red-pajama-3b-chat currently can't be compiled in INT4 or FP16 due to ticket 123973 + is_pajama_model = model_id.value == 'red-pajama-3b-chat' + prepare_int4_model = widgets.Checkbox( + value=True and not is_pajama_model, + description='Prepare INT4 model', + disabled=is_pajama_model, + ) + prepare_int8_model = widgets.Checkbox( + value=False or is_pajama_model, + description='Prepare INT8 model', + disabled=False, + ) + prepare_fp16_model = widgets.Checkbox( + value=False, + description='Prepare FP16 model', + disabled=is_pajama_model, + ) - if "mpt" in model_id.value and (not compressed_model_dir.exists() or not model_dir.exists()): - model = AutoModelForCausalLM.from_pretrained(model_configuration["model_id"], torch_dtype=torch.float32, trust_remote_code=True) - if CONVERT_FP16 and not model_dir.exists(): - convert_mpt(model, model_dir) - if CONVERT_INT8 and not compressed_model_dir.exists(): + display(prepare_int4_model) + display(prepare_int8_model) + display(prepare_fp16_model) + + + +.. parsed-literal:: + + Checkbox(value=True, description='Prepare INT4 model') + + + +.. parsed-literal:: + + Checkbox(value=False, description='Prepare INT8 model') + + + +.. parsed-literal:: + + Checkbox(value=False, description='Prepare FP16 model') + + +We can now save floating point and compressed model variants + +.. code:: ipython3 + + from pathlib import Path + from optimum.intel import OVQuantizer + from optimum.intel.openvino import OVModelForCausalLM + import shutil + import logging + import nncf + import gc + + nncf.set_log_level(logging.ERROR) + + pt_model_id = model_configuration["model_id"] + fp16_model_dir = Path(model_id.value) / "FP16" + int8_model_dir = Path(model_id.value) / "INT8_compressed_weights" + int4_model_dir = Path(model_id.value) / "INT4_compressed_weights" + + def convert_to_fp16(): + if (fp16_model_dir / "openvino_model.xml").exists(): + return + if "mpt" not in model_id.value: + ov_model = OVModelForCausalLM.from_pretrained(pt_model_id, export=True, compile=False) + ov_model.half() + ov_model.save_pretrained(fp16_model_dir) + del ov_model + else: + model = AutoModelForCausalLM.from_pretrained(model_configuration["model_id"], torch_dtype=torch.float32, trust_remote_code=True) + convert_mpt(model, fp16_model_dir) + del model + gc.collect() + + def convert_to_int8(): + if (int8_model_dir / "openvino_model.xml").exists(): + return + if "mpt" not in model_id.value: + if not fp16_model_dir.exists(): + ov_model = OVModelForCausalLM.from_pretrained(pt_model_id, export=True, compile=False) + ov_model.half() + else: + ov_model = OVModelForCausalLM.from_pretrained(fp16_model_dir, compile=False) + quantizer = OVQuantizer.from_pretrained(ov_model) + quantizer.quantize(save_directory=int8_model_dir, weights_only=True) + del quantizer + del ov_model + else: + convert_to_fp16() + model = ov.Core().read_model(fp16_model_dir / 'openvino_model.xml') compressed_model = compress_weights(model) - convert_mpt(compressed_model, compressed_model_dir) + ov.save_model(compressed_model, int8_model_dir / "openvino_model.xml") + shutil.copy(fp16_model_dir / 'config.json', int8_model_dir / 'config.json') + del model + del compressed_model + gc.collect() + - gc.collect(); + def convert_to_int4(group_size, ratio): + if (int4_model_dir / "openvino_model").exists(): + return + int4_model_dir.mkdir(parents=True, 
exist_ok=True) + if "mpt" not in model_id.value: + # TODO: remove compression via NNCF for non-MPT models when INT4 weight compression is added to optimum-intel + if not fp16_model_dir.exists(): + model = OVModelForCausalLM.from_pretrained(pt_model_id, export=True, compile=False) + model.half() + else: + model = OVModelForCausalLM.from_pretrained(fp16_model_dir, compile=False) + model.config.save_pretrained(int4_model_dir) + ov_model = model.model + del model + else: + convert_to_fp16() + ov_model = ov.Core().read_model(fp16_model_dir / 'openvino_model.xml') + shutil.copy(fp16_model_dir / 'config.json', int4_model_dir / 'config.json') + compressed_model = nncf.compress_weights(ov_model, mode=nncf.CompressWeightsMode.INT4_ASYM, group_size=group_size, ratio=ratio) + ov.save_model(compressed_model, int4_model_dir / 'openvino_model.xml') + del ov_model + del compressed_model + gc.collect() + + if prepare_fp16_model.value: + print("Apply weights compression to FP16 format") + convert_to_fp16() + if prepare_int8_model.value: + print("Apply weights compression to INT8 format") + convert_to_int8() + if prepare_int4_model.value: + print("Apply weights compression to INT4 format") + convert_to_int4(group_size=128, ratio=0.8) + + +.. parsed-literal:: + + No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' + + +.. parsed-literal:: + + Apply weights compression to INT4 format + + +.. parsed-literal:: + + This architecture : mistral was not validated, only :bloom, marian, opt, gpt-neox, blenderbot-small, gpt2, blenderbot, pegasus, gpt-bigcode, codegen, llama, bart, gpt-neo architectures were validated, use at your own risk. + Framework not specified. Using pt to export to ONNX. + + + +.. parsed-literal:: + + Loading checkpoint shards: 0%| | 0/8 [00:00 True + /home/ea/work/openvino_notebooks/test_env/lib/python3.8/site-packages/transformers/models/mistral/modeling_mistral.py:795: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! + if input_shape[-1] > 1: + /home/ea/work/openvino_notebooks/test_env/lib/python3.8/site-packages/transformers/models/mistral/modeling_mistral.py:91: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! + if past_key_values_length > 0: + /home/ea/work/openvino_notebooks/test_env/lib/python3.8/site-packages/transformers/models/mistral/modeling_mistral.py:157: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! + if seq_len > self.max_seq_len_cached: + /home/ea/work/openvino_notebooks/test_env/lib/python3.8/site-packages/transformers/models/mistral/modeling_mistral.py:288: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! 
+ if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len): + /home/ea/work/openvino_notebooks/test_env/lib/python3.8/site-packages/transformers/models/mistral/modeling_mistral.py:295: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! + if attention_mask.size() != (bsz, 1, q_len, kv_seq_len): + /home/ea/work/openvino_notebooks/test_env/lib/python3.8/site-packages/transformers/models/mistral/modeling_mistral.py:306: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! + if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim): + + + +.. parsed-literal:: + + Output() + + + +.. raw:: html + +

+
+
+
+
+.. raw:: html
+
+    
+    
+ + + + +.. parsed-literal:: + + Output() + + + +.. raw:: html + +

+
+
+
+
+.. raw:: html
+
+    
+    
+ + + +Let’s compare model size for different compression types .. code:: ipython3 - fp16_weights = model_dir / "openvino_model.bin" - int8_weights = compressed_model_dir / "openvino_model.bin" + fp16_weights = fp16_model_dir / "openvino_model.bin" + int8_weights = int8_model_dir / "openvino_model.bin" + int4_weights = int4_model_dir / "openvino_model.bin" if fp16_weights.exists(): - print(f'Size of FP16 model in MB is {fp16_weights.stat().st_size / 1024 / 1024}') - if int8_weights.exists(): - print(f'Size of model with INT8 compressed weights in MB is {int8_weights.stat().st_size / 1024 / 1024}') - if int8_weights.exists() and fp16_weights.exists(): - print(f"Model compression rate: {fp16_weights.stat().st_size / int8_weights.stat().st_size:.3f}") + print(f'Size of FP16 model is {fp16_weights.stat().st_size / 1024 / 1024:.2f} MB') + for precision, compressed_weights in zip([8, 4], [int8_weights, int4_weights]): + if compressed_weights.exists(): + print(f'Size of model with INT{precision} compressed weights is {compressed_weights.stat().st_size / 1024 / 1024:.2f} MB') + if compressed_weights.exists() and fp16_weights.exists(): + print(f"Compression rate for INT{precision} model: {fp16_weights.stat().st_size / compressed_weights.stat().st_size:.3f}") .. parsed-literal:: - Size of FP16 model in MB is 5299.166286468506 - Size of model with INT8 compressed weights in MB is 2659.578887939453 - Model compression rate: 1.992 + Size of model with INT4 compressed weights is 4374.50 MB Select device for inference and model variant --------------------------------------------------------------------------------------- + + + **Note**: There may be no speedup for INT4/INT8 compressed models on + dGPU. + .. code:: ipython3 core = ov.Core() @@ -448,37 +654,29 @@ Select device for inference and model variant description='Device:', disabled=False, ) + + device .. parsed-literal:: - VBox(children=(Dropdown(description='Device:', options=('CPU', 'GPU', 'AUTO'), value='CPU'), Checkbox(value=Tr… + Dropdown(description='Device:', options=('CPU', 'GPU', 'AUTO'), value='CPU') -.. code:: ipython3 - - int8_compressed_weights = widgets.Checkbox( - value=True, - description='Use compressed weights', - disabled=False - ) - - widgets.VBox([device, int8_compressed_weights]) - The cell below create ``OVMPTModel`` model wrapper based on ``OVModelForCausalLM`` model. .. 
code:: ipython3 - from transformers import AutoConfig + from transformers import AutoConfig, PretrainedConfig import torch - from optimum.intel.openvino import OVModelForCausalLM from optimum.utils import NormalizedTextConfig, NormalizedConfigManager from transformers.modeling_outputs import CausalLMOutputWithPast + from optimum.intel.openvino.utils import OV_XML_FILE_NAME import numpy as np from pathlib import Path @@ -582,17 +780,86 @@ The cell below create ``OVMPTModel`` model wrapper based on past_key_values = None return CausalLMOutputWithPast(logits=logits, past_key_values=past_key_values) + + @classmethod + def _from_pretrained( + cls, + model_id: Union[str, Path], + config: PretrainedConfig, + use_auth_token: Optional[Union[bool, str, None]] = None, + revision: Optional[Union[str, None]] = None, + force_download: bool = False, + cache_dir: Optional[str] = None, + file_name: Optional[str] = None, + subfolder: str = "", + from_onnx: bool = False, + local_files_only: bool = False, + load_in_8bit: bool = False, + **kwargs, + ): + model_path = Path(model_id) + default_file_name = OV_XML_FILE_NAME + file_name = file_name or default_file_name + + model_cache_path = cls._cached_file( + model_path=model_path, + use_auth_token=use_auth_token, + revision=revision, + force_download=force_download, + cache_dir=cache_dir, + file_name=file_name, + subfolder=subfolder, + local_files_only=local_files_only, + ) + + model = cls.load_model(model_cache_path, load_in_8bit=load_in_8bit) + init_cls = OVMPTModel + + return init_cls(model=model, config=config, model_save_dir=model_cache_path.parent, **kwargs) The cell below demonstrates how to instantiate model based on selected variant of model weights and inference device +.. code:: ipython3 + + available_models = [] + if int4_model_dir.exists(): + available_models.append("INT4") + if int8_model_dir.exists(): + available_models.append("INT8") + if fp16_model_dir.exists(): + available_models.append("FP16") + + model_to_run = widgets.Dropdown( + options=available_models, + value=available_models[0], + description='Model to run:', + disabled=False) + + model_to_run + + + + +.. parsed-literal:: + + Dropdown(description='Model to run:', options=('INT4',), value='INT4') + + + .. code:: ipython3 from pathlib import Path from optimum.intel.openvino import OVModelForCausalLM from transformers import AutoTokenizer - model_dir = Path(model_id.value) / ("FP16" if not int8_compressed_weights.value else "INT8_compressed_weights") + if model_to_run.value == "INT4": + model_dir = int4_model_dir + elif model_to_run.value == "INT8": + model_dir = int8_model_dir + else: + model_dir = fp16_model_dir + print(f"Loading model from {model_dir}") model_name = model_configuration["model_id"] ov_config = {'PERFORMANCE_HINT': 'LATENCY', 'NUM_STREAMS': '1', "CACHE_DIR": ""} @@ -605,8 +872,14 @@ variant of model weights and inference device .. parsed-literal:: + Loading model from zephyr-7b-beta/INT4_compressed_weights + + +.. parsed-literal:: + + Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. The argument `trust_remote_code` is to be used along with export=True. It will be ignored. - Compiling the model... + Compiling the model to CPU ... .. code:: ipython3 @@ -620,18 +893,19 @@ variant of model weights and inference device .. parsed-literal:: - Setting `pad_token_id` to `eos_token_id`:0 for open-end generation. 
- /home/ea/work/ov_venv/lib/python3.8/site-packages/optimum/intel/openvino/modeling_decoder.py:364: FutureWarning: `shared_memory` is deprecated and will be removed in 2024.0. Value of `shared_memory` is going to override `share_inputs` value. Please use only `share_inputs` explicitly. + /home/ea/work/openvino_notebooks/test_env/lib/python3.8/site-packages/optimum/intel/openvino/modeling_decoder.py:388: FutureWarning: `shared_memory` is deprecated and will be removed in 2024.0. Value of `shared_memory` is going to override `share_inputs` value. Please use only `share_inputs` explicitly. self.request.start_async(inputs, shared_memory=True) .. parsed-literal:: - 2 + 2 = 4. + 2 + 2 = 4 + + +Run Chatbot +----------- -Run Chatbot ------------------------------------------------------ Now that the model is created, we can set up the chatbot interface using `Gradio <https://www.gradio.app/>`__. The diagram below illustrates how @@ -1014,27 +1288,7 @@ answers. # it creates a publicly shareable link for the interface. Read more in the docs: https://gradio.app/docs/ demo.launch() - -.. parsed-literal:: - - Running on local URL: http://127.0.0.1:7860 - - To create a public link, set `share=True` in `launch()`. - - - -.. .. raw:: html - -..
- - .. code:: ipython3 # please run this cell for stopping gradio interface demo.close() - - -.. parsed-literal:: - - Closing server running on port: 7860 - diff --git a/docs/notebooks/257-llava-multimodal-chatbot-with-output.rst b/docs/notebooks/257-llava-multimodal-chatbot-with-output.rst index 1234442a3b8780..717605194f33e5 100644 --- a/docs/notebooks/257-llava-multimodal-chatbot-with-output.rst +++ b/docs/notebooks/257-llava-multimodal-chatbot-with-output.rst @@ -36,14 +36,13 @@ The tutorial consists from following steps: - Install prerequisites - Prepare input processor and tokenizer - Download original model -- Compress model weights to INT8 using NNCF +- Compress model weights to 4 and 8 bits using NNCF - Convert model to OpenVINO Intermediate Representation (IR) format - Prepare OpenVINO-based inference pipeline - Run OpenVINO model **Table of contents:** - - `About model <#about-model>`__ - `Prerequisites <#prerequisites>`__ - `Build model tokenizer and image @@ -53,15 +52,11 @@ The tutorial consists from following steps: - `Prepare helpers for model conversion <#prepare-helpers-for-model-conversion>`__ - - `Convert and Optimize - Model <#convert-and-optimize-model>`__ + - `Convert and Optimize Model <#convert-and-optimize-model>`__ - - `instantiate PyTorch - model <#instantiate-pytorch-model>`__ - - `Compress Model weights to INT8 using - NNCF <#compress-model-weights-to-int-using-nncf>`__ - - `Convert model to OpenVINO IR - format <#convert-model-to-openvino-ir-format>`__ + - `Instantiate PyTorch model <#instantiate-pytorch-model>`__ + - `Compress Model weights to 4 and 8 bits using NNCF <#compress-model-weights-to--and--bits-using-nncf>`__ + - `Convert model to OpenVINO IR format <#convert-model-to-openvino-ir-format>`__ - `Prepare OpenVINO based inference pipeline <#prepare-openvino-based-inference-pipeline>`__ @@ -74,8 +69,10 @@ The tutorial consists from following steps: - `Interactive demo <#interactive-demo>`__ -About model ------------------------------------------------------ +About model +----------- + + LLaVA connects pre-trained `CLIP ViT-L/14 `__ visual encoder and large @@ -104,8 +101,10 @@ web-page `__, `paper `__ and `repo `__. -Prerequisites -------------------------------------------------------- +Prerequisites +------------- + + Install required dependencies @@ -113,12 +112,8 @@ Install required dependencies import sys - if sys.platform == "linux": - %pip install -q "torch==2.1.0" "torchvision" "torchaudio" --index-url https://download.pytorch.org/whl/cpu - else: - %pip install -q "torch==2.1.0" "torchvision" "torchaudio" - - %pip install -q "openvino==2023.2.0.dev20230922" "nncf>=2.6.0" "sentencepiece" "tokenizers>=0.12.1" "transformers>=4.31.0" "gradio" + %pip install -q "torch>=2.1.0" "torchvision" "torchaudio" --index-url https://download.pytorch.org/whl/cpu + %pip install -q "openvino-nightly==2023.2.0.dev20231102" "git+https://github.com/openvinotoolkit/nncf.git@release_v270" "sentencepiece" "tokenizers>=0.12.1" "transformers>=4.31.0,<4.35.0" "gradio" "einops" .. code:: ipython3 @@ -131,20 +126,10 @@ Install required dependencies sys.path.insert(0, str(repo_dir.resolve())) - -.. parsed-literal:: - - Cloning into 'LLaVA'... - remote: Enumerating objects: 1262, done. - remote: Counting objects: 100% (408/408), done. - remote: Compressing objects: 100% (127/127), done. - remote: Total 1262 (delta 343), reused 282 (delta 281), pack-reused 854 - Receiving objects: 100% (1262/1262), 11.94 MiB | 8.90 MiB/s, done. - Resolving deltas: 100% (789/789), done. 
+Build model tokenizer and image processor +----------------------------------------- -Build model tokenizer and image processor ------------------------------------------------------------------------------------ For starting work with model, we need understand how to prepare input data first. As it is already discussed before, LLaVA is multimodal model @@ -168,15 +153,6 @@ instruction. tokenizer = AutoTokenizer.from_pretrained(model_id) image_processor = CLIPImageProcessor.from_pretrained(config.mm_vision_tower) - -.. parsed-literal:: - - 2023-10-04 09:48:12.750646: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. - 2023-10-04 09:48:12.789652: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. - To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. - 2023-10-04 09:48:13.494345: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT - - .. code:: ipython3 from llava.constants import ( @@ -200,8 +176,10 @@ instruction. else: context_len = 2048 -Build model and convert it to OpenVINO IR format ------------------------------------------------------------------------------------------- +Build model and convert it to OpenVINO IR format +------------------------------------------------ + + LLaVA is autoregressive transformer generative model, it means that each next model step depends from model output from previous step. The @@ -235,18 +213,21 @@ every time you want to generate a new token seems wasteful. With the cache, the model saves the hidden state once it has been computed. The model only computes the one for the most recently generated output token at each time step, re-using the saved ones for hidden tokens. This -reduces the generation complexity from O(n^3) to O(n^2) for a -transformer model. More details about how it works can be found in this +reduces the generation complexity from :math:`O(n^3)` to :math:`O(n^2)` +for a transformer model. More details about how it works can be found in +this `article `__. -Prepare helpers for model conversion -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Prepare helpers for model conversion +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + The code below preparing function for converting LLaVA model to OpenVINO Intermediate Representation format. It splits model on parts described above, prepare example inputs for each part and convert each part using `OpenVINO Model Conversion -API `__. +API `__. ``ov.convert_model`` function accepts PyTorch model instance and returns ``ov.Model`` object that represent model in OpenVINO format. It is ready to use for loading on device using ``ov.compile_model`` or can be saved @@ -259,6 +240,7 @@ on disk using ``ov.save_model``. import warnings import torch import openvino as ov + import nncf from typing import Optional, Tuple, List import torch.nn.functional as F @@ -380,7 +362,9 @@ on disk using ``ov.save_model``. 
return ov_model - def convert_llava_mpt(pt_model: torch.nn.Module, model_path: Path): + def convert_llava_mpt(pt_model: torch.nn.Module, model_path: Path, + image_encoder_wc_parameters: Optional[dict] = None, + llava_wc_parameters: Optional[dict] = None): """ LLaVA MPT model conversion function @@ -403,11 +387,14 @@ on disk using ``ov.save_model``. ov_model = ov.convert_model( model, example_input=torch.zeros((1, 3, 224, 224)), input=[(-1, 3, 224, 224)] ) + if image_encoder_wc_parameters is not None: + print("Applying weight compression to image encoder") + ov_model = nncf.compress_weights(ov_model, **image_encoder_wc_parameters) ov.save_model(ov_model, image_encoder_path) cleanup_torchscript_cache() del ov_model gc.collect() - print("Image Encoder model successfuly converted") + print("Image Encoder model successfully converted") if not token_embedding_model_path.exists(): model.forward = model.get_model().embed_tokens @@ -418,10 +405,10 @@ on disk using ``ov.save_model``. cleanup_torchscript_cache() del ov_model gc.collect() - print("Token Embedding model successfuly converted") + print("Token Embedding model successfully converted") if first_stage_model_path.exists() and second_stage_model_path.exists(): - print("LLaVA model successfuly converted") + print("LLaVA model successfully converted") del pt_model return model_wrap = ModelWrapper(model) @@ -445,6 +432,9 @@ on disk using ``ov.save_model``. model_wrap, example_input=example_input_first_stage ) ov_model = postprocess_converted_model(ov_model, output_names=outputs) + if llava_wc_parameters is not None: + print("Applying weight compression to first stage LLava model") + ov_model = nncf.compress_weights(ov_model, **llava_wc_parameters) ov.save_model(ov_model, first_stage_model_path) cleanup_torchscript_cache() del ov_model @@ -466,34 +456,49 @@ on disk using ``ov.save_model``. output_names=outputs, dynamic_shapes=dynamic_shapes ) - - ov.save_model(ov_model, ov_out_path / "llava_with_past.xml") - del ov_model + if llava_wc_parameters is not None: + print("Applying weight compression to second stage LLava model") + ov_model = nncf.compress_weights(ov_model, **llava_wc_parameters) + ov.save_model(ov_model, second_stage_model_path) cleanup_torchscript_cache() - print("LLaVA model successfuly converted") + del ov_model + gc.collect() + print("LLaVA model successfully converted") del model_wrap del pt_model -Convert and Optimize Model -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. parsed-literal:: + + INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, openvino + + +Convert and Optimize Model +~~~~~~~~~~~~~~~~~~~~~~~~~~ + + Our model conversion and optimization consist of following steps: 1. -Download original PyTorch model. 2. Compress model weights to INT8 using -NNCF 3. Convert model to OpenVINO format and save it on disk. +Download original PyTorch model. 2. Compress model weights using NNCF 3. +Convert model to OpenVINO format and save it on disk. Let’s consider each step more deeply. -instantiate PyTorch model +Instantiate PyTorch model ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + For creating PyTorch model we should use ``from_pretrained`` method of ``LlavaMPTForCausalLM`` model class. Model weights will be downloaded from `HuggingFace hub `__ during first run. It may takes some time and requires at least 13 Gb free space on disk. 
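A minimal sketch of this step is shown below. The import path is assumed to follow the cloned
LLaVA repository layout and ``model_id`` is the checkpoint variable defined earlier in the
notebook, so treat both as assumptions rather than exact notebook code:

.. code:: python

    # Assumed import path inside the cloned LLaVA repository; it may differ between revisions
    from llava.model.language_model.llava_mpt import LlavaMPTForCausalLM

    # Downloads roughly 13 GB of weights from the HuggingFace hub on the first run
    model = LlavaMPTForCausalLM.from_pretrained(model_id)
    model.eval()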
-Compress Model weights to INT8 using NNCF -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Compress Model weights to 4 and 8 bits using NNCF +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + For reducing memory consumption, weights compression optimization can be applied using `NNCF `__. Weight @@ -510,37 +515,62 @@ can benefit from weight compression in the following ways: latency of the memory access when computing the operations with weights, for example, Linear layers. -Currently, `Neural Network Compression Framework -(NNCF) `__ provides 8-bit -weight quantization as a compression method primarily designed to -optimize LLMs. The main difference between weights compression and full -model quantization (post-training quantization) is that activations -remain floating-point in the case of weights compression which leads to -a better accuracy. Weight compression for LLMs provides a solid -inference performance improvement which is on par with the performance -of the full model quantization. In addition, weight compression is -data-free and does not require a calibration dataset, making it easy to -use. +`Neural Network Compression Framework +(NNCF) `__ provides 4-bit / +8-bit mixed weight quantization as a compression method primarily +designed to optimize LLMs. The main difference between weights +compression and full model quantization (post-training quantization) is +that activations remain floating-point in the case of weights +compression which leads to a better accuracy. Weight compression for +LLMs provides a solid inference performance improvement which is on par +with the performance of the full model quantization. In addition, weight +compression is data-free and does not require a calibration dataset, +making it easy to use. ``nncf.compress_weights`` function can be used for performing weights -compression. It accepts PyTorch model that next can be converted to -OpenVINO model using Model Conversion API or OpenVINO Model after -conversion. +compression. The function accepts an OpenVINO model and other +compression parameters. Compared to INT8 compression, INT4 compression +improves performance even more, but introduces a minor drop in +prediction quality. More details about weights compression, can be found in `OpenVINO -documentation `__. +documentation `__. + + **Note**: There is no speedup for INT4 compressed models on dGPU. Convert model to OpenVINO IR format ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + Convert model to OpenVINO format using conversion helper function defined above. +Please select below whether you would like to run INT4 weight +compression instead of INT8 weight compression. + .. code:: ipython3 - from nncf import compress_weights + import ipywidgets as widgets + + compression_mode = widgets.Dropdown( + options=['INT4', 'INT8'], + value='INT4', + description='Compression mode:', + disabled=False, + ) + + compression_mode + +.. 
code:: ipython3 + + if compression_mode.value == 'INT4': + compressed_model_dir = Path("llava-mpt/INT4_compressed_weights") + llava_wc_parameters = dict(mode=nncf.CompressWeightsMode.INT4_ASYM, group_size=128, ratio=0.8) + else: + compressed_model_dir = Path("llava-mpt/INT8_compressed_weights") + llava_wc_parameters = dict(mode=nncf.CompressWeightsMode.INT8) - compressed_model_dir = Path("llava-mpt/INT8_compressed_weights") if not compressed_model_dir.exists(): compressed_model_dir.mkdir(exist_ok=True, parents=True) config.save_pretrained(compressed_model_dir) @@ -554,15 +584,15 @@ defined above. model.eval() with torch.no_grad(): - model = compress_weights(model) - convert_llava_mpt(model, compressed_model_dir) + convert_llava_mpt(model, compressed_model_dir, + image_encoder_wc_parameters=dict(mode=nncf.CompressWeightsMode.INT8), + llava_wc_parameters=llava_wc_parameters) del model - gc.collect(); + gc.collect(); .. parsed-literal:: - INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, tensorflow, onnx, openvino You are using config.init_device='cpu', but you can also use config.init_device="meta" with Composer + FSDP for fast initialization. @@ -574,28 +604,170 @@ defined above. .. parsed-literal:: - No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' + No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-11.7' + + +.. parsed-literal:: + + Applying weight compression to image encoder + INFO:nncf:Statistics of the bitwidth distribution: + +--------------+------------------+--------------------+ + | Num bits (N) | % all weight | % internal weights | + +==============+==================+====================+ + | 8 | 100% (139 / 139) | 100% (137 / 137) | + +--------------+------------------+--------------------+ + + + +.. parsed-literal:: + + Output() + + + +.. raw:: html + +

+
+
+
+
+.. raw:: html
+
+    
+    
+ + + +.. parsed-literal:: + + Image Encoder model successfully converted + Token Embedding model successfully converted + Applying weight compression to first stage LLava model + + + +.. parsed-literal:: + + Output() + + + +.. raw:: html + +

+
+
+
+
+.. raw:: html
+
+    
+    
+ .. parsed-literal:: - WARNING:tensorflow:Please fix your imports. Module tensorflow.python.training.tracking.base has been moved to tensorflow.python.trackable.base. The old module will be deleted in version 2.11. + INFO:nncf:Statistics of the bitwidth distribution: + +--------------+----------------+--------------------+ + | Num bits (N) | % all weight | % internal weights | + +==============+================+====================+ + | 8 | 24% (39 / 129) | 21% (37 / 127) | + +--------------+----------------+--------------------+ + | 4 | 76% (90 / 129) | 79% (90 / 127) | + +--------------+----------------+--------------------+ + + + +.. parsed-literal:: + + Output() + + + +.. raw:: html + +

+
+
+
+
+.. raw:: html
+
+    
+    
+ .. parsed-literal:: - [ WARNING ] Please fix your imports. Module %s has been moved to %s. The old module will be deleted in version %s. + Applying weight compression to second stage LLava model + .. parsed-literal:: - Image Encoder model successfuly converted - Token Embedding model successfuly converted - LLaVA model successfuly converted + Output() + + + +.. raw:: html + +

+
+
+
+
+.. raw:: html
+
+    
+    
+ + + +.. parsed-literal:: + + INFO:nncf:Statistics of the bitwidth distribution: + +--------------+----------------+--------------------+ + | Num bits (N) | % all weight | % internal weights | + +==============+================+====================+ + | 8 | 24% (39 / 129) | 21% (37 / 127) | + +--------------+----------------+--------------------+ + | 4 | 76% (90 / 129) | 79% (90 / 127) | + +--------------+----------------+--------------------+ + + + +.. parsed-literal:: + + Output() + + + +.. raw:: html + +

+
+
+
+
+.. raw:: html
+
+    
+    
+ + + +.. parsed-literal:: + + LLaVA model successfully converted + + +Prepare OpenVINO based inference pipeline +----------------------------------------- -Prepare OpenVINO based inference pipeline ------------------------------------------------------------------------------------ ``OVLlavaMPTForCausalLM`` class provides ease-to-use interface for using model in generation scenario. It is based on @@ -865,16 +1037,22 @@ documentation + diff --git a/docs/notebooks/257-llava-multimodal-chatbot-with-output_files/257-llava-multimodal-chatbot-with-output_19_1.jpg b/docs/notebooks/257-llava-multimodal-chatbot-with-output_files/257-llava-multimodal-chatbot-with-output_19_1.jpg deleted file mode 100644 index 29fc338b516a09..00000000000000 --- a/docs/notebooks/257-llava-multimodal-chatbot-with-output_files/257-llava-multimodal-chatbot-with-output_19_1.jpg +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:f825c10443339b42cb5e2415f48bb7bafb4e087fb29bce6d2feaf3c2f89788c8 -size 72374 diff --git a/docs/notebooks/257-llava-multimodal-chatbot-with-output_files/257-llava-multimodal-chatbot-with-output_19_1.png b/docs/notebooks/257-llava-multimodal-chatbot-with-output_files/257-llava-multimodal-chatbot-with-output_19_1.png deleted file mode 100644 index c1062ffb3d6d10..00000000000000 --- a/docs/notebooks/257-llava-multimodal-chatbot-with-output_files/257-llava-multimodal-chatbot-with-output_19_1.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:dde262e54da6d8dad5062989d7863db7cd85ac0403b9015a76f5884472f67ceb -size 599941 diff --git a/docs/notebooks/257-llava-multimodal-chatbot-with-output_files/257-llava-multimodal-chatbot-with-output_20_1.png b/docs/notebooks/257-llava-multimodal-chatbot-with-output_files/257-llava-multimodal-chatbot-with-output_20_1.png new file mode 100644 index 00000000000000..8fa51d6e9966e3 --- /dev/null +++ b/docs/notebooks/257-llava-multimodal-chatbot-with-output_files/257-llava-multimodal-chatbot-with-output_20_1.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6725b341bfb362ae84e4e807d3c6ef4189b8d87eb5cf895ee7cb0e37b63582a4 +size 539244 diff --git a/docs/notebooks/257-llava-multimodal-chatbot-with-output_files/index.html b/docs/notebooks/257-llava-multimodal-chatbot-with-output_files/index.html index 45960f065f4cfc..76276b096a486f 100644 --- a/docs/notebooks/257-llava-multimodal-chatbot-with-output_files/index.html +++ b/docs/notebooks/257-llava-multimodal-chatbot-with-output_files/index.html @@ -1,8 +1,7 @@ -Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/257-llava-multimodal-chatbot-with-output_files/ +Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/257-llava-multimodal-chatbot-with-output_files/ -

Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/257-llava-multimodal-chatbot-with-output_files/


../
-257-llava-multimodal-chatbot-with-output_19_1.jpg  31-Oct-2023 00:35               72374
-257-llava-multimodal-chatbot-with-output_19_1.png  31-Oct-2023 00:35              599941
+

Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/257-llava-multimodal-chatbot-with-output_files/


../
+257-llava-multimodal-chatbot-with-output_20_1.png  15-Nov-2023 00:43              539244
 

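Before moving on to the next notebook, here is a compact, standalone sketch of the NNCF
weight-compression call pattern that the chatbot notebooks above rely on. The IR path is a
placeholder; the mode, ``group_size`` and ``ratio`` values mirror the ones used in those
notebooks:

.. code:: python

    from pathlib import Path

    import nncf
    import openvino as ov

    fp16_ir = Path("model/FP16/openvino_model.xml")  # placeholder path to an FP16 IR
    int4_dir = fp16_ir.parent.parent / "INT4_compressed_weights"
    int4_dir.mkdir(parents=True, exist_ok=True)

    ov_model = ov.Core().read_model(fp16_ir)
    # ratio=0.8 compresses about 80% of the weights to 4-bit and keeps the rest in 8-bit
    compressed = nncf.compress_weights(
        ov_model, mode=nncf.CompressWeightsMode.INT4_ASYM, group_size=128, ratio=0.8
    )
    ov.save_model(compressed, int4_dir / "openvino_model.xml")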
diff --git a/docs/notebooks/258-blip-diffusion-subject-generation-with-output.rst b/docs/notebooks/258-blip-diffusion-subject-generation-with-output.rst index fee1ea4415c238..6fc7153402be27 100644 --- a/docs/notebooks/258-blip-diffusion-subject-generation-with-output.rst +++ b/docs/notebooks/258-blip-diffusion-subject-generation-with-output.rst @@ -9,8 +9,7 @@ subjects with up to 20x speedup. In addition, BLIP-Diffusion can be flexibly combined with ControlNet and prompt-to-prompt to enable novel subject-driven generation and editing applications. -**Table of contents:** ---- +**Table of contents**: - `Prerequisites <#prerequisites>`__ - `Load the model <#load-the-model>`__ @@ -19,7 +18,7 @@ subject-driven generation and editing applications. - `Controlled subject-driven generation (Canny-edge) <#controlled-subject-driven-generation-canny-edge>`__ - `Controlled subject-driven generation (Scribble) <#controlled-subject-driven-generation-scribble>`__ - `Convert the model to OpenVINO Intermediate Representation (IR) <#convert-the-model-to-openvino-intermediate-representation-ir>`__ -- `QFormer <#qformer>`__ +- `Q-Former <#q-former>`__ - `Text encoder <#text-encoder>`__ - `ControlNet <#controlnet>`__ - `UNet <#unet>`__ @@ -34,13 +33,14 @@ subject-driven generation and editing applications. .. |image0| image:: https://github.com/salesforce/LAVIS/raw/main/projects/blip-diffusion/teaser-website.png Prerequisites -------------------------------------------------------- +------------- + + .. code:: ipython3 %pip install -q "openvino>=2023.1.0" matplotlib Pillow gradio - %pip install -q -extra-index-url https://download.pytorch.org/whl/cpu torch transformers accelerate controlnet_aux - %pip install -q "git+https://github.com/huggingface/diffusers.git" # TODO: Change to PyPI package where https://github.com/huggingface/diffusers/pull/4388 is included + %pip install -q -extra-index-url https://download.pytorch.org/whl/cpu torch transformers accelerate controlnet_aux "diffusers>=0.23.0" .. parsed-literal:: @@ -103,8 +103,10 @@ Prerequisites MODELS_DIR.mkdir(parents=True, exist_ok=True) DATA_DIR.mkdir(parents=True, exist_ok=True) -Load the model --------------------------------------------------------- +Load the model +-------------- + + We use Hugging Face ``diffusers`` library to load the model using ``from_pretrained`` method. @@ -147,11 +149,15 @@ We use Hugging Face ``diffusers`` library to load the model using urlretrieve(FLOWER_IMG_URL, FLOWER_IMG_PATH) urlretrieve(BAG_IMG_URL, BAG_IMG_PATH); -Infer the original model ------------------------------------------------------------------- +Infer the original model +------------------------ + + + +Zero-Shot subject-driven generation +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + -Zero-Shot subject-driven generation -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The pipeline takes a subject image and prompt text as input. The output is an image containing the subject with conditions from the prompt @@ -203,8 +209,10 @@ is an image containing the subject with conditions from the prompt .. image:: 258-blip-diffusion-subject-generation-with-output_files/258-blip-diffusion-subject-generation-with-output_12_0.png -Controlled subject-driven generation (Canny-edge) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Controlled subject-driven generation (Canny-edge) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + The `Canny edge detector `__ is a @@ -277,8 +285,10 @@ description. .. 
image:: 258-blip-diffusion-subject-generation-with-output_files/258-blip-diffusion-subject-generation-with-output_16_0.png -Controlled subject-driven generation (Scribble) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Controlled subject-driven generation (Scribble) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + `Holistically-Nested Edge Detection `__ (HED) is a deep @@ -347,8 +357,10 @@ edge map is the final output of HED and input of our diffusion model. .. image:: 258-blip-diffusion-subject-generation-with-output_files/258-blip-diffusion-subject-generation-with-output_19_0.png -Convert the model to OpenVINO Intermediate Representation (IR) --------------------------------------------------------------------------------------------------------- +Convert the model to OpenVINO Intermediate Representation (IR) +-------------------------------------------------------------- + + BLIP-Diffusion pipeline has the following structure: @@ -437,8 +449,10 @@ we clean after every conversion. gc.collect() -Q-Former -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Q-Former +~~~~~~~~ + + Q-Former was introduced in `BLIP-2 `__ paper and is a @@ -562,8 +576,10 @@ Original QFormer model takes raw text as input, so we redefine the -Text encoder -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Text encoder +~~~~~~~~~~~~ + + BLIP-Diffusion pipeline uses CLIP text encoder, the default encoder for Stable Diffusion-based models. The only difference is it allows for an @@ -612,8 +628,10 @@ embeddings, and interact with them using self-attention. -ControlNet -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +ControlNet +~~~~~~~~~~ + + The ControlNet model was introduced in `Adding Conditional Control to Text-to-Image Diffusion @@ -656,8 +674,10 @@ segmentation maps, and keypoints for pose detection. -UNet -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +UNet +~~~~ + + The `UNet `__ model is one of the most important components of a diffusion system because it @@ -665,6 +685,8 @@ facilitates the actual diffusion process. .. code:: ipython3 + from typing import Tuple + serialize_openvino( unet, UNET_PATH, @@ -688,6 +710,45 @@ facilitates the actual diffusion process. 
} + class UnetWrapper(torch.nn.Module): + def __init__( + self, + unet, + sample_dtype=torch.float32, + timestep_dtype=torch.int64, + encoder_hidden_states=torch.float32, + down_block_additional_residuals=torch.float32, + mid_block_additional_residual=torch.float32 + ): + super().__init__() + self.unet = unet + self.sample_dtype = sample_dtype + self.timestep_dtype = timestep_dtype + self.encoder_hidden_states_dtype = encoder_hidden_states + self.down_block_additional_residuals_dtype = down_block_additional_residuals + self.mid_block_additional_residual_dtype = mid_block_additional_residual + + def forward( + self, + sample:torch.Tensor, + timestep:torch.Tensor, + encoder_hidden_states:torch.Tensor, + down_block_additional_residuals:Tuple[torch.Tensor], + mid_block_additional_residual:torch.Tensor + ): + sample.to(self.sample_dtype) + timestep.to(self.timestep_dtype) + encoder_hidden_states.to(self.encoder_hidden_states_dtype) + down_block_additional_residuals = [res.to(self.down_block_additional_residuals_dtype) for res in down_block_additional_residuals] + mid_block_additional_residual.to(self.mid_block_additional_residual_dtype) + return self.unet( + sample, + timestep, + encoder_hidden_states, + down_block_additional_residuals=down_block_additional_residuals, + mid_block_additional_residual=mid_block_additional_residual + ) + def flatten_inputs(inputs): flat_inputs = [] for input_data in inputs: @@ -710,10 +771,7 @@ facilitates the actual diffusion process. } if not UNET_CONTROLNET_PATH.exists(): with torch.no_grad(): - ov_unet = ov.convert_model( - unet, - example_input=example_input, - ) + ov_unet = ov.convert_model(UnetWrapper(unet), example_input=example_input) flat_inputs = flatten_inputs(example_input.values()) for input_data, input_tensor in zip(flat_inputs, ov_unet.inputs): input_tensor.get_node().set_partial_shape(ov.PartialShape(input_data.shape)) @@ -733,8 +791,10 @@ facilitates the actual diffusion process. -Variational Autoencoder (VAE) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Variational Autoencoder (VAE) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + The variational autoencoder (VAE) model with KL loss was introduced in `Auto-Encoding Variational @@ -772,8 +832,10 @@ decoder in separate ``torch.nn.Module``. -Select inference device -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Select inference device +~~~~~~~~~~~~~~~~~~~~~~~ + + select device from dropdown list for running inference using OpenVINO @@ -826,8 +888,10 @@ select device from dropdown list for running inference using OpenVINO vae = core.compile_model(VAE_PATH, device_name=device.value) -Inference ---------------------------------------------------- +Inference +--------- + + .. code:: ipython3 @@ -1072,8 +1136,10 @@ Inference ov_pipe = OvBlipDiffusionPipeline() -Zero-Shot subject-driven generation -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Zero-Shot subject-driven generation +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + .. code:: ipython3 @@ -1109,8 +1175,10 @@ Zero-Shot subject-driven generation .. image:: 258-blip-diffusion-subject-generation-with-output_files/258-blip-diffusion-subject-generation-with-output_52_0.png -Controlled subject-driven generation (Canny-edge) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Controlled subject-driven generation (Canny-edge) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + .. 
code:: ipython3 @@ -1168,8 +1236,10 @@ Controlled subject-driven generation (Canny-edge) .. image:: 258-blip-diffusion-subject-generation-with-output_files/258-blip-diffusion-subject-generation-with-output_55_0.png -Controlled subject-driven generation (Scribble) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Controlled subject-driven generation (Scribble) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + .. code:: ipython3 @@ -1223,8 +1293,10 @@ Controlled subject-driven generation (Scribble) .. image:: 258-blip-diffusion-subject-generation-with-output_files/258-blip-diffusion-subject-generation-with-output_58_0.png -Interactive inference ---------------------------------------------------------------- +Interactive inference +--------------------- + + .. code:: ipython3 diff --git a/docs/notebooks/258-blip-diffusion-subject-generation-with-output_files/index.html b/docs/notebooks/258-blip-diffusion-subject-generation-with-output_files/index.html index a5b7958047c553..3103cca3656887 100644 --- a/docs/notebooks/258-blip-diffusion-subject-generation-with-output_files/index.html +++ b/docs/notebooks/258-blip-diffusion-subject-generation-with-output_files/index.html @@ -1,12 +1,12 @@ -Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/258-blip-diffusion-subject-generation-with-output_files/ +Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/258-blip-diffusion-subject-generation-with-output_files/ -

Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/258-blip-diffusion-subject-generation-with-output_files/


../
-258-blip-diffusion-subject-generation-with-outp..> 31-Oct-2023 00:35              495502
-258-blip-diffusion-subject-generation-with-outp..> 31-Oct-2023 00:35              680845
-258-blip-diffusion-subject-generation-with-outp..> 31-Oct-2023 00:35              541801
-258-blip-diffusion-subject-generation-with-outp..> 31-Oct-2023 00:35              522726
-258-blip-diffusion-subject-generation-with-outp..> 31-Oct-2023 00:35              683108
-258-blip-diffusion-subject-generation-with-outp..> 31-Oct-2023 00:35              539707
+

Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/258-blip-diffusion-subject-generation-with-output_files/


../
+258-blip-diffusion-subject-generation-with-outp..> 15-Nov-2023 00:43              495502
+258-blip-diffusion-subject-generation-with-outp..> 15-Nov-2023 00:43              680845
+258-blip-diffusion-subject-generation-with-outp..> 15-Nov-2023 00:43              541801
+258-blip-diffusion-subject-generation-with-outp..> 15-Nov-2023 00:43              522726
+258-blip-diffusion-subject-generation-with-outp..> 15-Nov-2023 00:43              683108
+258-blip-diffusion-subject-generation-with-outp..> 15-Nov-2023 00:43              539707
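The 258-blip-diffusion section above repeats the same convert-then-release pattern for every submodule (Q-Former, text encoder, ControlNet, UNet, VAE). The helper below is only a minimal sketch of that pattern, not the notebook's actual ``serialize_openvino`` implementation; it assumes nothing beyond the standard ``ov.convert_model``/``ov.save_model`` API and a PyTorch module with a matching ``example_input``.

.. code:: ipython3

    import gc
    from pathlib import Path

    import openvino as ov
    import torch


    def convert_and_release(module: torch.nn.Module, xml_path: Path, example_input) -> None:
        # Convert the PyTorch module to OpenVINO IR once, save it on disk,
        # then free memory, mirroring the clean-up done after every conversion above.
        if not xml_path.exists():
            module.eval()
            with torch.no_grad():
                ov_model = ov.convert_model(module, example_input=example_input)
            ov.save_model(ov_model, xml_path)
            del ov_model
        del module
        gc.collect()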
 

diff --git a/docs/notebooks/260-pix2struct-docvqa-with-output.rst b/docs/notebooks/260-pix2struct-docvqa-with-output.rst index 23bc19c8c13535..73926ec941d0be 100644 --- a/docs/notebooks/260-pix2struct-docvqa-with-output.rst +++ b/docs/notebooks/260-pix2struct-docvqa-with-output.rst @@ -44,17 +44,17 @@ convert the model to OpenVINO™ IR format. **Table of contents:** - - `About Pix2Struct <#about-pixstruct>`__ - `Prerequisites <#prerequisites>`__ -- `Download and Convert - Model <#download-and-convert-model>`__ +- `Download and Convert Model <#download-and-convert-model>`__ - `Select inference device <#select-inference-device>`__ - `Test model inference <#test-model-inference>`__ - `Interactive demo <#interactive-demo>`__ -About Pix2Struct ----------------------------------------------------------- +About Pix2Struct +---------------- + + Pix2Struct is an image encoder - text decoder model that is trained on image-text pairs for various tasks, including image captioning and @@ -83,8 +83,10 @@ model can handle on-the-fly changes to the sequence length and resolution. To handle variable resolutions unambiguously, 2-dimensional absolute positional embeddings are used for the input patches. -Prerequisites -------------------------------------------------------- +Prerequisites +------------- + + First, we need to install the `Hugging Face Optimum `__ library @@ -97,10 +99,12 @@ documentation `__. .. code:: ipython3 %pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu - %pip install -q "git+https://github.com/huggingface/optimum-intel.git" "openvino>=2023.1.0" transformers onnx gradio + %pip install -q "git+https://github.com/huggingface/optimum-intel.git" "openvino>=2023.1.0" "transformers>=4.33.0" onnx gradio + +Download and Convert Model +-------------------------- + -Download and Convert Model --------------------------------------------------------------------- Optimum Intel can be used to load optimized models from the `Hugging Face Hub `__ and @@ -159,8 +163,10 @@ applicable for other models from pix2struct family. warnings.warn( -Select inference device ------------------------------------------------------------------ +Select inference device +----------------------- + + select device from dropdown list for running inference using OpenVINO @@ -189,8 +195,10 @@ select device from dropdown list for running inference using OpenVINO -Test model inference --------------------------------------------------------------- +Test model inference +-------------------- + + The diagram below demonstrates how the model works: |pix2struct_diagram.png| @@ -222,7 +230,7 @@ by ``Pix2StructProcessor.decode`` Let’s see the model in action. For testing the model, we will use a screenshot from `OpenVINO -documentation `__ +documentation `__ .. code:: ipython3 @@ -273,8 +281,10 @@ documentation -Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/260-pix2struct-docvqa-with-output_files/ +Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/260-pix2struct-docvqa-with-output_files/ -

Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/260-pix2struct-docvqa-with-output_files/


../
-260-pix2struct-docvqa-with-output_11_0.jpg         31-Oct-2023 00:35              134092
-260-pix2struct-docvqa-with-output_11_0.png         31-Oct-2023 00:35              221889
+

Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/260-pix2struct-docvqa-with-output_files/


../
+260-pix2struct-docvqa-with-output_11_0.jpg         15-Nov-2023 00:43              134092
+260-pix2struct-docvqa-with-output_11_0.png         15-Nov-2023 00:43              221889
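The 260-pix2struct section above loads the model through Optimum Intel rather than converting it by hand. The snippet below is only a hedged sketch of that flow: it assumes the installed ``optimum-intel`` build exposes ``OVModelForPix2Struct`` (as the section's reliance on Optimum Intel suggests), and the image path and question are hypothetical placeholders.

.. code:: ipython3

    from optimum.intel import OVModelForPix2Struct
    from transformers import Pix2StructProcessor
    from PIL import Image

    model_id = "google/pix2struct-docvqa-base"

    processor = Pix2StructProcessor.from_pretrained(model_id)
    # export=True converts the PyTorch checkpoint to OpenVINO IR on the fly
    ov_model = OVModelForPix2Struct.from_pretrained(model_id, export=True)

    image = Image.open("document.png")           # hypothetical input screenshot
    question = "What is this document about?"    # hypothetical question
    inputs = processor(images=image, text=question, return_tensors="pt")
    answer_ids = ov_model.generate(**inputs, max_new_tokens=50)
    print(processor.decode(answer_ids[0], skip_special_tokens=True))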
 

diff --git a/docs/notebooks/261-fast-segment-anything-with-output.rst b/docs/notebooks/261-fast-segment-anything-with-output.rst index 2e65fb6ae8504e..4710119100e349 100644 --- a/docs/notebooks/261-fast-segment-anything-with-output.rst +++ b/docs/notebooks/261-fast-segment-anything-with-output.rst @@ -28,26 +28,46 @@ the prompt. pipeline - **Table of contents:** ---- -- `Requirements and Imports <#prerequisites>`__ -- `Original Pipeline Inference <#fastsam-in-ultralytics>`__ -- `Converting the Model to OpenVINO IR <#convert-the-model-to-openvino-intermediate-representation-ir-format>`__ -- `Embedding the Converted Models into the Pipeline <#embedding-the-converted-models-into-the-original-pipeline>`__ -- `Run Gradio App <#try-out-the-converted-pipeline>`__ +- `Prerequisites <#prerequisites>`__ + + - `Install requirements <#install-requirements>`__ + - `Imports <#imports>`__ + +- `FastSAM in Ultralytics <#fastsam-in-ultralytics>`__ +- `Convert the model to OpenVINO Intermediate representation (IR) + format <#convert-the-model-to-openvino-intermediate-representation-ir-format>`__ +- `Embedding the converted models into the original + pipeline <#embedding-the-converted-models-into-the-original-pipeline>`__ + + - `Select inference device <#select-inference-device>`__ + - `Adapt OpenVINO models to the original + pipeline <#adapt-openvino-models-to-the-original-pipeline>`__ + +- `Optimize the model using NNCF Post-training Quantization + API <#optimize-the-model-using-nncf-post-training-quantization-api>`__ + + - `Compare the performance of the Original and Quantized + Models <#compare-the-performance-of-the-original-and-quantized-models>`__ + +- `Try out the converted pipeline <#try-out-the-converted-pipeline>`__ Prerequisites ------------- + + Install requirements ~~~~~~~~~~~~~~~~~~~~ + + .. code:: ipython3 %pip install -q "ultralytics==8.0.200" onnx %pip install -q "openvino-dev>=2023.1.0" + %pip install -q "nncf>=2.6.0" %pip install -q gradio @@ -59,23 +79,44 @@ Install requirements Note: you may need to restart the kernel to use updated packages. DEPRECATION: pytorch-lightning 1.6.5 has a non-standard dependency specifier torch>=1.8.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pytorch-lightning or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063 Note: you may need to restart the kernel to use updated packages. + DEPRECATION: pytorch-lightning 1.6.5 has a non-standard dependency specifier torch>=1.8.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pytorch-lightning or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063 + Note: you may need to restart the kernel to use updated packages. Imports ~~~~~~~ + + .. 
code:: ipython3 + import ipywidgets as widgets from pathlib import Path import openvino as ov import torch from PIL import Image, ImageDraw from ultralytics import FastSAM + + import urllib.request + # Fetch skip_kernel_extension module + urllib.request.urlretrieve( + url='https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/utils/skip_kernel_extension.py', + filename='skip_kernel_extension.py' + ) + # Fetch `notebook_utils` module + urllib.request.urlretrieve( + url='https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/utils/notebook_utils.py', + filename='notebook_utils.py' + ) + from notebook_utils import download_file + %load_ext skip_kernel_extension FastSAM in Ultralytics ---------------------- + + To work with `Fast Segment Anything Model `__ by ``CASIA-IVA-Lab``, we will use the `Ultralytics @@ -91,6 +132,7 @@ model and generate a segmentation map. # Run inference on an image image_uri = "https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/image/coco_bike.jpg" + image_uri = download_file(image_uri) results = model(image_uri, device="cpu", retina_masks=True, imgsz=1024, conf=0.6, iou=0.9) @@ -105,22 +147,17 @@ model and generate a segmentation map. 0%| | 0.00/138M [00:00`__ @@ -191,13 +232,13 @@ used to compile the model. Select inference device ^^^^^^^^^^^^^^^^^^^^^^^ + + Select device that will be used to do models inference using OpenVINO from the dropdown list: .. code:: ipython3 - import ipywidgets as widgets - DEVICE = widgets.Dropdown( options=core.available_devices + ["AUTO"], value="AUTO", @@ -219,23 +260,22 @@ from the dropdown list: Adapt OpenVINO models to the original pipeline ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + Here we create wrapper classes for the OpenVINO model that we want to embed in the original inference pipeline. Here are some of the things to -consider when adapting an OV model: - -- Make sure that parameters passed - by the original pipeline are forwarded to the compiled OV model - properly; sometimes the OV model uses only a portion of the input - arguments and some are ignored, sometimes you need to convert the - argument to another data type or unwrap some data structures such as - tuples or dictionaries. -- Guarantee that the wrapper class returns - results to the pipeline in an expected format. In the example below you - can see how we pack OV model outputs into a tuple of ``torch`` tensors. +consider when adapting an OV model: - Make sure that parameters passed +by the original pipeline are forwarded to the compiled OV model +properly; sometimes the OV model uses only a portion of the input +arguments and some are ignored, sometimes you need to convert the +argument to another data type or unwrap some data structures such as +tuples or dictionaries. - Guarantee that the wrapper class returns +results to the pipeline in an expected format. In the example below you +can see how we pack OV model outputs into a tuple of ``torch`` tensors. - Pay attention to the model method used in the original pipeline for - calling the model - it may be not the ``forward`` method! In this - example, the model is a part of a ``predictor`` object and called as and - object, so we need to redefine the magic ``__call__`` method. +calling the model - it may be not the ``forward`` method! In this +example, the model is a part of a ``predictor`` object and called as and +object, so we need to redefine the magic ``__call__`` method. .. 
code:: ipython3 @@ -266,9 +306,8 @@ pipeline. .. parsed-literal:: - Found https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/image/coco_bike.jpg locally at coco_bike.jpg - image 1/1 /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-534/.workspace/scm/ov-notebook/notebooks/261-fast-segment-anything/coco_bike.jpg: 480x640 33 objects, 356.4ms - Speed: 3.7ms preprocess, 356.4ms inference, 16.1ms postprocess per image at shape (1, 3, 480, 640) + image 1/1 /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/notebooks/261-fast-segment-anything/coco_bike.jpg: 480x640 33 objects, 353.6ms + Speed: 3.5ms preprocess, 353.6ms inference, 14.7ms postprocess per image at shape (1, 3, 480, 640) One can observe the converted model outputs in the next cell, they is @@ -285,9 +324,306 @@ the same as of the original model. +Optimize the model using NNCF Post-training Quantization API +------------------------------------------------------------ + + + +`NNCF `__ provides a suite of +advanced algorithms for Neural Networks inference optimization in +OpenVINO with minimal accuracy drop. We will use 8-bit quantization in +post-training mode (without the fine-tuning pipeline) to optimize +FastSAM. + +The optimization process contains the following steps: + +1. Create a Dataset for quantization. +2. Run ``nncf.quantize`` to obtain a quantized model. +3. Save the INT8 model using ``openvino.save_model()`` function. + +.. code:: ipython3 + + do_quantize = widgets.Checkbox( + value=True, + description='Quantization', + disabled=False, + ) + + do_quantize + + + + +.. parsed-literal:: + + Checkbox(value=True, description='Quantization') + + + +The ``nncf.quantize`` function provides an interface for model +quantization. It requires an instance of the OpenVINO Model and +quantization dataset. Optionally, some additional parameters for the +configuration quantization process (number of samples for quantization, +preset, ignored scope, etc.) can be provided. YOLOv8 model backing +FastSAM contains non-ReLU activation functions, which require asymmetric +quantization of activations. To achieve a better result, we will use a +``mixed`` quantization preset. It provides symmetric quantization of +weights and asymmetric quantization of activations. For more accurate +results, we should keep the operation in the postprocessing subgraph in +floating point precision, using the ``ignored_scope`` parameter. + +The quantization algorithm is based on `The YOLOv8 quantization +example `__ +in the NNCF repo, refer there for more details. Moreover, you can check +out other quantization tutorials in the `OV notebooks +repo `__. + + **Note**: Model post-training quantization is time-consuming process. + Be patient, it can take several minutes depending on your hardware. + +.. 
code:: ipython3 + + %%skip not $do_quantize.value + + import pickle + from contextlib import contextmanager + from zipfile import ZipFile + + import cv2 + from tqdm.autonotebook import tqdm + + import nncf + + + COLLECT_CALIBRATION_DATA = False + calibration_data = [] + + @contextmanager + def calibration_data_collection(): + global COLLECT_CALIBRATION_DATA + try: + COLLECT_CALIBRATION_DATA = True + yield + finally: + COLLECT_CALIBRATION_DATA = False + + + class NNCFWrapper: + def __init__(self, ov_model, stride=32) -> None: + self.model = core.read_model(ov_model) + self.compiled_model = core.compile_model(self.model, device_name="CPU") + + self.stride = stride + self.pt = True + self.fp16 = False + self.names = {0: "object"} + + def __call__(self, im, **_): + if COLLECT_CALIBRATION_DATA: + calibration_data.append(im) + + result = self.compiled_model(im) + return torch.from_numpy(result[0]), torch.from_numpy(result[1]) + + # Fetch data from the web and descibe a dataloader + DATA_URL = "https://ultralytics.com/assets/coco128.zip" + OUT_DIR = Path('.') + + download_file(DATA_URL, directory=OUT_DIR, show_progress=True) + + if not (OUT_DIR / "coco128/images/train2017").exists(): + with ZipFile('coco128.zip', "r") as zip_ref: + zip_ref.extractall(OUT_DIR) + + class COCOLoader(torch.utils.data.Dataset): + def __init__(self, images_path): + self.images = list(Path(images_path).iterdir()) + + def __getitem__(self, index): + if isinstance(index, slice): + return [self.read_image(image_path) for image_path in self.images[index]] + return self.read_image(self.images[index]) + + def read_image(self, image_path): + image = cv2.imread(str(image_path)) + image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) + return image + + def __len__(self): + return len(self.images) + + + def collect_calibration_data_for_decoder(model, calibration_dataset_size: int, + calibration_cache_path: Path): + global calibration_data + + + if not calibration_cache_path.exists(): + coco_dataset = COCOLoader(OUT_DIR / 'coco128/images/train2017') + with calibration_data_collection(): + for image in tqdm(coco_dataset[:calibration_dataset_size], desc="Collecting calibration data"): + model(image, retina_masks=True, imgsz=640, conf=0.6, iou=0.9, verbose=False) + calibration_cache_path.parent.mkdir(parents=True, exist_ok=True) + with open(calibration_cache_path, "wb") as f: + pickle.dump(calibration_data, f) + else: + with open(calibration_cache_path, "rb") as f: + calibration_data = pickle.load(f) + + return calibration_data + + + def quantize(model, save_model_path: Path, calibration_cache_path: Path, + calibration_dataset_size: int, preset: nncf.QuantizationPreset): + calibration_data = collect_calibration_data_for_decoder( + model, calibration_dataset_size, calibration_cache_path) + quantized_ov_decoder = nncf.quantize( + model.predictor.model.model, + calibration_dataset=nncf.Dataset(calibration_data), + preset=preset, + subset_size=len(calibration_data), + fast_bias_correction=True, + ignored_scope=nncf.IgnoredScope( + types=["Multiply", "Subtract", "Sigmoid"], # ignore operations + names=[ + "/model.22/dfl/conv/Conv", # in the post-processing subgraph + "/model.22/Add", + "/model.22/Add_1", + "/model.22/Add_2", + "/model.22/Add_3", + "/model.22/Add_4", + "/model.22/Add_5", + "/model.22/Add_6", + "/model.22/Add_7", + "/model.22/Add_8", + "/model.22/Add_9", + "/model.22/Add_10", + ], + ) + ) + ov.save_model(quantized_ov_decoder, save_model_path) + + wrapped_model = NNCFWrapper(ov_model_path, stride=model.predictor.model.stride) + 
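    # Route the Ultralytics predictor through the OpenVINO-backed NNCF wrapper so that
    # calibration samples are collected via its __call__ and nncf.quantize() can reach
    # the underlying ov.Model through model.predictor.model.model.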
model.predictor.model = wrapped_model + + calibration_dataset_size = 128 + quantized_model_path = Path(f"{model_name}_quantized") / "FastSAM-x.xml" + calibration_cache_path = Path(f"calibration_data/coco{calibration_dataset_size}.pkl") + if not quantized_model_path.exists(): + quantize(model, quantized_model_path, calibration_cache_path, + calibration_dataset_size=calibration_dataset_size, + preset=nncf.QuantizationPreset.MIXED) + + +.. parsed-literal:: + + INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, tensorflow, onnx, openvino + + + +.. parsed-literal:: + + coco128.zip: 0%| | 0.00/6.66M [00:00`__. @@ -319,7 +655,6 @@ bounding boxes on input image. for i, mask in enumerate(annotations): mask = cv2.morphologyEx(mask.astype(np.uint8), cv2.MORPH_CLOSE, np.ones((3, 3), np.uint8)) annotations[i] = cv2.morphologyEx(mask.astype(np.uint8), cv2.MORPH_OPEN, np.ones((8, 8), np.uint8)) - # device is CPU inner_mask = fast_show_mask( annotations, @@ -421,6 +756,7 @@ based on user input. def segment( image, + model_type, input_size=1024, iou_threshold=0.75, conf_threshold=0.4, @@ -429,6 +765,11 @@ based on user input. use_retina=True, mask_random_color=True, ): + if do_quantize.value and model_type == 'Quantized model': + model.predictor.model = quantized_wrapped_model + else: + model.predictor.model = wrapped_model + input_size = int(input_size) w, h = image.size scale = input_size / max(w, h) @@ -555,10 +896,15 @@ based on user input. with gr.Row(variant="panel"): original_img = gr.Image(label="Input", value=examples[0][0], type="pil") segmented_img = gr.Image(label="Segmentation Map", type="pil") - point_type = gr.Radio( - ["Object point", "Background point", "Bounding Box"], - value="Object point", label="Pixel selector type" - ) + with gr.Row(): + point_type = gr.Radio( + ["Object point", "Background point", "Bounding Box"], + value="Object point", label="Pixel selector type" + ) + model_type = gr.Radio( + ["FP32 model", "Quantized model"] if do_quantize.value else ["FP32 model"], + value="FP32 model", label="Select model variant" + ) with gr.Row(variant="panel"): segment_button = gr.Button("Segment", variant="primary") clear_button = gr.Button("Clear points", variant="secondary") @@ -572,7 +918,7 @@ based on user input. outputs=original_img) original_img.upload(save_last_picked_image, inputs=original_img, outputs=segmented_img) clear_button.click(clear_points, outputs=[original_img, segmented_img]) - segment_button.click(segment, inputs=[original_img,], outputs=segmented_img) + segment_button.click(segment, inputs=[original_img, model_type], outputs=segmented_img) try: demo.queue().launch(debug=False) diff --git a/docs/notebooks/261-fast-segment-anything-with-output_files/index.html b/docs/notebooks/261-fast-segment-anything-with-output_files/index.html index 0805c8173503d2..5e233240faa8f8 100644 --- a/docs/notebooks/261-fast-segment-anything-with-output_files/index.html +++ b/docs/notebooks/261-fast-segment-anything-with-output_files/index.html @@ -1,10 +1,10 @@ -Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/261-fast-segment-anything-with-output_files/ +Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/261-fast-segment-anything-with-output_files/ -

Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/261-fast-segment-anything-with-output_files/


../
-261-fast-segment-anything-with-output_21_0.jpg     31-Oct-2023 00:35              116049
-261-fast-segment-anything-with-output_21_0.png     31-Oct-2023 00:35              824318
-261-fast-segment-anything-with-output_9_0.jpg      31-Oct-2023 00:35              117489
-261-fast-segment-anything-with-output_9_0.png      31-Oct-2023 00:35              815077
+

Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/261-fast-segment-anything-with-output_files/


../
+261-fast-segment-anything-with-output_21_0.jpg     15-Nov-2023 00:43              116049
+261-fast-segment-anything-with-output_21_0.png     15-Nov-2023 00:43              824318
+261-fast-segment-anything-with-output_9_0.jpg      15-Nov-2023 00:43              117489
+261-fast-segment-anything-with-output_9_0.png      15-Nov-2023 00:43              815077
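The 261-fast-segment-anything section above lists a comparison between the original and quantized models, but the timing code is not shown there. The cell below is only an illustrative sketch of such a comparison under stated assumptions: it reuses the ``core``, ``ov_model_path`` and ``quantized_model_path`` names defined earlier and expects a hypothetical ``sample_inputs`` list of preprocessed NumPy inputs shaped like the model input.

.. code:: ipython3

    import time

    def average_latency(model_path, inputs, device="CPU", warmup=3):
        # Compile an IR model and report the mean wall-clock latency per input, in seconds.
        compiled = core.compile_model(core.read_model(model_path), device_name=device)
        for sample in inputs[:warmup]:
            compiled(sample)
        start = time.perf_counter()
        for sample in inputs:
            compiled(sample)
        return (time.perf_counter() - start) / len(inputs)

    fp32_latency = average_latency(ov_model_path, sample_inputs)
    int8_latency = average_latency(quantized_model_path, sample_inputs)
    print(f"FP32: {fp32_latency:.3f} s, INT8: {int8_latency:.3f} s, "
          f"speed-up: {fp32_latency / int8_latency:.2f}x")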
 

diff --git a/docs/notebooks/262-softvc-voice-conversion-with-output.rst b/docs/notebooks/262-softvc-voice-conversion-with-output.rst index e099f18d7f3c42..a333dba3adb807 100644 --- a/docs/notebooks/262-softvc-voice-conversion-with-output.rst +++ b/docs/notebooks/262-softvc-voice-conversion-with-output.rst @@ -18,27 +18,27 @@ audio are preserved. In this tutorial we will use the base model flow. -**Table of contents:** - +Table of contents: +^^^^^^^^^^^^^^^^^^ - `Prerequisites <#prerequisites>`__ - `Use the original model to run an - inference <#use-the-original-model-to-run-an-inference->`__ -- `Convert the original model to OpenVINO Intermediate Representation - (IR) - format <#convert-the-original-model-to-openvino-intermediate-representation-ir-format>`__ + inference <#use-the-original-model-to-run-an-inference>`__ +- `Convert to OpenVINO IR model <#convert-to-openvino-ir-model>`__ - `Run the OpenVINO model <#run-the-openvino-model>`__ - `Interactive inference <#interactive-inference>`__ Prerequisites ------------- + + .. code:: ipython3 %pip install -q --upgrade pip setuptools %pip install -q "openvino>=2023.2.0.dev20230922" !git clone https://github.com/svc-develop-team/so-vits-svc -b 4.1-Stable - %pip install -q --extra-index-url https://download.pytorch.org/whl/cpu tqdm librosa torch torchaudio faiss-cpu gradio "numpy==1.23.5" "fairseq==0.12.2" praat-parselmouth + %pip install -q --extra-index-url https://download.pytorch.org/whl/cpu tqdm librosa "torch>=2.1.0" "torchaudio>=2.1.0" faiss-cpu gradio "numpy==1.23.5" "fairseq==0.12.2" praat-parselmouth Download pretrained models and configs. We use a recommended encoder `ContentVec `__ and models from `a @@ -70,8 +70,10 @@ own `__. # a wav sample download_file("https://huggingface.co/datasets/santifiorino/spinetta/resolve/main/spinetta/000.wav", "000.wav", directory="so-vits-svc/raw/") -Use the original model to run an inference `⇧ <#table-of-content>`__ ---------------------------------------------------------------------- +Use the original model to run an inference +------------------------------------------ + + Change directory to ``so-vits-svc`` in purpose not to brake internal relative paths. @@ -121,6 +123,8 @@ And let compare the original audio with the result. Convert to OpenVINO IR model ---------------------------- + + Model components are PyTorch modules, that can be converted with ``ov.convert_model`` function directly. We also use ``ov.save_model`` function to serialize the result of conversion. ``Svc`` is not a model, @@ -170,6 +174,8 @@ without need to look inside. Run the OpenVINO model ---------------------- + + Select a device from dropdown list for running inference using OpenVINO. .. code:: ipython3 @@ -223,13 +229,15 @@ Check result. Is it identical to that created by the original model. Interactive inference --------------------- + + .. code:: ipython3 import gradio as gr - src_audio = gr.inputs.Audio(label="Source Audio", type='filepath') - output_audio = gr.outputs.Audio(label="Output Audio", type='numpy') + src_audio = gr.Audio(label="Source Audio", type='filepath') + output_audio = gr.Audio(label="Output Audio", type='numpy') title = 'SoftVC VITS Singing Voice Conversion with Gradio' description = f'Gradio Demo for SoftVC VITS Singing Voice Conversion and OpenVINO™. Upload a source audio, then click the "Submit" button to inference. 
Audio sample rate should be {model.target_sample}' diff --git a/docs/notebooks/263-latent-consistency-models-image-generation-with-output.rst b/docs/notebooks/263-latent-consistency-models-image-generation-with-output.rst index 76989c2f754f32..3caebe1dcd10f2 100644 --- a/docs/notebooks/263-latent-consistency-models-image-generation-with-output.rst +++ b/docs/notebooks/263-latent-consistency-models-image-generation-with-output.rst @@ -40,10 +40,12 @@ page `__, repository `__. In this tutorial, we consider how to convert and run LCM using OpenVINO. +An additional part demonstrates how to run quantization with +`NNCF `__ to speed up +pipeline. **Table of contents:** - - `Prerequisites <#prerequisites>`__ - `Prepare models for OpenVINO format conversion <#prepare-models-for-openvino-format-conversion>`__ - `Convert models to OpenVINO format <#convert-models-to-openvino-format>`__ @@ -53,18 +55,26 @@ In this tutorial, we consider how to convert and run LCM using OpenVINO. - `Prepare inference pipeline <#prepare-inference-pipeline>`__ - `Configure Inference Pipeline <#configure-inference-pipeline>`__ - `Text-to-image generation <#text-to-image-generation>`__ +- `Quantization <#quantization>`__ +- `Prepare calibration dataset <#prepare-calibration-dataset>`__ +- `Run quantization <#run-quantization>`__ +- `Compare inference time of the FP16 and INT8 models <#compare-inference-time-of-the-fp-and-int-models>`__ - `Interactive demo <#interactive-demo>`__ -Prerequisites -------------------------------------------------------- +Prerequisites +------------- + + .. code:: ipython3 %pip install -q "torch" --index-url https://download.pytorch.org/whl/cpu - %pip install -q "openvino>=2023.1.0" transformers "diffusers>=0.21.4" pillow gradio + %pip install -q "openvino>=2023.1.0" transformers "diffusers>=0.22.0" pillow gradio "nncf>=2.6.0" datasets + +Prepare models for OpenVINO format conversion +--------------------------------------------- + -Prepare models for OpenVINO format conversion ---------------------------------------------------------------------------------------- In this tutorial we will use `LCM_Dreamshaper_v7 `__ @@ -78,7 +88,7 @@ model is also integrated into Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. This allows us to compare running original Stable Diffusion -(from this `notebook <../225-stable-diffusion-text-to-image>`__) and +(from this `notebook <225-stable-diffusion-text-to-image-with-output.html>`__) and distilled using LCD. The distillation approach efficiently converts a pre-trained guided diffusion model into a latent consistency model by solving an augmented PF-ODE. 
@@ -96,6 +106,7 @@ provide which module should be loaded for initialization using import warnings from pathlib import Path from diffusers import DiffusionPipeline + import numpy as np warnings.filterwarnings("ignore") @@ -105,16 +116,12 @@ provide which module should be loaded for initialization using VAE_DECODER_OV_PATH = Path("model/vae_decoder.xml") - def load_orginal_pytorch_pipeline_componets(skip_models=False): - pipe = DiffusionPipeline.from_pretrained( - "SimianLuo/LCM_Dreamshaper_v7", - custom_pipeline="latent_consistency_txt2img", - custom_revision="main", - ) + def load_orginal_pytorch_pipeline_componets(skip_models=False, skip_safety_checker=True): + pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7") scheduler = pipe.scheduler tokenizer = pipe.tokenizer - feature_extractor = pipe.feature_extractor - safety_checker = pipe.safety_checker + feature_extractor = pipe.feature_extractor if not skip_safety_checker else None + safety_checker = pipe.safety_checker if not skip_safety_checker else None text_encoder, unet, vae = None, None, None if not skip_models: text_encoder = pipe.text_encoder @@ -135,26 +142,6 @@ provide which module should be loaded for initialization using vae, ) - -.. parsed-literal:: - - /home/ea/work/ov_venv/lib/python3.8/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable. - warn("The installed version of bitsandbytes was compiled without GPU support. " - - -.. parsed-literal:: - - /home/ea/work/ov_venv/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32 - - -.. parsed-literal:: - - 2023-10-25 13:59:59.802031: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. - 2023-10-25 13:59:59.841632: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. - To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. - 2023-10-25 14:00:00.487700: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT - - .. code:: ipython3 skip_conversion = ( @@ -177,11 +164,18 @@ provide which module should be loaded for initialization using .. parsed-literal:: - Loading pipeline components...: 0%| | 0/6 [00:00 x_t-1 latents, denoised = self.scheduler.step( - torch.from_numpy(model_pred), i, t, latents, return_dict=False + torch.from_numpy(model_pred), t, latents, return_dict=False ) progress_bar.update() @@ -704,8 +706,10 @@ decoded by the decoder part of the variational auto encoder. images=image, nsfw_content_detected=has_nsfw_concept ) -Configure Inference Pipeline -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Configure Inference Pipeline +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + First, you should create instances of OpenVINO Model and compile it using selected device. Select device from dropdown list for running @@ -745,8 +749,8 @@ inference using OpenVINO. vae_decoder = core.compile_model(VAE_DECODER_OV_PATH, device.value, ov_config) Model tokenizer and scheduler are also important parts of the pipeline. 
-This pipeline is also uses Safety Checker, the filter for detecting that -corresponding generated image contains “not-safe-for-work” (nsfw) +This pipeline is also can use Safety Checker, the filter for detecting +that corresponding generated image contains “not-safe-for-work” (nsfw) content. The process of nsfw content detection requires to obtain image embeddings using CLIP model, so additionally feature extractor component should be added in the pipeline. We reuse tokenizer, feature extractor, @@ -754,7 +758,7 @@ scheduler and safety checker from original LCM pipeline. .. code:: ipython3 - ov_pipe = LatentConsistencyModelPipeline( + ov_pipe = OVLatentConsistencyModelPipeline( tokenizer=tokenizer, text_encoder=text_enc, unet=unet_model, @@ -764,8 +768,10 @@ scheduler and safety checker from original LCM pipeline. safety_checker=safety_checker, ) -Text-to-image generation ------------------------------------------------------------------- +Text-to-image generation +------------------------ + + Now, let’s see model in action @@ -805,14 +811,335 @@ Now, let’s see model in action Nice. As you can see, the picture has quite a high definition 🔥. -Interactive demo ----------------------------------------------------------- +Quantization +------------ + + + +`NNCF `__ enables +post-training quantization by adding quantization layers into model +graph and then using a subset of the training dataset to initialize the +parameters of these additional quantization layers. Quantized operations +are executed in ``INT8`` instead of ``FP32``/``FP16`` making model +inference faster. + +According to ``LatentConsistencyModelPipeline`` structure, UNet used for +iterative denoising of input. It means that model runs in the cycle +repeating inference on each diffusion step, while other parts of +pipeline take part only once. That is why computation cost and speed of +UNet denoising becomes the critical path in the pipeline. Quantizing the +rest of the SD pipeline does not significantly improve inference +performance but can lead to a substantial degradation of accuracy. + +The optimization process contains the following steps: + +1. Create a calibration dataset for quantization. +2. Run ``nncf.quantize()`` to obtain quantized model. +3. Save the ``INT8`` model using ``openvino.save_model()`` function. + +Please select below whether you would like to run quantization to +improve model inference speed. + +.. code:: ipython3 + + to_quantize = widgets.Checkbox( + value=True, + description='Quantization', + disabled=False, + ) + + to_quantize + + + + +.. parsed-literal:: + + Checkbox(value=True, description='Quantization') + + + +Let’s load ``skip magic`` extension to skip quantization if +``to_quantize`` is not selected + +.. code:: ipython3 + + import sys + sys.path.append("../utils") + + int8_pipe = None + + if to_quantize.value and "GPU" in device.value: + to_quantize.value = False + + %load_ext skip_kernel_extension + +Prepare calibration dataset +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + + +We use a portion of +`laion/laion2B-en `__ +dataset from Hugging Face as calibration data. To collect intermediate +model inputs for calibration we should customize ``CompiledModel``. + +.. 
code:: ipython3 + + %%skip not $to_quantize.value + + import datasets + from tqdm.notebook import tqdm + from transformers import Pipeline + from typing import Any, Dict, List + + class CompiledModelDecorator(ov.CompiledModel): + def __init__(self, compiled_model, prob: float, data_cache: List[Any] = None): + super().__init__(compiled_model) + self.data_cache = data_cache if data_cache else [] + self.prob = np.clip(prob, 0, 1) + + def __call__(self, *args, **kwargs): + if np.random.rand() >= self.prob: + self.data_cache.append(*args) + return super().__call__(*args, **kwargs) + + def collect_calibration_data(lcm_pipeline: Pipeline, subset_size: int) -> List[Dict]: + original_unet = lcm_pipeline.unet + lcm_pipeline.unet = CompiledModelDecorator(original_unet, prob=0.3) + + dataset = datasets.load_dataset("laion/laion2B-en", split="train", streaming=True).shuffle(seed=42) + lcm_pipeline.set_progress_bar_config(disable=True) + + # Run inference for data collection + pbar = tqdm(total=subset_size) + diff = 0 + for batch in dataset: + prompt = batch["TEXT"] + _ = lcm_pipeline( + prompt, + num_inference_steps=num_inference_steps, + guidance_scale=8.0, + lcm_origin_steps=50, + output_type="pil", + height=512, + width=512, + ) + collected_subset_size = len(lcm_pipeline.unet.data_cache) + if collected_subset_size >= subset_size: + pbar.update(subset_size - pbar.n) + break + pbar.update(collected_subset_size - diff) + diff = collected_subset_size + + calibration_dataset = lcm_pipeline.unet.data_cache + lcm_pipeline.set_progress_bar_config(disable=False) + lcm_pipeline.unet = original_unet + return calibration_dataset + +.. code:: ipython3 + + %%skip not $to_quantize.value + + import logging + logging.basicConfig(level=logging.WARNING) + logger = logging.getLogger(__name__) + + UNET_INT8_OV_PATH = Path("model/unet_int8.xml") + if not UNET_INT8_OV_PATH.exists(): + subset_size = 200 + unet_calibration_data = collect_calibration_data(ov_pipe, subset_size=subset_size) + + + +.. parsed-literal:: + + Downloading readme: 0%| | 0.00/56.0 [00:00 77). Running this sequence through the model will result in indexing errors + WARNING:__main__:The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['colleges harnessing technology to make education free'] + + +Run quantization +~~~~~~~~~~~~~~~~ + + + +Create a quantized model from the pre-trained converted OpenVINO model. + + **NOTE**: Quantization is time and memory consuming operation. + Running quantization code below may take some time. + +.. code:: ipython3 + + %%skip not $to_quantize.value + + import nncf + from nncf.scopes import IgnoredScope + + if UNET_INT8_OV_PATH.exists(): + print("Loading quantized model") + quantized_unet = core.read_model(UNET_INT8_OV_PATH) + else: + unet = core.read_model(UNET_OV_PATH) + quantized_unet = nncf.quantize( + model=unet, + subset_size=subset_size, + preset=nncf.QuantizationPreset.MIXED, + calibration_dataset=nncf.Dataset(unet_calibration_data), + model_type=nncf.ModelType.TRANSFORMER, + advanced_parameters=nncf.AdvancedQuantizationParameters( + disable_bias_correction=True + ) + ) + ov.save_model(quantized_unet, UNET_INT8_OV_PATH) + + +.. parsed-literal:: + + INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, tensorflow, onnx, openvino + + +.. parsed-literal:: + + Statistics collection: 100%|██████████| 200/200 [03:15<00:00, 1.02it/s] + Applying Smooth Quant: 100%|██████████| 101/101 [00:07<00:00, 13.89it/s] + + +.. 
parsed-literal:: + + INFO:nncf:96 ignored nodes was found by name in the NNCFGraph + + +.. parsed-literal:: + + Statistics collection: 100%|██████████| 200/200 [03:57<00:00, 1.19s/it] + + +.. code:: ipython3 + + %%skip not $to_quantize.value + + unet_optimized = core.compile_model(UNET_INT8_OV_PATH, device.value) + + int8_pipe = OVLatentConsistencyModelPipeline( + tokenizer=tokenizer, + text_encoder=text_enc, + unet=unet_optimized, + vae_decoder=vae_decoder, + scheduler=scheduler, + feature_extractor=feature_extractor, + safety_checker=safety_checker, + ) + +Let us check predictions with the quantized UNet using the same input +data. + +.. code:: ipython3 + + %%skip not $to_quantize.value + + from IPython.display import display + + prompt = "a beautiful pink unicorn, 8k" + num_inference_steps = 4 + torch.manual_seed(1234567) + + images = int8_pipe( + prompt=prompt, + num_inference_steps=num_inference_steps, + guidance_scale=8.0, + lcm_origin_steps=50, + output_type="pil", + height=512, + width=512, + ).images + + display(images[0]) + + + +.. parsed-literal:: + + 0%| | 0/4 [00:00 -Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/263-latent-consistency-models-image-generation-with-output_files/ +Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/263-latent-consistency-models-image-generation-with-output_files/ -

Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/263-latent-consistency-models-image-generation-with-output_files/


../
-263-latent-consistency-models-image-generation-..> 31-Oct-2023 00:35               20240
-263-latent-consistency-models-image-generation-..> 31-Oct-2023 00:35              390302
+

Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/263-latent-consistency-models-image-generation-with-output_files/


../
+263-latent-consistency-models-image-generation-..> 15-Nov-2023 00:43              345052
+263-latent-consistency-models-image-generation-..> 15-Nov-2023 00:43              352748
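The 263-latent-consistency-models section above lists "Compare inference time of the FP16 and INT8 models" in its table of contents, but the benchmark code is not shown there. The cell below is only a hedged sketch of such a comparison; it reuses the ``ov_pipe`` and ``int8_pipe`` objects constructed above and the same call signature used for generation earlier in the section.

.. code:: ipython3

    import time

    def pipeline_latency(pipeline, prompt, n_runs=3):
        # Average end-to-end generation time over several runs, in seconds.
        start = time.perf_counter()
        for _ in range(n_runs):
            pipeline(
                prompt=prompt,
                num_inference_steps=4,
                guidance_scale=8.0,
                lcm_origin_steps=50,
                output_type="pil",
                height=512,
                width=512,
            )
        return (time.perf_counter() - start) / n_runs

    prompt = "a beautiful pink unicorn, 8k"
    fp16_time = pipeline_latency(ov_pipe, prompt)
    int8_time = pipeline_latency(int8_pipe, prompt)
    print(f"FP16: {fp16_time:.2f} s, INT8: {int8_time:.2f} s, speed-up: {fp16_time / int8_time:.2f}x")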
 

diff --git a/docs/notebooks/264-qrcode-monster-with-output.rst b/docs/notebooks/264-qrcode-monster-with-output.rst new file mode 100644 index 00000000000000..8427ab36c7011f --- /dev/null +++ b/docs/notebooks/264-qrcode-monster-with-output.rst @@ -0,0 +1,989 @@ +Generate creative QR codes with ControlNet QR Code Monster and OpenVINO™ +======================================================================== + +`Stable Diffusion `__, a +cutting-edge image generation technique, but it can be further enhanced +by combining it with `ControlNet `__, +a widely used control network approach. The combination allows Stable +Diffusion to use a condition input to guide the image generation +process, resulting in highly accurate and visually appealing images. The +condition input could be in the form of various types of data such as +scribbles, edge maps, pose key points, depth maps, segmentation maps, +normal maps, or any other relevant information that helps to guide the +content of the generated image, for example - QR codes! This method can +be particularly useful in complex image generation scenarios where +precise control and fine-tuning are required to achieve the desired +results. + +In this tutorial, we will learn how to convert and run `Controlnet QR +Code Monster For +SD-1.5 `__ +by `monster-labs `__. + +|image0| + +If you want to learn more about ControlNet and particularly on +conditioning by pose, please refer to this +`tutorial <235-controlnet-stable-diffusion-with-output.html>`__ + +**Table of contents:** + +- `Prerequisites <#prerequisites>`__ +- `Instantiating Generation + Pipeline <#instantiating-generation-pipeline>`__ + + - `ControlNet in Diffusers + library <#controlnet-in-diffusers-library>`__ + +- `Convert models to OpenVINO Intermediate representation (IR) + format <#convert-models-to-openvino-intermediate-representation-ir-format>`__ + + - `ControlNet conversion <#controlnet-conversion>`__ + - `Text Encoder <#text-encoder>`__ + - `UNet conversion <#unet-conversion>`__ + - `VAE Decoder conversion <#vae-decoder-conversion>`__ + +- `Select inference device for Stable Diffusion + pipeline <#select-inference-device-for-stable-diffusion-pipeline>`__ +- `Prepare Inference pipeline <#prepare-inference-pipeline>`__ +- `Running Text-to-Image Generation with ControlNet Conditioning and + OpenVINO <#running-text-to-image-generation-with-controlnet-conditioning-and-openvino>`__ + +.. |image0| image:: https://github.com/openvinotoolkit/openvino_notebooks/assets/76463150/1a5978c6-e7a0-4824-9318-a3d8f4912c47 + +Prerequisites +------------- + + + +.. code:: ipython3 + + %pip install -q accelerate diffusers transformers torch gradio qrcode opencv-python --extra-index-url https://download.pytorch.org/whl/cpu + %pip install -q "openvino>=2023.1.0" + +Instantiating Generation Pipeline +--------------------------------- + + + +ControlNet in Diffusers library +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + + +For working with Stable Diffusion and ControlNet models, we will use +Hugging Face `Diffusers `__ +library. To experiment with ControlNet, Diffusers exposes the +`StableDiffusionControlNetPipeline `__ +similar to the `other Diffusers +pipelines `__. +Central to the ``StableDiffusionControlNetPipeline`` is the +``controlnet`` argument which enables providing a particularly trained +`ControlNetModel `__ +instance while keeping the pre-trained diffusion model weights the same. 
+The code below demonstrates how to create +``StableDiffusionControlNetPipeline``, using the ``controlnet-openpose`` +controlnet model and ``stable-diffusion-v1-5``: + +.. code:: ipython3 + + from diffusers import ( + StableDiffusionControlNetPipeline, + ControlNetModel, + ) + + controlnet = ControlNetModel.from_pretrained( + "monster-labs/control_v1p_sd15_qrcode_monster" + ) + + pipe = StableDiffusionControlNetPipeline.from_pretrained( + "runwayml/stable-diffusion-v1-5", + controlnet=controlnet, + ) + + +.. parsed-literal:: + + /home/idavidyu/.virtualenvs/test/lib/python3.10/site-packages/torch/cuda/__init__.py:611: UserWarning: Can't initialize NVML + warnings.warn("Can't initialize NVML") + + + +.. parsed-literal:: + + Loading pipeline components...: 0%| | 0/6 [00:00 by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 . + + +Convert models to OpenVINO Intermediate representation (IR) format +------------------------------------------------------------------ + + + +We need to provide a model object, input data for model tracing to +``ov.convert_model`` function to obtain OpenVINO ``ov.Model`` object +instance. Model can be saved on disk for next deployment using +``ov.save_model`` function. + +The pipeline consists of four important parts: + +- ControlNet for conditioning by image annotation. +- Text Encoder for creation condition to generate an image from a text + prompt. +- Unet for step-by-step denoising latent image representation. +- Autoencoder (VAE) for decoding latent space to image. + +.. code:: ipython3 + + import gc + from functools import partial + from pathlib import Path + from PIL import Image + import openvino as ov + import torch + + def cleanup_torchscript_cache(): + """ + Helper for removing cached model representation + """ + torch._C._jit_clear_class_registry() + torch.jit._recursive.concrete_type_store = torch.jit._recursive.ConcreteTypeStore() + torch.jit._state._clear_class_state() + +ControlNet conversion +~~~~~~~~~~~~~~~~~~~~~ + + + +The ControlNet model accepts the same inputs like UNet in Stable +Diffusion pipeline and additional condition sample - skeleton key points +map predicted by pose estimator: + +- ``sample`` - latent image sample from the previous step, generation + process has not been started yet, so we will use random noise, +- ``timestep`` - current scheduler step, +- ``encoder_hidden_state`` - hidden state of text encoder, +- ``controlnet_cond`` - condition input annotation. + +The output of the model is attention hidden states from down and middle +blocks, which serves additional context for the UNet model. + +.. 
code:: ipython3 + + controlnet_ir_path = Path('./controlnet.xml') + + controlnet_inputs = { + "sample": torch.randn((2, 4, 96, 96)), + "timestep": torch.tensor(1), + "encoder_hidden_states": torch.randn((2,77,768)), + "controlnet_cond": torch.randn((2,3,768,768)) + } + + with torch.no_grad(): + down_block_res_samples, mid_block_res_sample = controlnet(**controlnet_inputs, return_dict=False) + + if not controlnet_ir_path.exists(): + controlnet.forward = partial(controlnet.forward, return_dict=False) + with torch.no_grad(): + ov_model = ov.convert_model(controlnet, example_input=controlnet_inputs) + ov.save_model(ov_model, controlnet_ir_path) + del ov_model + del pipe.controlnet, controlnet + cleanup_torchscript_cache() + print('ControlNet successfully converted to IR') + else: + del pipe.controlnet, controlnet + print(f"ControlNet will be loaded from {controlnet_ir_path}") + + + +.. parsed-literal:: + + ControlNet will be loaded from controlnet.xml + + +Text Encoder +~~~~~~~~~~~~ + + + +The text-encoder is responsible for transforming the input prompt, for +example, “a photo of an astronaut riding a horse” into an embedding +space that can be understood by the U-Net. It is usually a simple +transformer-based encoder that maps a sequence of input tokens to a +sequence of latent text embeddings. + +The input of the text encoder is tensor ``input_ids``, which contains +indexes of tokens from text processed by the tokenizer and padded to the +maximum length accepted by the model. Model outputs are two tensors: +``last_hidden_state`` - hidden state from the last MultiHeadAttention +layer in the model and ``pooler_out`` - pooled output for whole model +hidden states. + +.. code:: ipython3 + + text_encoder_ir_path = Path('./text_encoder.xml') + + if not text_encoder_ir_path.exists(): + pipe.text_encoder.eval() + with torch.no_grad(): + ov_model = ov.convert_model( + pipe.text_encoder, # model instance + example_input=torch.ones((1, 77), dtype=torch.long), # inputs for model tracing + ) + ov.save_model(ov_model, text_encoder_ir_path) + del ov_model + del pipe.text_encoder + cleanup_torchscript_cache() + print('Text Encoder successfully converted to IR') + else: + del pipe.text_encoder + print(f"Text Encoder will be loaded from {controlnet_ir_path}") + + +.. parsed-literal:: + + Text Encoder will be loaded from controlnet.xml + + +UNet conversion +~~~~~~~~~~~~~~~ + + + +The process of UNet model conversion remains the same, like for original +Stable Diffusion model, but with respect to the new inputs generated by +ControlNet. + +.. 
code:: ipython3 + + from typing import Tuple + + unet_ir_path = Path('./unet.xml') + + dtype_mapping = { + torch.float32: ov.Type.f32, + torch.float64: ov.Type.f64, + torch.int32: ov.Type.i32, + torch.int64: ov.Type.i64 + } + + def flattenize_inputs(inputs): + flatten_inputs = [] + for input_data in inputs: + if input_data is None: + continue + if isinstance(input_data, (list, tuple)): + flatten_inputs.extend(flattenize_inputs(input_data)) + else: + flatten_inputs.append(input_data) + return flatten_inputs + + + class UnetWrapper(torch.nn.Module): + def __init__( + self, + unet, + sample_dtype=torch.float32, + timestep_dtype=torch.int64, + encoder_hidden_states=torch.float32, + down_block_additional_residuals=torch.float32, + mid_block_additional_residual=torch.float32 + ): + super().__init__() + self.unet = unet + self.sample_dtype = sample_dtype + self.timestep_dtype = timestep_dtype + self.encoder_hidden_states_dtype = encoder_hidden_states + self.down_block_additional_residuals_dtype = down_block_additional_residuals + self.mid_block_additional_residual_dtype = mid_block_additional_residual + + def forward( + self, + sample:torch.Tensor, + timestep:torch.Tensor, + encoder_hidden_states:torch.Tensor, + down_block_additional_residuals:Tuple[torch.Tensor], + mid_block_additional_residual:torch.Tensor + ): + sample.to(self.sample_dtype) + timestep.to(self.timestep_dtype) + encoder_hidden_states.to(self.encoder_hidden_states_dtype) + down_block_additional_residuals = [res.to(self.down_block_additional_residuals_dtype) for res in down_block_additional_residuals] + mid_block_additional_residual.to(self.mid_block_additional_residual_dtype) + return self.unet( + sample, + timestep, + encoder_hidden_states, + down_block_additional_residuals=down_block_additional_residuals, + mid_block_additional_residual=mid_block_additional_residual + ) + + + pipe.unet.eval() + unet_inputs = { + "sample": torch.randn((2, 4, 96, 96)), + "timestep": torch.tensor(1), + "encoder_hidden_states": torch.randn((2,77,768)), + "down_block_additional_residuals": down_block_res_samples, + "mid_block_additional_residual": mid_block_res_sample + } + + if not unet_ir_path.exists(): + with torch.no_grad(): + ov_model = ov.convert_model(UnetWrapper(pipe.unet), example_input=unet_inputs) + + flatten_inputs = flattenize_inputs(unet_inputs.values()) + for input_data, input_tensor in zip(flatten_inputs, ov_model.inputs): + input_tensor.get_node().set_partial_shape(ov.PartialShape(input_data.shape)) + input_tensor.get_node().set_element_type(dtype_mapping[input_data.dtype]) + ov_model.validate_nodes_and_infer_types() + + ov.save_model(ov_model, unet_ir_path) + del ov_model + cleanup_torchscript_cache() + del pipe.unet + gc.collect() + print('Unet successfully converted to IR') + else: + del pipe.unet + print(f"Unet will be loaded from {unet_ir_path}") + + +.. parsed-literal:: + + Unet will be loaded from unet.xml + + +VAE Decoder conversion +~~~~~~~~~~~~~~~~~~~~~~ + + + +The VAE model has two parts, an encoder, and a decoder. The encoder is +used to convert the image into a low-dimensional latent representation, +which will serve as the input to the U-Net model. The decoder, +conversely, transforms the latent representation back into an image. + +During latent diffusion training, the encoder is used to get the latent +representations (latents) of the images for the forward diffusion +process, which applies more and more noise at each step. 
During +inference, the denoised latents generated by the reverse diffusion +process are converted back into images using the VAE decoder. During +inference, we will see that we **only need the VAE decoder**. You can +find instructions on how to convert the encoder part in a stable +diffusion +`notebook <225-stable-diffusion-text-to-image-with-output.html>`__. + +.. code:: ipython3 + + vae_ir_path = Path('./vae.xml') + + + class VAEDecoderWrapper(torch.nn.Module): + def __init__(self, vae): + super().__init__() + vae.eval() + self.vae = vae + + def forward(self, latents): + return self.vae.decode(latents) + + if not vae_ir_path.exists(): + vae_decoder = VAEDecoderWrapper(pipe.vae) + latents = torch.zeros((1, 4, 96, 96)) + + vae_decoder.eval() + with torch.no_grad(): + ov_model = ov.convert_model(vae_decoder, example_input=latents) + ov.save_model(ov_model, vae_ir_path) + del ov_model + del pipe.vae + cleanup_torchscript_cache() + print('VAE decoder successfully converted to IR') + else: + del pipe.vae + print(f"VAE decoder will be loaded from {vae_ir_path}") + + +.. parsed-literal:: + + VAE decoder successfully converted to IR + + +Select inference device for Stable Diffusion pipeline +----------------------------------------------------- + + + +select device from dropdown list for running inference using OpenVINO + +.. code:: ipython3 + + import ipywidgets as widgets + + core = ov.Core() + + device = widgets.Dropdown( + options=core.available_devices + ["AUTO"], + value="CPU", + description="Device:", + disabled=False, + ) + + device + + + + +.. parsed-literal:: + + Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO') + + + +Prepare Inference pipeline +-------------------------- + + + +The stable diffusion model takes both a latent seed and a text prompt as +input. The latent seed is then used to generate random latent image +representations of size :math:`96 \times 96` where as the text prompt is +transformed to text embeddings of size :math:`77 \times 768` via CLIP’s +text encoder. + +Next, the U-Net iteratively *denoises* the random latent image +representations while being conditioned on the text embeddings. In +comparison with the original stable-diffusion pipeline, latent image +representation, encoder hidden states, and control condition annotation +passed via ControlNet on each denoising step for obtaining middle and +down blocks attention parameters, these attention blocks results +additionally will be provided to the UNet model for the control +generation process. The output of the U-Net, being the noise residual, +is used to compute a denoised latent image representation via a +scheduler algorithm. Many different scheduler algorithms can be used for +this computation, each having its pros and cons. For Stable Diffusion, +it is recommended to use one of: + +- `PNDM + scheduler `__ +- `DDIM + scheduler `__ +- `K-LMS + scheduler `__ + +Theory on how the scheduler algorithm function works is out of scope for +this notebook, but in short, you should remember that they compute the +predicted denoised image representation from the previous noise +representation and the predicted noise residual. For more information, +it is recommended to look into `Elucidating the Design Space of +Diffusion-Based Generative Models `__ + +In this tutorial, instead of using Stable Diffusion’s default +`PNDMScheduler `__, +we use +`EulerAncestralDiscreteScheduler `__, +recommended by authors. More information regarding schedulers can be +found +`here `__. 
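A common Diffusers idiom for switching to the recommended scheduler is to rebuild it from the configuration of the scheduler that shipped with the original pipeline. The snippet below is only an illustrative sketch of that idiom; the notebook may construct its scheduler differently.

.. code:: ipython3

    from diffusers import EulerAncestralDiscreteScheduler

    # Re-create the recommended scheduler from the configuration of the scheduler
    # loaded with the original Stable Diffusion pipeline above.
    scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)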
+ +The *denoising* process is repeated a given number of times (by default +50) to step-by-step retrieve better latent image representations. Once +complete, the latent image representation is decoded by the decoder part +of the variational auto-encoder. + +Similarly to Diffusers ``StableDiffusionControlNetPipeline``, we define +our own ``OVContrlNetStableDiffusionPipeline`` inference pipeline based +on OpenVINO. + +.. code:: ipython3 + + from diffusers import DiffusionPipeline + from transformers import CLIPTokenizer + from typing import Union, List, Optional, Tuple + import cv2 + import numpy as np + + + def scale_fit_to_window(dst_width:int, dst_height:int, image_width:int, image_height:int): + """ + Preprocessing helper function for calculating image size for resize with peserving original aspect ratio + and fitting image to specific window size + + Parameters: + dst_width (int): destination window width + dst_height (int): destination window height + image_width (int): source image width + image_height (int): source image height + Returns: + result_width (int): calculated width for resize + result_height (int): calculated height for resize + """ + im_scale = min(dst_height / image_height, dst_width / image_width) + return int(im_scale * image_width), int(im_scale * image_height) + + + def preprocess(image: Image.Image): + """ + Image preprocessing function. Takes image in PIL.Image format, resizes it to keep aspect ration and fits to model input window 768x768, + then converts it to np.ndarray and adds padding with zeros on right or bottom side of image (depends from aspect ratio), after that + converts data to float32 data type and change range of values from [0, 255] to [-1, 1], finally, converts data layout from planar NHWC to NCHW. + The function returns preprocessed input tensor and padding size, which can be used in postprocessing. 
+ + Parameters: + image (Image.Image): input image + Returns: + image (np.ndarray): preprocessed image tensor + pad (Tuple[int]): pading size for each dimension for restoring image size in postprocessing + """ + src_width, src_height = image.size + dst_width, dst_height = scale_fit_to_window(768, 768, src_width, src_height) + image = image.convert("RGB") + image = np.array(image.resize((dst_width, dst_height), resample=Image.Resampling.LANCZOS))[None, :] + pad_width = 768 - dst_width + pad_height = 768 - dst_height + pad = ((0, 0), (0, pad_height), (0, pad_width), (0, 0)) + image = np.pad(image, pad, mode="constant") + image = image.astype(np.float32) / 255.0 + image = image.transpose(0, 3, 1, 2) + return image, pad + + + def randn_tensor( + shape: Union[Tuple, List], + dtype: Optional[np.dtype] = np.float32, + ): + """ + Helper function for generation random values tensor with given shape and data type + + Parameters: + shape (Union[Tuple, List]): shape for filling random values + dtype (np.dtype, *optiona*, np.float32): data type for result + Returns: + latents (np.ndarray): tensor with random values with given data type and shape (usually represents noise in latent space) + """ + latents = np.random.randn(*shape).astype(dtype) + + return latents + + + class OVContrlNetStableDiffusionPipeline(DiffusionPipeline): + """ + OpenVINO inference pipeline for Stable Diffusion with ControlNet guidence + """ + def __init__( + self, + tokenizer: CLIPTokenizer, + scheduler, + core: ov.Core, + controlnet: ov.Model, + text_encoder: ov.Model, + unet: ov.Model, + vae_decoder: ov.Model, + device:str = "AUTO" + ): + super().__init__() + self.tokenizer = tokenizer + self.vae_scale_factor = 8 + self.scheduler = scheduler + self.load_models(core, device, controlnet, text_encoder, unet, vae_decoder) + self.set_progress_bar_config(disable=True) + + def load_models(self, core: ov.Core, device: str, controlnet:ov.Model, text_encoder: ov.Model, unet: ov.Model, vae_decoder: ov.Model): + """ + Function for loading models on device using OpenVINO + + Parameters: + core (Core): OpenVINO runtime Core class instance + device (str): inference device + controlnet (Model): OpenVINO Model object represents ControlNet + text_encoder (Model): OpenVINO Model object represents text encoder + unet (Model): OpenVINO Model object represents UNet + vae_decoder (Model): OpenVINO Model object represents vae decoder + Returns + None + """ + self.text_encoder = core.compile_model(text_encoder, device) + self.text_encoder_out = self.text_encoder.output(0) + self.controlnet = core.compile_model(controlnet, device) + self.unet = core.compile_model(unet, device) + self.unet_out = self.unet.output(0) + self.vae_decoder = core.compile_model(vae_decoder, device) + self.vae_decoder_out = self.vae_decoder.output(0) + + def __call__( + self, + prompt: Union[str, List[str]], + image: Image.Image, + num_inference_steps: int = 10, + negative_prompt: Union[str, List[str]] = None, + guidance_scale: float = 7.5, + controlnet_conditioning_scale: float = 1.0, + eta: float = 0.0, + latents: Optional[np.array] = None, + output_type: Optional[str] = "pil", + ): + """ + Function invoked when calling the pipeline for generation. + + Parameters: + prompt (`str` or `List[str]`): + The prompt or prompts to guide the image generation. + image (`Image.Image`): + `Image`, or tensor representing an image batch which will be repainted according to `prompt`. + num_inference_steps (`int`, *optional*, defaults to 100): + The number of denoising steps. 
More denoising steps usually lead to a higher quality image at the + expense of slower inference. + negative_prompt (`str` or `List[str]`): + negative prompt or prompts for generation + guidance_scale (`float`, *optional*, defaults to 7.5): + Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598). + `guidance_scale` is defined as `w` of equation 2. of [Imagen + Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale is enabled by setting `guidance_scale > + 1`. Higher guidance scale encourages to generate images that are closely linked to the text `prompt`, + usually at the expense of lower image quality. This pipeline requires a value of at least `1`. + latents (`np.ndarray`, *optional*): + Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image + generation. Can be used to tweak the same generation with different prompts. If not provided, a latents + tensor will ge generated by sampling using the supplied random `generator`. + output_type (`str`, *optional*, defaults to `"pil"`): + The output format of the generate image. Choose between + [PIL](https://pillow.readthedocs.io/en/stable/): `Image.Image` or `np.array`. + Returns: + image ([List[Union[np.ndarray, Image.Image]]): generaited images + + """ + + # 1. Define call parameters + batch_size = 1 if isinstance(prompt, str) else len(prompt) + # here `guidance_scale` is defined analog to the guidance weight `w` of equation (2) + # of the Imagen paper: https://arxiv.org/pdf/2205.11487.pdf . `guidance_scale = 1` + # corresponds to doing no classifier free guidance. + do_classifier_free_guidance = guidance_scale > 1.0 + # 2. Encode input prompt + text_embeddings = self._encode_prompt(prompt, negative_prompt=negative_prompt) + + # 3. Preprocess image + orig_width, orig_height = image.size + image, pad = preprocess(image) + height, width = image.shape[-2:] + if do_classifier_free_guidance: + image = np.concatenate(([image] * 2)) + + # 4. set timesteps + self.scheduler.set_timesteps(num_inference_steps) + timesteps = self.scheduler.timesteps + + # 6. Prepare latent variables + num_channels_latents = 4 + latents = self.prepare_latents( + batch_size, + num_channels_latents, + height, + width, + text_embeddings.dtype, + latents, + ) + + # 7. Denoising loop + num_warmup_steps = len(timesteps) - num_inference_steps * self.scheduler.order + with self.progress_bar(total=num_inference_steps) as progress_bar: + for i, t in enumerate(timesteps): + # Expand the latents if we are doing classifier free guidance. + # The latents are expanded 3 times because for pix2pix the guidance\ + # is applied for both the text and the input image. 
+ latent_model_input = np.concatenate( + [latents] * 2) if do_classifier_free_guidance else latents + latent_model_input = self.scheduler.scale_model_input(latent_model_input, t) + + result = self.controlnet([latent_model_input, t, text_embeddings, image]) + down_and_mid_blok_samples = [sample * controlnet_conditioning_scale for _, sample in result.items()] + + # predict the noise residual + noise_pred = self.unet([latent_model_input, t, text_embeddings, *down_and_mid_blok_samples])[self.unet_out] + + # perform guidance + if do_classifier_free_guidance: + noise_pred_uncond, noise_pred_text = noise_pred[0], noise_pred[1] + noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond) + + # compute the previous noisy sample x_t -> x_t-1 + latents = self.scheduler.step(torch.from_numpy(noise_pred), t, torch.from_numpy(latents)).prev_sample.numpy() + + # update progress + if i == len(timesteps) - 1 or ((i + 1) > num_warmup_steps and (i + 1) % self.scheduler.order == 0): + progress_bar.update() + + # 8. Post-processing + image = self.decode_latents(latents, pad) + + # 9. Convert to PIL + if output_type == "pil": + image = self.numpy_to_pil(image) + image = [img.resize((orig_width, orig_height), Image.Resampling.LANCZOS) for img in image] + else: + image = [cv2.resize(img, (orig_width, orig_width)) + for img in image] + + return image + + def _encode_prompt(self, prompt:Union[str, List[str]], num_images_per_prompt:int = 1, do_classifier_free_guidance:bool = True, negative_prompt:Union[str, List[str]] = None): + """ + Encodes the prompt into text encoder hidden states. + + Parameters: + prompt (str or list(str)): prompt to be encoded + num_images_per_prompt (int): number of images that should be generated per prompt + do_classifier_free_guidance (bool): whether to use classifier free guidance or not + negative_prompt (str or list(str)): negative prompt to be encoded + Returns: + text_embeddings (np.ndarray): text encoder hidden states + """ + batch_size = len(prompt) if isinstance(prompt, list) else 1 + + # tokenize input prompts + text_inputs = self.tokenizer( + prompt, + padding="max_length", + max_length=self.tokenizer.model_max_length, + truncation=True, + return_tensors="np", + ) + text_input_ids = text_inputs.input_ids + + text_embeddings = self.text_encoder( + text_input_ids)[self.text_encoder_out] + + # duplicate text embeddings for each generation per prompt + if num_images_per_prompt != 1: + bs_embed, seq_len, _ = text_embeddings.shape + text_embeddings = np.tile( + text_embeddings, (1, num_images_per_prompt, 1)) + text_embeddings = np.reshape( + text_embeddings, (bs_embed * num_images_per_prompt, seq_len, -1)) + + # get unconditional embeddings for classifier free guidance + if do_classifier_free_guidance: + uncond_tokens: List[str] + max_length = text_input_ids.shape[-1] + if negative_prompt is None: + uncond_tokens = [""] * batch_size + elif isinstance(negative_prompt, str): + uncond_tokens = [negative_prompt] + else: + uncond_tokens = negative_prompt + uncond_input = self.tokenizer( + uncond_tokens, + padding="max_length", + max_length=max_length, + truncation=True, + return_tensors="np", + ) + + uncond_embeddings = self.text_encoder(uncond_input.input_ids)[self.text_encoder_out] + + # duplicate unconditional embeddings for each generation per prompt, using mps friendly method + seq_len = uncond_embeddings.shape[1] + uncond_embeddings = np.tile(uncond_embeddings, (1, num_images_per_prompt, 1)) + uncond_embeddings = np.reshape(uncond_embeddings, (batch_size 
* num_images_per_prompt, seq_len, -1)) + + # For classifier free guidance, we need to do two forward passes. + # Here we concatenate the unconditional and text embeddings into a single batch + # to avoid doing two forward passes + text_embeddings = np.concatenate([uncond_embeddings, text_embeddings]) + + return text_embeddings + + def prepare_latents(self, batch_size:int, num_channels_latents:int, height:int, width:int, dtype:np.dtype = np.float32, latents:np.ndarray = None): + """ + Preparing noise to image generation. If initial latents are not provided, they will be generated randomly, + then prepared latents scaled by the standard deviation required by the scheduler + + Parameters: + batch_size (int): input batch size + num_channels_latents (int): number of channels for noise generation + height (int): image height + width (int): image width + dtype (np.dtype, *optional*, np.float32): dtype for latents generation + latents (np.ndarray, *optional*, None): initial latent noise tensor, if not provided will be generated + Returns: + latents (np.ndarray): scaled initial noise for diffusion + """ + shape = (batch_size, num_channels_latents, height // self.vae_scale_factor, width // self.vae_scale_factor) + if latents is None: + latents = randn_tensor(shape, dtype=dtype) + else: + latents = latents + + # scale the initial noise by the standard deviation required by the scheduler + latents = latents * np.array(self.scheduler.init_noise_sigma) + return latents + + def decode_latents(self, latents:np.array, pad:Tuple[int]): + """ + Decode predicted image from latent space using VAE Decoder and unpad image result + + Parameters: + latents (np.ndarray): image encoded in diffusion latent space + pad (Tuple[int]): each side padding sizes obtained on preprocessing step + Returns: + image: decoded by VAE decoder image + """ + latents = 1 / 0.18215 * latents + image = self.vae_decoder(latents)[self.vae_decoder_out] + (_, end_h), (_, end_w) = pad[1:3] + h, w = image.shape[2:] + unpad_h = h - end_h + unpad_w = w - end_w + image = image[:, :, :unpad_h, :unpad_w] + image = np.clip(image / 2 + 0.5, 0, 1) + image = np.transpose(image, (0, 2, 3, 1)) + return image + + +.. parsed-literal:: + + /tmp/ipykernel_438166/1889049886.py:1: FutureWarning: Importing `DiffusionPipeline` or `ImagePipelineOutput` from diffusers.pipeline_utils is deprecated. Please import from diffusers.pipelines.pipeline_utils instead. + from diffusers.pipeline_utils import DiffusionPipeline + + +Running Text-to-Image Generation with ControlNet Conditioning and OpenVINO +-------------------------------------------------------------------------- + + + +Now, we are ready to start generation. For improving the generation +process, we also introduce an opportunity to provide a +``negative prompt``. Technically, positive prompt steers the diffusion +toward the images associated with it, while negative prompt steers the +diffusion away from it. More explanation of how it works can be found in +this +`article `__. +We can keep this field empty if we want to generate image without +negative prompting. + +.. 
code:: ipython3 + + from transformers import CLIPTokenizer + from diffusers import EulerAncestralDiscreteScheduler + + tokenizer = CLIPTokenizer.from_pretrained('openai/clip-vit-large-patch14') + scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config) + + ov_pipe = OVContrlNetStableDiffusionPipeline(tokenizer, scheduler, core, controlnet_ir_path, text_encoder_ir_path, unet_ir_path, vae_ir_path, device=device.value) + + +.. code:: ipython3 + + import qrcode + + def create_code(content: str): + """Creates QR codes with provided content.""" + qr = qrcode.QRCode( + version=1, + error_correction=qrcode.constants.ERROR_CORRECT_H, + box_size=16, + border=0, + ) + qr.add_data(content) + qr.make(fit=True) + img = qr.make_image(fill_color="black", back_color="white") + + # find smallest image size multiple of 256 that can fit qr + offset_min = 8 * 16 + w, h = img.size + w = (w + 255 + offset_min) // 256 * 256 + h = (h + 255 + offset_min) // 256 * 256 + if w > 1024: + raise gr.Error("QR code is too large, please use a shorter content") + bg = Image.new('L', (w, h), 128) + + # align on 16px grid + coords = ((w - img.size[0]) // 2 // 16 * 16, + (h - img.size[1]) // 2 // 16 * 16) + bg.paste(img, coords) + return bg + +.. code:: ipython3 + + import gradio as gr + + def _generate( + qr_code_content: str, + prompt: str, + negative_prompt: str, + seed: Optional[int] = 42, + guidance_scale: float = 10.0, + controlnet_conditioning_scale: float = 2.0, + num_inference_steps: int = 5, + ): + if seed is not None: + np.random.seed(int(seed)) + qrcode_image = create_code(qr_code_content) + return ov_pipe( + prompt, qrcode_image, negative_prompt=negative_prompt, + num_inference_steps=int(num_inference_steps), + guidance_scale=guidance_scale, + controlnet_conditioning_scale=controlnet_conditioning_scale + )[0] + + demo = gr.Interface( + _generate, + inputs=[ + gr.Textbox(label="QR Code content"), + gr.Textbox(label="Text Prompt"), + gr.Textbox(label="Negative Text Prompt"), + gr.Number( + minimum=-1, + maximum=9999999999, + step=1, + value=42, + label="Seed", + info="Seed for the random number generator" + ), + gr.Slider( + minimum=0.0, + maximum=25.0, + step=0.25, + value=7, + label="Guidance Scale", + info="Controls the amount of guidance the text prompt guides the image generation" + ), + gr.Slider( + minimum=0.5, + maximum=2.5, + step=0.01, + value=1.5, + label="Controlnet Conditioning Scale", + info="""Controls the readability/creativity of the QR code. + High values: The generated QR code will be more readable. + Low values: The generated QR code will be more creative. 
+ """ + ), + gr.Slider(label="Steps", step=1, value=5, minimum=1, maximum=50) + ], + outputs=[ + "image" + ], + examples=[ + [ + "Hi OpenVINO", + "cozy town on snowy mountain slope 8k", + "blurry unreal occluded", + 42, 7.7, 1.4, 25 + ], + ], + ) + try: + demo.queue().launch(debug=False) + except Exception: + demo.queue().launch(share=True, debug=False) + + # If you are launching remotely, specify server_name and server_port + # EXAMPLE: `demo.launch(server_name='your server name', server_port='server port in int')` + # To learn more please refer to the Gradio docs: https://gradio.app/docs/ diff --git a/docs/notebooks/265-wuerstchen-image-generation-with-output.rst b/docs/notebooks/265-wuerstchen-image-generation-with-output.rst new file mode 100644 index 00000000000000..9e1e9e7362fe7f --- /dev/null +++ b/docs/notebooks/265-wuerstchen-image-generation-with-output.rst @@ -0,0 +1,594 @@ +Image generation with Würstchen and OpenVINO +============================================ + +.. figure:: 265-wuerstchen-image-generation-with-output_files/499b779a-61d1-4e68-a1c3-437122622ba7.png + :alt: image.png + + image.png + +`Würstchen `__ is a diffusion model, +whose text-conditional model works in a highly compressed latent space +of images. Why is this important? Compressing data can reduce +computational costs for both training and inference by magnitudes. +Training on 1024x1024 images, is way more expensive than training at +32x32. Usually, other works make use of a relatively small compression, +in the range of 4x - 8x spatial compression. Würstchen takes this to an +extreme. Through its novel design, authors achieve a 42x spatial +compression. This was unseen before because common methods fail to +faithfully reconstruct detailed images after 16x spatial compression. +Würstchen employs a two-stage compression (referred below as *Decoder*). +The first one is a VQGAN, and the second is a Diffusion Autoencoder +(more details can be found in the paper). A third model (referred below +as *Prior*) is learned in that highly compressed latent space. This +training requires fractions of the compute used for current +top-performing models, allowing also cheaper and faster inference. + +We will use PyTorch version of Würstchen `model from HuggingFace +Hub `__. + +**Table of contents:** + +- `Prerequisites <#prerequisites>`__ +- `Load the original model <#load-the-original-model>`__ + + - `Infer the original model <#infer-the-original-model>`__ + +- `Convert the model to OpenVINO + IR <#convert-the-model-to-openvino-ir>`__ + + - `Prior pipeline <#prior-pipeline>`__ + - `Decoder pipeline <#decoder-pipeline>`__ + +- `Compiling models <#compiling-models>`__ +- `Building the pipeline <#building-the-pipeline>`__ +- `Inference <#inference>`__ +- `Interactive inference <#interactive-inference>`__ + +Prerequisites +------------- + + + +.. code:: ipython3 + + %pip install -q "diffusers>=0.21.0" transformers accelerate matplotlib gradio + %pip uninstall -q -y openvino-dev openvino openvino-nightly + %pip install -q openvino-nightly + + +.. parsed-literal:: + + Note: you may need to restart the kernel to use updated packages. + Note: you may need to restart the kernel to use updated packages. + Note: you may need to restart the kernel to use updated packages. + + +.. code:: ipython3 + + from pathlib import Path + from collections import namedtuple + import gc + + import diffusers + import torch + import matplotlib.pyplot as plt + import gradio as gr + import numpy as np + + import openvino as ov + +.. 
code:: ipython3 + + MODELS_DIR = Path("models") + PRIOR_TEXT_ENCODER_PATH = MODELS_DIR / "prior_text_encoder.xml" + PRIOR_PRIOR_PATH = MODELS_DIR / "prior_prior.xml" + DECODER_PATH = MODELS_DIR / "decoder.xml" + TEXT_ENCODER_PATH = MODELS_DIR / "text_encoder.xml" + VQGAN_PATH = MODELS_DIR / "vqgan.xml" + + MODELS_DIR.mkdir(parents=True, exist_ok=True) + +.. code:: ipython3 + + BaseModelOutputWithPooling = namedtuple("BaseModelOutputWithPooling", "last_hidden_state") + DecoderOutput = namedtuple("DecoderOutput", "sample") + +Load the original model +----------------------- + + + +We use ``from_pretrained`` method of +``diffusers.AutoPipelineForText2Image`` to load the pipeline. + +.. code:: ipython3 + + pipeline = diffusers.AutoPipelineForText2Image.from_pretrained("warp-diffusion/wuerstchen") + + +.. parsed-literal:: + + /home/itrushkin/.virtualenvs/wuerstchen/lib/python3.10/site-packages/torch/cuda/__init__.py:611: UserWarning: Can't initialize NVML + warnings.warn("Can't initialize NVML") + + + +.. parsed-literal:: + + Loading pipeline components...: 0%| | 0/5 [00:00`__. + +.. code:: ipython3 + + convert( + pipeline.text_encoder, + TEXT_ENCODER_PATH, + example_input={ + "input_ids": torch.zeros(1, 77, dtype=torch.int32), + "attention_mask": torch.zeros(1, 77), + }, + input={"input_ids": ((1, 77),), "attention_mask": ((1, 77),)}, + ) + del pipeline.text_encoder + del pipeline.decoder_pipe.text_encoder + gc.collect() + + + + +.. parsed-literal:: + + 0 + + + +Pipeline uses VQGAN model ``decode`` method to get the full-size output +image. Here we create the wrapper module for decoding part only. Our +decoder takes as input 4x256x256 latent image. + +.. code:: ipython3 + + class VqganDecoderWrapper(torch.nn.Module): + def __init__(self, vqgan): + super().__init__() + self.vqgan = vqgan + + def forward(self, h): + return self.vqgan.decode(h) + +.. code:: ipython3 + + convert( + VqganDecoderWrapper(pipeline.vqgan), + VQGAN_PATH, + example_input=torch.zeros(1, 4, 256, 256), + input=(1, 4, 256, 256), + ) + del pipeline.decoder_pipe.vqgan + gc.collect() + + + + +.. parsed-literal:: + + 0 + + + +Compiling models +---------------- + + + +.. code:: ipython3 + + core = ov.Core() + +Select device from dropdown list for running inference using OpenVINO. + +.. code:: ipython3 + + import ipywidgets as widgets + + device = widgets.Dropdown( + options=core.available_devices + ["AUTO"], + value='AUTO', + description='Device:', + disabled=False, + ) + + device + +.. code:: ipython3 + + ov_prior_text_encoder = core.compile_model(PRIOR_TEXT_ENCODER_PATH, device.value) + +.. code:: ipython3 + + ov_prior_prior = core.compile_model(PRIOR_PRIOR_PATH, device.value) + +.. code:: ipython3 + + ov_decoder = core.compile_model(DECODER_PATH, device.value) + +.. code:: ipython3 + + ov_text_encoder = core.compile_model(TEXT_ENCODER_PATH, device.value) + +.. code:: ipython3 + + ov_vqgan = core.compile_model(VQGAN_PATH, device.value) + +Building the pipeline +--------------------- + + + +Let’s create callable wrapper classes for compiled models to allow +interaction with original ``WuerstchenCombinedPipeline`` class. Note +that all of wrapper classes return ``torch.Tensor``\ s instead of +``np.array``\ s. + +.. 
code:: ipython3 + + class TextEncoderWrapper: + dtype = torch.float32 # accessed in the original workflow + + def __init__(self, text_encoder): + self.text_encoder = text_encoder + + def __call__(self, input_ids, attention_mask): + output = self.text_encoder({"input_ids": input_ids, "attention_mask": attention_mask})[ + "last_hidden_state" + ] + output = torch.tensor(output) + return BaseModelOutputWithPooling(output) + +.. code:: ipython3 + + class PriorPriorWrapper: + config = namedtuple("PriorPriorWrapperConfig", "c_in")(16) # accessed in the original workflow + + def __init__(self, prior): + self.prior = prior + + def __call__(self, x, r, c): + output = self.prior([x, r, c])[0] + return torch.tensor(output) + +.. code:: ipython3 + + class DecoderWrapper: + dtype = torch.float32 # accessed in the original workflow + + def __init__(self, decoder): + self.decoder = decoder + + def __call__(self, x, r, effnet, clip): + output = self.decoder({"x": x, "r": r, "effnet": effnet, "clip": clip})[0] + output = torch.tensor(output) + return output + +.. code:: ipython3 + + class VqganWrapper: + config = namedtuple("VqganWrapperConfig", "scale_factor")(0.3764) # accessed in the original workflow + + def __init__(self, vqgan): + self.vqgan = vqgan + + def decode(self, h): + output = self.vqgan(h)[0] + output = torch.tensor(output) + return DecoderOutput(output) + +And insert wrappers instances in the pipeline: + +.. code:: ipython3 + + pipeline.prior_pipe.text_encoder = TextEncoderWrapper(ov_prior_text_encoder) + pipeline.prior_pipe.prior = PriorPriorWrapper(ov_prior_prior) + + pipeline.decoder_pipe.decoder = DecoderWrapper(ov_decoder) + pipeline.decoder_pipe.text_encoder = TextEncoderWrapper(ov_text_encoder) + pipeline.decoder_pipe.vqgan = VqganWrapper(ov_vqgan) + +Inference +--------- + + + +.. code:: ipython3 + + caption = "Anthropomorphic cat dressed as a fire fighter" + negative_prompt = "" + + output = pipeline( + prompt=caption, + height=1024, + width=1024, + negative_prompt=negative_prompt, + prior_guidance_scale=4.0, + decoder_guidance_scale=0.0, + output_type="pil", + ).images + + + +.. parsed-literal:: + + 0%| | 0/60 [00:00 +Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/265-wuerstchen-image-generation-with-output_files/ + +

+ diff --git a/docs/notebooks/266-speculative-sampling-with-output.rst b/docs/notebooks/266-speculative-sampling-with-output.rst new file mode 100644 index 00000000000000..42f694e151ab34 --- /dev/null +++ b/docs/notebooks/266-speculative-sampling-with-output.rst @@ -0,0 +1,374 @@ +Text Generation via Speculative Sampling, KV Caching, and OpenVINO™ +=================================================================== + +As model sizes grow, Generative AI implementations require significant +inference resources. This not only increases the cost per generation +from a prompt, but also increases the power consumption used to serve +such requests. + +Inference optimizations for text generation are essential for reducing +costs and power consumption. When optimizing the inference process, the +amount of time and energy required to generate text can be significantly +reduced. This can lead to cost savings in terms of hardware and +software, as well as reduced power consumption. Additionally, inference +optimizations can help improve the accuracy of text generation as well +as the speed at which it can be generated. This can lead to an improved +user experience and increased efficiency in text-generation tasks. In +summary, inference optimizations for text generation are essential to +reduce costs and power consumption, while also improving the accuracy +and speed of text generation. + +Another necessary condition is that the optimizations are compatible +with each other. That is, implementing a certain optimization should not +preclude other optimizations. There are several levels of optimizations +that can provide significant speedup without “bumping into each other” +in a way that will compromise overall efficiency. + +For details on this method, please refer to the paper by Chen et al, +http://arxiv.org/abs/2302.01318. Additionally, there’s an interesting +proof of correctness of speculative sampling (showing that the original +distribution is preserved) by Leviathan et al, +http://arxiv.org/abs/2211.17192 + +Our blog article describing this implementation with OpenVino is +available at openvino.ai + +**Table of contents:** + +- `Prerequisites <#prerequisites>`__ + + - `Select inference device <#select-inference-device>`__ + +- `Create autoregressive and speculative forms of sampling with KV + Cache + support <#create-autoregressive-and-speculative-forms-of-sampling-with-kv-cache-support>`__ + + - `Setup imports <#setup-imports>`__ + - `Prepare autoregressive + sampling <#prepare-autoregressive-sampling>`__ + - `Prepare speculative sampling <#prepare-speculative-sampling>`__ + +- `Main generation function <#main-generation-function>`__ + + - `Download and Convert Model <#download-and-convert-model>`__ + +Prerequisites +------------- + + + +First, we should install the `Hugging Face +Optimum `__ library +accelerated by OpenVINO integration. The Hugging Face Optimum Intel API +is a high-level API that enables us to convert and quantize models from +the Hugging Face Transformers library to the OpenVINO™ IR format. For +more details, refer to the `Hugging Face Optimum Intel +documentation `__. + +We will also need to install transformers (HuggingFace) and some other +useful modules. + +.. 
code:: ipython3 + + %pip install -q --upgrade pip + %pip install -q --upgrade transformers torch gradio openvino accelerate onnx onnxruntime ipywidgets + %pip install -q "git+https://github.com/huggingface/optimum-intel.git" + +Select inference device +~~~~~~~~~~~~~~~~~~~~~~~ + + + +Select the device from dropdown list for running inference using +OpenVINO. + +.. code:: ipython3 + + import ipywidgets as widgets + from openvino.runtime import Core + + core = Core() + + device = widgets.Dropdown( + options=core.available_devices + ["AUTO"], + value='CPU', + description='Device:', + disabled=False, + ) + + device + +Create autoregressive and speculative forms of sampling with KV Cache support +----------------------------------------------------------------------------- + + + +Text generation is often done in an autoregressive fashion. We will all +support a KV cache (aka Past Value Cache) in the code. Note that we are +using greedy sampling. We do not adjust other text generation parameters +(e.g. temperature) so keep this illustration of speculative sampling as +simple and understandable as possible. + +Setup imports +~~~~~~~~~~~~~ + + + +.. code:: ipython3 + + import time + import numpy as np + import torch + import gradio as gr + +Prepare autoregressive sampling +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + + +.. code:: ipython3 + + def max_fn(x): + x_max = torch.where(x > 0, x, torch.zeros_like(x)) + return x_max / torch.sum(x_max) + + def autoregressive_sampling_with_pkv(x, model, N): + n = len(x) + T = n + N + input = x + past_kv = None + + while n < T: + res = model(input, attention_mask=torch.ones(input.size(), dtype=torch.long), past_key_values=past_kv) + model_out = torch.softmax(res.logits, dim=2) + past_kv = res.past_key_values + next_token = torch.reshape(torch.argmax(model_out[-1][-1]), (1, 1)) + x = torch.cat((x, next_token), dim=1) + n += 1 + input = next_token + + return x + +Prepare speculative sampling +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + + +- Step 1: With speculative sampling, we first generate K samples from + the draft model (in an autoregressive manner). +- Step 2: These are now candidates to examine using the target model + (step 2) using a batch size of K. +- Step 3: We now determine if the K candidates from the draft model are + acceptable based on the logits generated from the target model in + step 2. +- Step 4: We can sample another token with no additional cost (assuming + that all the candidates were accepted). + +Regarding the acceptance criterion for step 3, we need to compare logits +from the target model and compare with the draft model. If the ratio is +high enough, it’s likely to be accepted (using a random number). + +.. 
code:: ipython3 + + def speculative_sampling_with_pkv(x, draft_model, target_model, N, K): + n = x.size(1) + T = n + N + target_past_kv = None + while n < T: + # Step 1: autoregressive decode of K candidate tokens from + # the draft model and get final p for this batch of candidates + x_draft = None + draft_past_kv = None + x_draft_input = x + p_cum = None + for _ in range(K): + res_draft = draft_model(x_draft_input, attention_mask=torch.ones(x_draft_input.size(), dtype=torch.long), past_key_values=draft_past_kv, use_cache=True) + p = res_draft.logits + p = torch.softmax(p, dim=2) + draft_past_kv = res_draft.past_key_values + next_token = torch.reshape(torch.argmax(p[-1][-1]), (1, 1)) + x_draft_input = next_token + if p_cum is None: + p_cum = p[:, -1].unsqueeze(1) + x_draft = next_token + else: + p_cum = torch.cat((p_cum, p), dim=1) + x_draft = torch.cat((x_draft, next_token), dim=1) + # Step 2: target model forward passes on x_draft + if target_past_kv is None: + x_draft_target_input = torch.cat((x, x_draft), dim=1) + else: + x_draft_target_input = x_draft + + res = target_model(x_draft_target_input, attention_mask=torch.ones(x_draft_target_input.size(), dtype=torch.long), use_cache=False) + q = res.logits + + target_new_past_kv = res.past_key_values + # Step 3: append draft tokens based on acceptance-rejection criterion and resample a token on rejection + all_accepted = True + for k in range(K): + j = x_draft[0][k].item() + + q_item = q[-1][k][j].detach().numpy() + p_item = p_cum[-1][k][j].detach().numpy() + + if np.random.random() < min(1, (q_item / p_item)): # accepted + x = torch.cat((x, torch.tensor(j).reshape(1,1)), dim=1) + n += 1 + else: # rejected + q_p = max_fn(q[0][k] - p_cum[0][k]) + resampled_output = torch.argmax(q_p) + resampled_output = torch.reshape(resampled_output, (1,1)) + x = torch.cat((x, resampled_output), dim=1) + n += 1 + all_accepted = False + break + + target_past_kv = target_new_past_kv + # Step 4: if all draft tokens were accepted, sample a final token + if all_accepted: + x = torch.cat((x, torch.reshape(torch.argmax(q[-1][-1]), (1,1))), dim=1) + n += 1 + + return x + +Main generation function +------------------------ + + + +Download and Convert Model +~~~~~~~~~~~~~~~~~~~~~~~~~~ + + + +Optimum Intel can be used to load optimized models from the `Hugging +Face Hub `__ and +create pipelines to run an inference with OpenVINO Runtime using Hugging +Face APIs. The Optimum Inference models are API compatible with Hugging +Face Transformers models. This means we just need to replace +``AutoModelForXxx`` class with the corresponding ``OVModelForXxx`` +class. + +Below is an example of the Dolly model + +.. code:: diff + + -from transformers import AutoModelForCausalLM + +from optimum.intel.openvino import OVModelForCausalLM + from transformers import AutoTokenizer, pipeline + + model_id = "databricks/dolly-v2-3b" + -model = AutoModelForCausalLM.from_pretrained(model_id) + +model = OVModelForCausalLM.from_pretrained(model_id, from_transformers=True) + +Model class initialization starts with calling ``from_pretrained`` +method. When downloading and converting Transformers model, the +parameter ``from_transformers=True`` should be added. We can save the +converted model for the next usage with the ``save_pretrained`` method. +Tokenizer class and pipelines API are compatible with Optimum models. + +.. 
code:: ipython3 + + from pathlib import Path + from transformers import AutoTokenizer + from optimum.intel.openvino import OVModelForCausalLM + + # If you are on a large system with lots of memory, you can run a larger model like DollyV2 + # draft_model_id = "databricks/dolly-v2-3b" + # draft_model_path = Path("dolly-v2-3b") + # target_model_id = "databricks/dolly-v2-12b" + # target_model_path = Path("dolly-v2-12b") + # If you are on a system with limited memory, you can try the smaller GPT2 models + draft_model_id = "gpt2" + draft_model_path = Path("gpt2-local") + target_model_id = "gpt2-xl" + target_model_path = Path("gpt2-xl-local") + + target_tokenizer = AutoTokenizer.from_pretrained(target_model_id) + + current_device = device.value + + # Save local copies for subsequent runs + if draft_model_path.exists(): + draft_ov_model = OVModelForCausalLM.from_pretrained(draft_model_path, device=current_device) + else: + draft_ov_model = OVModelForCausalLM.from_pretrained(draft_model_id, device=current_device, from_transformers=True) + draft_ov_model.save_pretrained(draft_model_path) + if target_model_path.exists(): + target_ov_model = OVModelForCausalLM.from_pretrained(target_model_path, device=current_device) + else: + target_ov_model = OVModelForCausalLM.from_pretrained(target_model_id, device=current_device, from_transformers=True) + target_ov_model.save_pretrained(target_model_path) + + +.. code:: ipython3 + + def main( + prompt: str = "Explain the difference between fission and fusion", + n_tokens_to_generate: int = 100, + K: int = 5, + seed: int = 5555, + ): + # seed numpy rng + np.random.seed(seed) + draft_model = draft_ov_model + target_model = target_ov_model + + + input_ids = target_tokenizer(prompt, return_tensors="pt")['input_ids'] + + def run_autoregressive_sampling_fn(decode_fn, input_ids, **kwargs): + start = time.perf_counter() + output_ids = decode_fn(x=input_ids, **kwargs) + text = target_tokenizer.decode(output_ids[0], skip_special_tokens=True) + elapsed_time = time.perf_counter() - start + return text, elapsed_time + + def run_speculative_sampling_fn(decode_fn, input_ids, **kwargs): + start = time.perf_counter() + output_ids = decode_fn(x=input_ids, **kwargs) + text = target_tokenizer.decode(output_ids[0], skip_special_tokens=True) + elapsed_time = time.perf_counter() - start + return text, elapsed_time + + autoregressive_text, autoregressive_time = run_autoregressive_sampling_fn( + autoregressive_sampling_with_pkv, + input_ids, + model=target_model, + N=n_tokens_to_generate, + ) + + speculative_text, speculative_time = run_speculative_sampling_fn( + speculative_sampling_with_pkv, + input_ids, + target_model=target_model, + draft_model=draft_model, + N=n_tokens_to_generate, + K=K, + ) + + # Format results for output in gradio + out = "\n" + "Autoregressive Decode" + "\n" + "---------------------" + "\n" + out = out + f"Time = {autoregressive_time:.2f}s" + "\n" + f"Text = {autoregressive_text}" + "\n" + out = out + "\n" + "Speculative Decode" + "\n" + "------------------" + "\n" + out = out + f"Time = {speculative_time:.2f}s" + "\n" + f"Text = {speculative_text}" + return out + + if __name__ == "__main__": + with gr.Blocks() as demo: + gr.Markdown( + """ + # Speculative Sampling Demo + ## The output will show a comparison of Autoregressive Sampling vs Speculative Sampling + - Target Model: Dolly V2 12B + - Draft Model: Dolly V2 3B + - K = 5 + > Some improvements can be made to acceptance criterion and adjusting temperature to improve text quality. 
+ """) + with gr.Row(): + inp = gr.Textbox(placeholder="THIS CANNOT BE EMPTY", label="Input Prompt") + out = gr.Textbox(label="Output") + btn = gr.Button("Run") + btn.click(fn=main, inputs=inp, outputs=out) + demo.launch() diff --git a/docs/notebooks/267-distil-whisper-asr-with-output.rst b/docs/notebooks/267-distil-whisper-asr-with-output.rst new file mode 100644 index 00000000000000..f9fc4e58cb43a7 --- /dev/null +++ b/docs/notebooks/267-distil-whisper-asr-with-output.rst @@ -0,0 +1,1228 @@ +Automatic speech recognition using Distil-Whisper and OpenVINO +============================================================== + +`Distil-Whisper `__ +is a distilled variant of the +`Whisper `__ model by +OpenAI. The Distil-Whisper is proposed in the paper `Robust Knowledge +Distillation via Large-Scale Pseudo +Labelling `__. According to authors, +compared to Whisper, Distil-Whisper runs 6x faster with 50% fewer +parameters, while performing to within 1% word error rate (WER) on +out-of-distribution evaluation data. + +Whisper is a Transformer based encoder-decoder model, also referred to +as a sequence-to-sequence model. It maps a sequence of audio spectrogram +features to a sequence of text tokens. First, the raw audio inputs are +converted to a log-Mel spectrogram by action of the feature extractor. +Then, the Transformer encoder encodes the spectrogram to form a sequence +of encoder hidden states. Finally, the decoder autoregressively predicts +text tokens, conditional on both the previous tokens and the encoder +hidden states. + +You can see the model architecture in the diagram below: + +.. figure:: https://user-images.githubusercontent.com/29454499/204536571-8f6d8d77-5fbd-4c6d-8e29-14e734837860.svg + :alt: whisper_architecture.svg + + whisper_architecture.svg + +In this tutorial, we consider how to run Distil-Whisper using OpenVINO. +We will use the pre-trained model from the `Hugging Face +Transformers `__ +library. To simplify the user experience, the `Hugging Face +Optimum `__ library is used to +convert the model to OpenVINO™ IR format. To further improve OpenVINO +Distil-Whisper model performance ``INT8`` post-training quantization +from `NNCF `__ is applied. + +**Table of contents:** + +- `Prerequisites <#prerequisites>`__ +- `Load PyTorch model <#load-pytorch-model>`__ +- `Prepare input sample <#prepare-input-sample>`__ +- `Run model inference <#run-model-inference>`__ +- `Load OpenVINO model using Optimum library <#load-openvino-model-using-optimum-library>`__ +- `Select Inference device <#select-inference-device>`__ +- `Compile OpenVINO model <#compile-openvino-model>`__ +- `Run OpenVINO model inference <#run-openvino-model-inference>`__ +- `Compare performance PyTorch vs OpenVINO <#compare-performance-pytorch-vs-openvino>`__ +- `Compare with OpenAI Whisper <#compare-with-openai-whisper>`__ +- `Usage OpenVINO model with HuggingFace pipelines <#usage-openvino-model-with-huggingface-pipelines>`__ +- `Quantization <#quantization>`__ +- `Prepare calibration datasets <#prepare-calibration-datasets>`__ +- `Quantize Distil-Whisper encoder and decoder models <#quantize-distil-whisper-encoder-and-decoder-models>`__ +- `Run quantized model inference <#run-quantized-model-inference>`__ +- `Compare performance and accuracy of the original and quantized models <#compare-performance-and-accuracy-of-the-original-and-quantized-models>`__ +- `Interactive demo <#interactive-demo>`__ + +Prerequisites +------------------------------------------------------- + +.. 
code:: ipython3 + + %pip uninstall -q -y openvino-dev openvino openvino-nightly + %pip install -q openvino-nightly + %pip install -q "transformers" onnx datasets "git+https://github.com/eaidova/optimum-intel.git@ea/whisper" "gradio>=4.0" "librosa" "soundfile" + %pip install -q "nncf>=2.6.0" "jiwer" + + +.. parsed-literal:: + + Note: you may need to restart the kernel to use updated packages. + ERROR: tokenizers 0.14.1 has requirement huggingface_hub<0.18,>=0.16.4, but you'll have huggingface-hub 0.19.0 which is incompatible. + Note: you may need to restart the kernel to use updated packages. + + +Load PyTorch model +------------------ + + + +The ``AutoModelForSpeechSeq2Seq.from_pretrained`` method is used for the +initialization of PyTorch Whisper model using the transformers library. +We will use the ``distil-whisper/distil-large-v2`` model as an example +in this tutorial. The model will be downloaded once during first run and +this process may require some time. More details about this model can be +found in +`model_card `__. + +Preprocessing and post-processing are important in this model use. +``AutoProcessor`` class used for initialization ``WhisperProcessor`` is +responsible for preparing audio input data for the model, converting it +to Mel-spectrogram and decoding predicted output token_ids into string +using tokenizer. + +.. code:: ipython3 + + from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq + + distil_model_id = "distil-whisper/distil-large-v2" + + processor = AutoProcessor.from_pretrained(distil_model_id) + + pt_distil_model = AutoModelForSpeechSeq2Seq.from_pretrained(distil_model_id) + pt_distil_model.eval(); + + +.. parsed-literal:: + + Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. + + +Prepare input sample +~~~~~~~~~~~~~~~~~~~~ + + + +The processor expects audio data in numpy array format and information +about the audio sampling rate and returns the ``input_features`` tensor +for making predictions. Conversion of audio to numpy format is handled +by Hugging Face datasets implementation. + +.. code:: ipython3 + + from datasets import load_dataset + + def extract_input_features(sample): + input_features = processor( + sample["audio"]["array"], + sampling_rate=sample["audio"]["sampling_rate"], + return_tensors="pt", + ).input_features + return input_features + + dataset = load_dataset( + "hf-internal-testing/librispeech_asr_dummy", "clean", split="validation" + ) + sample = dataset[0] + input_features = extract_input_features(sample) + +Run model inference +~~~~~~~~~~~~~~~~~~~ + + + +To perform speech recognition, one can use ``generate`` interface of the +model. After generation is finished processor.batch_decode can be used +for decoding predicted token_ids into text transcription. + +.. code:: ipython3 + + import IPython.display as ipd + + predicted_ids = pt_distil_model.generate(input_features) + transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True) + + display(ipd.Audio(sample["audio"]["array"], rate=sample["audio"]["sampling_rate"])) + print(f"Reference: {sample['text']}") + print(f"Result: {transcription[0]}") + + + +.. raw:: html + + + + + + +.. parsed-literal:: + + Reference: MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES AND WE ARE GLAD TO WELCOME HIS GOSPEL + Result: Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel. 
+ + +Load OpenVINO model using Optimum library +----------------------------------------- + + + +The Hugging Face Optimum API is a high-level API that enables us to +convert and quantize models from the Hugging Face Transformers library +to the OpenVINO™ IR format. For more details, refer to the `Hugging Face +Optimum +documentation `__. + +Optimum Intel can be used to load optimized models from the `Hugging +Face Hub `__ and +create pipelines to run an inference with OpenVINO Runtime using Hugging +Face APIs. The Optimum Inference models are API compatible with Hugging +Face Transformers models. This means we just need to replace the +``AutoModelForXxx`` class with the corresponding ``OVModelForXxx`` +class. + +Below is an example of the distil-whisper model + +.. code:: diff + + -from transformers import AutoModelForSpeechSeq2Seq + +from optimum.intel.openvino import OVModelForSpeechSeq2Seq + from transformers import AutoTokenizer, pipeline + + model_id = "distil-whisper/distil-large-v2" + -model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id) + +model = OVModelForSpeechSeq2Seq.from_pretrained(model_id, export=True) + +Model class initialization starts with calling the ``from_pretrained`` +method. When downloading and converting the Transformers model, the +parameter ``export=True`` should be added. We can save the converted +model for the next usage with the ``save_pretrained`` method. Tokenizers +and Processors are distributed with models also compatible with the +OpenVINO model. It means that we can reuse initialized early processor. + +.. code:: ipython3 + + from pathlib import Path + from optimum.intel.openvino import OVModelForSpeechSeq2Seq + + distil_model_path = Path(distil_model_id.split("/")[-1]) + + if not distil_model_path.exists(): + ov_distil_model = OVModelForSpeechSeq2Seq.from_pretrained( + distil_model_id, export=True, compile=False + ) + ov_distil_model.half() + ov_distil_model.save_pretrained(distil_model_path) + else: + ov_distil_model = OVModelForSpeechSeq2Seq.from_pretrained( + distil_model_path, compile=False + ) + + +.. parsed-literal:: + + INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino + + +Select Inference device\ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code:: ipython3 + + import openvino as ov + import ipywidgets as widgets + + core = ov.Core() + + device = widgets.Dropdown( + options=core.available_devices + ["AUTO"], + value="AUTO", + description="Device:", + disabled=False, + ) + + device + + + + +.. parsed-literal:: + + Dropdown(description='Device:', index=4, options=('CPU', 'GPU.0', 'GPU.1', 'GPU.2', 'AUTO'), value='AUTO') + + + +Compile OpenVINO model +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code:: ipython3 + + ov_distil_model.to(device.value) + ov_distil_model.compile() + + +.. parsed-literal:: + + Compiling the encoder to AUTO ... + Compiling the decoder to AUTO ... + Compiling the decoder to AUTO ... + + +Run OpenVINO model inference +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code:: ipython3 + + predicted_ids = ov_distil_model.generate(input_features) + transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True) + + display(ipd.Audio(sample["audio"]["array"], rate=sample["audio"]["sampling_rate"])) + print(f"Reference: {sample['text']}") + print(f"Result: {transcription[0]}") + + +.. 
parsed-literal:: + + /home/nsavel/venvs/ov_notebooks_tmp/lib/python3.8/site-packages/optimum/intel/openvino/modeling_seq2seq.py:457: FutureWarning: `shared_memory` is deprecated and will be removed in 2024.0. Value of `shared_memory` is going to override `share_inputs` value. Please use only `share_inputs` explicitly. + last_hidden_state = torch.from_numpy(self.request(inputs, shared_memory=True)["last_hidden_state"]).to( + /home/nsavel/venvs/ov_notebooks_tmp/lib/python3.8/site-packages/optimum/intel/openvino/modeling_seq2seq.py:538: FutureWarning: `shared_memory` is deprecated and will be removed in 2024.0. Value of `shared_memory` is going to override `share_inputs` value. Please use only `share_inputs` explicitly. + self.request.start_async(inputs, shared_memory=True) + + + +.. raw:: html + + + + + + +.. parsed-literal:: + + Reference: MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES AND WE ARE GLAD TO WELCOME HIS GOSPEL + Result: Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel. + + +Compare performance PyTorch vs OpenVINO\ +--------------------------------------------------------------------------------- + +.. code:: ipython3 + + import time + import numpy as np + from tqdm.notebook import tqdm + + + def measure_perf(model, sample, n=10): + timers = [] + input_features = extract_input_features(sample) + for _ in tqdm(range(n), desc="Measuring performance"): + start = time.perf_counter() + model.generate(input_features) + end = time.perf_counter() + timers.append(end - start) + return np.median(timers) + +.. code:: ipython3 + + perf_distil_torch = measure_perf(pt_distil_model, sample) + perf_distil_ov = measure_perf(ov_distil_model, sample) + + + +.. parsed-literal:: + + Measuring performance: 0%| | 0/10 [00:00`__ +interface for ``automatic-speech-recognition``. Pipeline can be used for +long audio transcription. Distil-Whisper uses a chunked algorithm to +transcribe long-form audio files. In practice, this chunked long-form +algorithm is 9x faster than the sequential algorithm proposed by OpenAI +in the Whisper paper. To enable chunking, pass the chunk_length_s +parameter to the pipeline. For Distil-Whisper, a chunk length of 15 +seconds is optimal. To activate batching, pass the argument batch_size. + +.. code:: ipython3 + + from transformers import pipeline + + ov_distil_model.generation_config = pt_distil_model.generation_config + + pipe = pipeline( + "automatic-speech-recognition", + model=ov_distil_model, + tokenizer=processor.tokenizer, + feature_extractor=processor.feature_extractor, + max_new_tokens=128, + chunk_length_s=15, + batch_size=16, + ) + + +.. parsed-literal:: + + The model 'OVModelForWhisper' is not supported for automatic-speech-recognition. Supported models are ['Pop2PianoForConditionalGeneration', 'SeamlessM4TForSpeechToText', 'SpeechEncoderDecoderModel', 'Speech2TextForConditionalGeneration', 'SpeechT5ForSpeechToText', 'WhisperForConditionalGeneration', 'Data2VecAudioForCTC', 'HubertForCTC', 'MCTCTForCTC', 'SEWForCTC', 'SEWDForCTC', 'UniSpeechForCTC', 'UniSpeechSatForCTC', 'Wav2Vec2ForCTC', 'Wav2Vec2ConformerForCTC', 'WavLMForCTC']. + + +.. 
code:: ipython3 + + dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation") + sample_long = dataset[0] + + + def format_timestamp(seconds: float): + """ + format time in srt-file expected format + """ + assert seconds >= 0, "non-negative timestamp expected" + milliseconds = round(seconds * 1000.0) + + hours = milliseconds // 3_600_000 + milliseconds -= hours * 3_600_000 + + minutes = milliseconds // 60_000 + milliseconds -= minutes * 60_000 + + seconds = milliseconds // 1_000 + milliseconds -= seconds * 1_000 + + return ( + f"{hours}:" if hours > 0 else "00:" + ) + f"{minutes:02d}:{seconds:02d},{milliseconds:03d}" + + + def prepare_srt(transcription): + """ + Format transcription into srt file format + """ + segment_lines = [] + for idx, segment in enumerate(transcription["chunks"]): + segment_lines.append(str(idx + 1) + "\n") + timestamps = segment["timestamp"] + time_start = format_timestamp(timestamps[0]) + time_end = format_timestamp(timestamps[1]) + time_str = f"{time_start} --> {time_end}\n" + segment_lines.append(time_str) + segment_lines.append(segment["text"] + "\n\n") + return segment_lines + +``return_timestamps`` argument allows getting timestamps of start and +end of speech associated with each processed chunk. It could be useful +in tasks like speech separation or generation of video subtitles. In +this example, we provide output formatting in SRT format, one of the +popular subtitles format. + +.. code:: ipython3 + + result = pipe(sample_long["audio"].copy(), return_timestamps=True) + +.. code:: ipython3 + + srt_lines = prepare_srt(result) + + display( + ipd.Audio(sample_long["audio"]["array"], rate=sample_long["audio"]["sampling_rate"]) + ) + print("".join(srt_lines)) + + + +.. raw:: html + + + + + + +.. parsed-literal:: + + 1 + 00:00:00,000 --> 00:00:06,560 + Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel. + + 2 + 00:00:06,560 --> 00:00:11,280 + Nor is Mr. Quilter's manner less interesting than his matter. + + 3 + 00:00:11,280 --> 00:00:16,840 + He tells us that at this festive season of the year, with Christmas and roast beef looming + + 4 + 00:00:16,840 --> 00:00:23,760 + before us, similes drawn from eating and its results occur most readily to the mind. + + 5 + 00:00:23,760 --> 00:00:29,360 + He has grave doubts whether Sir Frederick Leighton's work is really Greek after all, and + + 6 + 00:00:29,360 --> 00:00:33,640 + can discover in it but little of Rocky Ithaca. + + 7 + 00:00:33,640 --> 00:00:39,760 + Lennel's pictures are a sort of upgards and Adam paintings, and Mason's exquisite + + 8 + 00:00:39,760 --> 00:00:44,720 + idles are as national as a jingo poem. + + 9 + 00:00:44,720 --> 00:00:50,320 + Mr. Burkett Foster's landscapes smile at one much in the same way that Mr. Carker used + + 10 + 00:00:50,320 --> 00:00:52,920 + to flash his teeth. + + 11 + 00:00:52,920 --> 00:00:58,680 + And Mr. John Collier gives his sitter a cheerful slap on the back, before he says, like + + 12 + 00:00:58,680 --> 00:01:01,120 + a shampooer and a Turkish bath, + + 13 + 00:01:01,120 --> 00:01:02,000 + Next man! + + + + +Quantization +------------ + + + +`NNCF `__ enables +post-training quantization by adding the quantization layers into the +model graph and then using a subset of the training dataset to +initialize the parameters of these additional quantization layers. The +framework is designed so that modifications to your original training +code are minor. 
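As a quick, self-contained illustration of this workflow (using a tiny stand-in PyTorch model and random calibration data, not the Distil-Whisper models quantized later in this notebook), the whole flow reduces to three calls:

.. code:: ipython3

    # Sketch of the generic NNCF post-training quantization flow on a toy model.
    import nncf
    import numpy as np
    import openvino as ov
    import torch

    toy_net = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU())
    float_model = ov.convert_model(toy_net, example_input=torch.randn(1, 16))

    # 1. calibration dataset: a handful of representative model inputs
    calibration_data = nncf.Dataset([np.random.rand(1, 16).astype(np.float32) for _ in range(10)])
    # 2. post-training quantization
    quantized_model = nncf.quantize(float_model, calibration_data)
    # 3. serialize the INT8 IR
    ov.save_model(quantized_model, "toy_int8.xml")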
+ +The optimization process contains the following steps: + +1. Create a calibration dataset for quantization. +2. Run ``nncf.quantize`` to obtain quantized encoder and decoder models. +3. Serialize the ``INT8`` model using ``openvino.save_model`` function. + +.. + + **Note**: Quantization is time and memory consuming operation. + Running quantization code below may take some time. + +Please select below whether you would like to run Distil-Whisper +quantization. + +.. code:: ipython3 + + to_quantize = widgets.Checkbox( + value=True, + description='Quantization', + disabled=False, + ) + + to_quantize + + + + +.. parsed-literal:: + + Checkbox(value=True, description='Quantization') + + + +.. code:: ipython3 + + # Fetch notebook_utils module + import urllib.request + + urllib.request.urlretrieve( + url='https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/utils/skip_kernel_extension.py', + filename='skip_kernel_extension.py' + ) + + %load_ext skip_kernel_extension + +Prepare calibration datasets +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + + +First step is to prepare calibration datasets for quantization. Since we +quantize whisper encoder and decoder separately, we need to prepare a +calibration dataset for each of the models. We define a +``InferRequestWrapper`` class that will intercept model inputs and +collect them to a list. Then we run model inference on some small amount +of audio samples. Generally, increasing the calibration dataset size +improves quantization quality. + +.. code:: ipython3 + + %%skip not $to_quantize.value + + from itertools import islice + from typing import List, Any + from openvino import Tensor + + + class InferRequestWrapper: + def __init__(self, request, data_cache: List): + self.request = request + self.data_cache = data_cache + + def __call__(self, *args, **kwargs): + self.data_cache.append(*args) + return self.request(*args, *kwargs) + + def infer(self, inputs: Any = None, shared_memory: bool = False): + self.data_cache.append(inputs) + return self.request.infer(inputs, shared_memory) + + def start_async( + self, + inputs: Any = None, + userdata: Any = None, + shared_memory: bool = False, + ): + self.data_cache.append(inputs) + self.request.infer(inputs, shared_memory) + + def wait(self): + pass + + def get_tensor(self, name: str): + return Tensor(self.request.results[name]) + + def __getattr__(self, attr): + if attr in self.__dict__: + return getattr(self, attr) + return getattr(self.request, attr) + + def collect_calibration_dataset(ov_model, calibration_dataset_size): + # Overwrite model request properties, saving the original ones for restoring later + original_encoder_request = ov_model.encoder.request + original_decoder_with_past_request = ov_model.decoder_with_past.request + encoder_calibration_data = [] + decoder_calibration_data = [] + ov_model.encoder.request = InferRequestWrapper(original_encoder_request, encoder_calibration_data) + ov_model.decoder_with_past.request = InferRequestWrapper(original_decoder_with_past_request, + decoder_calibration_data) + + calibration_dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation") + for sample in tqdm(islice(calibration_dataset, calibration_dataset_size), desc="Collecting calibration data", + total=calibration_dataset_size): + input_features = extract_input_features(sample) + ov_model.generate(input_features) + + ov_model.encoder.request = original_encoder_request + ov_model.decoder_with_past.request = original_decoder_with_past_request + + 
return encoder_calibration_data, decoder_calibration_data + +Quantize Distil-Whisper encoder and decoder models +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + + +Below we run the ``quantize`` function which calls ``nncf.quantize`` on +Distil-Whisper encoder and decoder-with-past models. We don’t quantize +first-step-decoder because its share in whole inference time is +negligible. + +.. code:: ipython3 + + %%skip not $to_quantize.value + + import shutil + import nncf + + CALIBRATION_DATASET_SIZE = 10 + quantized_distil_model_path = Path(f"{distil_model_path}_quantized") + + + def quantize(ov_model, calibration_dataset_size): + if not quantized_distil_model_path.exists(): + encoder_calibration_data, decoder_calibration_data = collect_calibration_dataset( + ov_model, calibration_dataset_size + ) + print("Quantizing encoder") + quantized_encoder = nncf.quantize( + ov_model.encoder.model, + nncf.Dataset(encoder_calibration_data), + preset=nncf.QuantizationPreset.MIXED, + subset_size=len(encoder_calibration_data), + model_type=nncf.ModelType.TRANSFORMER, + # Smooth Quant algorithm reduces activation quantization error; optimal alpha value was obtained through grid search + advanced_parameters=nncf.AdvancedQuantizationParameters(smooth_quant_alpha=0.50) + ) + ov.save_model(quantized_encoder, quantized_distil_model_path / "openvino_encoder_model.xml") + del quantized_encoder + del encoder_calibration_data + gc.collect() + + print("Quantizing decoder with past") + quantized_decoder_with_past = nncf.quantize( + ov_model.decoder_with_past.model, + nncf.Dataset(decoder_calibration_data), + preset=nncf.QuantizationPreset.MIXED, + subset_size=len(decoder_calibration_data), + model_type=nncf.ModelType.TRANSFORMER, + # Smooth Quant algorithm reduces activation quantization error; optimal alpha value was obtained through grid search + advanced_parameters=nncf.AdvancedQuantizationParameters(smooth_quant_alpha=0.95) + ) + ov.save_model(quantized_decoder_with_past, quantized_distil_model_path / "openvino_decoder_with_past_model.xml") + del quantized_decoder_with_past + del decoder_calibration_data + gc.collect() + + # Copy the config file and the first-step-decoder manually + shutil.copy(distil_model_path / "config.json", quantized_distil_model_path / "config.json") + shutil.copy(distil_model_path / "openvino_decoder_model.xml", quantized_distil_model_path / "openvino_decoder_model.xml") + shutil.copy(distil_model_path / "openvino_decoder_model.bin", quantized_distil_model_path / "openvino_decoder_model.bin") + + quantized_ov_model = OVModelForSpeechSeq2Seq.from_pretrained(quantized_distil_model_path, compile=False) + quantized_ov_model.to(device.value) + quantized_ov_model.compile() + return quantized_ov_model + + + ov_quantized_distil_model = quantize(ov_distil_model, CALIBRATION_DATASET_SIZE) + + + +.. parsed-literal:: + + Collecting calibration data: 0%| | 0/10 [00:00 + + Your browser does not support the audio element. + + + + +.. parsed-literal:: + + Original : Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel. + Quantized: Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel. + + +Results are the same! + +Compare performance and accuracy of the original and quantized models +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + + +Finally, we compare original and quantized Distil-Whisper models from +accuracy and performance stand-points. 
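+
+As a small, self-contained warm-up for the comparison, the sketch below computes the
+Word Error Rate (WER) metric with the same ``jiwer`` package that the measurement code
+relies on. The reference and hypothesis strings are made up for this sketch (loosely
+echoing the sample transcription above) and are not taken from the test dataset.
+
+.. code:: ipython3
+
+    # Toy illustration of WER; not part of the measurement code below.
+    from jiwer import wer
+
+    reference = "mister quilter is the apostle of the middle classes"
+    hypothesis = "mister quilter is the apostle of the middle class"
+
+    # One substituted word out of nine reference words gives WER = 1/9 (about 0.11),
+    # so the word accuracy reported below would be 1 - WER, or roughly 0.89.
+    print(wer(reference, hypothesis))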
+ +To measure accuracy, we use ``1 - WER`` as a metric, where WER stands +for Word Error Rate. + +When measuring inference time, we do it separately for encoder and +decoder-with-past model forwards, and for the whole model inference too. + +.. code:: ipython3 + + %%skip not $to_quantize.value + + import time + from contextlib import contextmanager + from jiwer import wer, wer_standardize + + + TEST_DATASET_SIZE = 50 + MEASURE_TIME = False + + @contextmanager + def time_measurement(): + global MEASURE_TIME + try: + MEASURE_TIME = True + yield + finally: + MEASURE_TIME = False + + def time_fn(obj, fn_name, time_list): + original_fn = getattr(obj, fn_name) + + def wrapper(*args, **kwargs): + if not MEASURE_TIME: + return original_fn(*args, **kwargs) + start_time = time.perf_counter() + result = original_fn(*args, **kwargs) + end_time = time.perf_counter() + time_list.append(end_time - start_time) + return result + + setattr(obj, fn_name, wrapper) + + def calculate_transcription_time_and_accuracy(ov_model, test_samples): + encoder_infer_times = [] + decoder_with_past_infer_times = [] + whole_infer_times = [] + time_fn(ov_model, "generate", whole_infer_times) + time_fn(ov_model.encoder, "forward", encoder_infer_times) + time_fn(ov_model.decoder_with_past, "forward", decoder_with_past_infer_times) + + ground_truths = [] + predictions = [] + for data_item in tqdm(test_samples, desc="Measuring performance and accuracy"): + input_features = extract_input_features(data_item) + + with time_measurement(): + predicted_ids = ov_model.generate(input_features) + transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True) + + ground_truths.append(data_item["text"]) + predictions.append(transcription[0]) + + word_accuracy = (1 - wer(ground_truths, predictions, reference_transform=wer_standardize, + hypothesis_transform=wer_standardize)) * 100 + mean_whole_infer_time = sum(whole_infer_times) + mean_encoder_infer_time = sum(encoder_infer_times) + mean_decoder_with_time_infer_time = sum(decoder_with_past_infer_times) + return word_accuracy, (mean_whole_infer_time, mean_encoder_infer_time, mean_decoder_with_time_infer_time) + + test_dataset = load_dataset("librispeech_asr", "clean", split="test", streaming=True) + test_dataset = test_dataset.shuffle(seed=42).take(TEST_DATASET_SIZE) + test_samples = [sample for sample in test_dataset] + + accuracy_original, times_original = calculate_transcription_time_and_accuracy(ov_distil_model, test_samples) + accuracy_quantized, times_quantized = calculate_transcription_time_and_accuracy(ov_quantized_distil_model, test_samples) + print(f"Encoder performance speedup: {times_original[1] / times_quantized[1]:.3f}") + print(f"Decoder with past performance speedup: {times_original[2] / times_quantized[2]:.3f}") + print(f"Whole pipeline performance speedup: {times_original[0] / times_quantized[0]:.3f}") + print(f"Whisper transcription word accuracy. Original model: {accuracy_original:.2f}%. Quantized model: {accuracy_quantized:.2f}%.") + print(f"Accuracy drop: {accuracy_original - accuracy_quantized:.2f}%.") + + +.. parsed-literal:: + + Got disconnected from remote data host. Retrying in 5sec [1/20] + Got disconnected from remote data host. Retrying in 5sec [2/20] + + + +.. parsed-literal:: + + Measuring performance and accuracy: 0%| | 0/50 [00:00 MAX_AUDIO_MINS: + raise gr.Error( + f"To ensure fair usage of the Space, the maximum audio length permitted is {MAX_AUDIO_MINS} minutes." + f"Got an audio of length {round(audio_length_mins, 3)} minutes." 
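+                # Note: the two adjacent f-strings above are concatenated into a single error message.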
+ ) + + inputs = {"array": inputs, "sampling_rate": pipe.feature_extractor.sampling_rate} + + def _forward_ov_time(*args, **kwargs): + global ov_time + start_time = time.time() + result = pipe_forward(*args, **kwargs) + ov_time = time.time() - start_time + ov_time = round(ov_time, 2) + return result + + pipe._forward = _forward_ov_time + ov_text = pipe(inputs.copy(), batch_size=BATCH_SIZE)["text"] + return ov_text, ov_time + + + with gr.Blocks() as demo: + gr.HTML( + """ +
+                <div style="text-align: center">
+                    <h1>
+                        OpenVINO Distil-Whisper demo
+                    </h1>
+                </div>
+ """ + ) + audio = gr.components.Audio(type="filepath", label="Audio input") + with gr.Row(): + button = gr.Button("Transcribe") + if to_quantize.value: + button_q = gr.Button("Transcribe quantized") + with gr.Row(): + infer_time = gr.components.Textbox( + label="OpenVINO Distil-Whisper Transcription Time (s)" + ) + if to_quantize.value: + infer_time_q = gr.components.Textbox( + label="OpenVINO Quantized Distil-Whisper Transcription Time (s)" + ) + with gr.Row(): + transcription = gr.components.Textbox( + label="OpenVINO Distil-Whisper Transcription", show_copy_button=True + ) + if to_quantize.value: + transcription_q = gr.components.Textbox( + label="OpenVINO Quantized Distil-Whisper Transcription", show_copy_button=True + ) + button.click( + fn=transcribe, + inputs=audio, + outputs=[transcription, infer_time], + ) + if to_quantize.value: + button_q.click( + fn=transcribe, + inputs=[audio, gr.Number(value=1, visible=False)], + outputs=[transcription_q, infer_time_q], + ) + gr.Markdown("## Examples") + gr.Examples( + [["./example_1.wav"]], + audio, + outputs=[transcription, infer_time], + fn=transcribe, + cache_examples=False, + ) + # if you are launching remotely, specify server_name and server_port + # demo.launch(server_name='your server name', server_port='server port in int') + # Read more in the docs: https://gradio.app/docs/ + try: + demo.launch(debug=False) + except Exception: + demo.launch(share=True, debug=False) + + +.. parsed-literal:: + + The model 'OVModelForWhisper' is not supported for automatic-speech-recognition. Supported models are ['Pop2PianoForConditionalGeneration', 'SeamlessM4TForSpeechToText', 'SpeechEncoderDecoderModel', 'Speech2TextForConditionalGeneration', 'SpeechT5ForSpeechToText', 'WhisperForConditionalGeneration', 'Data2VecAudioForCTC', 'HubertForCTC', 'MCTCTForCTC', 'SEWForCTC', 'SEWDForCTC', 'UniSpeechForCTC', 'UniSpeechSatForCTC', 'Wav2Vec2ForCTC', 'Wav2Vec2ConformerForCTC', 'WavLMForCTC']. + The model 'OVModelForWhisper' is not supported for automatic-speech-recognition. Supported models are ['Pop2PianoForConditionalGeneration', 'SeamlessM4TForSpeechToText', 'SpeechEncoderDecoderModel', 'Speech2TextForConditionalGeneration', 'SpeechT5ForSpeechToText', 'WhisperForConditionalGeneration', 'Data2VecAudioForCTC', 'HubertForCTC', 'MCTCTForCTC', 'SEWForCTC', 'SEWDForCTC', 'UniSpeechForCTC', 'UniSpeechSatForCTC', 'Wav2Vec2ForCTC', 'Wav2Vec2ConformerForCTC', 'WavLMForCTC']. + /home/nsavel/venvs/ov_notebooks_tmp/lib/python3.8/site-packages/gradio/blocks.py:891: UserWarning: api_name transcribe already exists, using transcribe_1 + warnings.warn(f"api_name {api_name} already exists, using {api_name_}") + + +.. parsed-literal:: + + Running on local URL: http://127.0.0.1:7860 + + To create a public link, set `share=True` in `launch()`. + + + +.. .. raw:: html + +..
+ + diff --git a/docs/notebooks/268-table-question-answering-with-output.rst b/docs/notebooks/268-table-question-answering-with-output.rst new file mode 100644 index 00000000000000..60a6495dbd03d5 --- /dev/null +++ b/docs/notebooks/268-table-question-answering-with-output.rst @@ -0,0 +1,483 @@ +Table Question Answering using TAPAS and OpenVINO™ +================================================== + +Table Question Answering (Table QA) is the answering a question about an +information on a given table. You can use the Table Question Answering +models to simulate SQL execution by inputting a table. + +In this tutorial we demonstrate how to perform table question answering +using OpenVINO. This example based on `TAPAS base model fine-tuned on +WikiTable Questions +(WTQ) `__ that +is based on the paper `TAPAS: Weakly Supervised Table Parsing via +Pre-training `__. + +Answering natural language questions over tables is usually seen as a +semantic parsing task. To alleviate the collection cost of full logical +forms, one popular approach focuses on weak supervision consisting of +denotations instead of logical forms. However, training semantic parsers +from weak supervision poses difficulties, and in addition, the generated +logical forms are only used as an intermediate step prior to retrieving +the denotation. In `this +paper `__, it is presented TAPAS, +an approach to question answering over tables without generating logical +forms. TAPAS trains from weak supervision, and predicts the denotation +by selecting table cells and optionally applying a corresponding +aggregation operator to such selection. TAPAS extends BERT’s +architecture to encode tables as input, initializes from an effective +joint pre-training of text segments and tables crawled from Wikipedia, +and is trained end-to-end. + +**Table of contents:** + +- `Prerequisites <#prerequisites>`__ +- `Use the original model to run an + inference <#use-the-original-model-to-run-an-inference>`__ +- `Convert the original model to OpenVINO Intermediate Representation + (IR) + format <#convert-the-original-model-to-openvino-intermediate-representation-ir-format>`__ +- `Run the OpenVINO model <#run-the-openvino-model>`__ +- `Interactive inference <#interactive-inference>`__ + +Prerequisites +~~~~~~~~~~~~~ + + + +.. code:: ipython3 + + %pip uninstall -q -y openvino-dev openvino openvino-nightly + %pip install -q openvino-nightly + # other dependencies + %pip install -q torch "transformers>=4.31.0" --extra-index-url https://download.pytorch.org/whl/cpu + %pip install -q "gradio>=4.0.2" + + +.. parsed-literal:: + + WARNING: Skipping openvino-nightly as it is not installed. + Note: you may need to restart the kernel to use updated packages. + DEPRECATION: pytorch-lightning 1.6.5 has a non-standard dependency specifier torch>=1.8.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pytorch-lightning or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063 + Note: you may need to restart the kernel to use updated packages. + DEPRECATION: pytorch-lightning 1.6.5 has a non-standard dependency specifier torch>=1.8.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pytorch-lightning or contact the author to suggest that they release a version with a conforming dependency specifiers. 
Discussion can be found at https://github.com/pypa/pip/issues/12063 + Note: you may need to restart the kernel to use updated packages. + DEPRECATION: pytorch-lightning 1.6.5 has a non-standard dependency specifier torch>=1.8.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pytorch-lightning or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063 + Note: you may need to restart the kernel to use updated packages. + + +.. code:: ipython3 + + import torch + from transformers import TapasForQuestionAnswering + from transformers import TapasTokenizer + from transformers import pipeline + import pandas as pd + + +.. parsed-literal:: + + 2023-11-15 00:16:22.014004: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. + 2023-11-15 00:16:22.047161: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. + To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2023-11-15 00:16:22.631876: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT + + +Use ``TapasForQuestionAnswering.from_pretrained`` to download a +pretrained model and ``TapasTokenizer.from_pretrained`` to get a +tokenizer. + +.. code:: ipython3 + + model = TapasForQuestionAnswering.from_pretrained('google/tapas-large-finetuned-wtq') + tokenizer = TapasTokenizer.from_pretrained("google/tapas-large-finetuned-wtq") + + data = {"Actors": ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"], "Number of movies": ["87", "53", "69"]} + table = pd.DataFrame.from_dict(data) + question = "how many movies does Leonardo Di Caprio have?" + table + + + + +.. raw:: html + +
+    <div>
+    <table border="1" class="dataframe">
+      <thead>
+        <tr style="text-align: right;">
+          <th></th>
+          <th>Actors</th>
+          <th>Number of movies</th>
+        </tr>
+      </thead>
+      <tbody>
+        <tr>
+          <th>0</th>
+          <td>Brad Pitt</td>
+          <td>87</td>
+        </tr>
+        <tr>
+          <th>1</th>
+          <td>Leonardo Di Caprio</td>
+          <td>53</td>
+        </tr>
+        <tr>
+          <th>2</th>
+          <td>George Clooney</td>
+          <td>69</td>
+        </tr>
+      </tbody>
+    </table>
+    </div>
+ + + +Use the original model to run an inference +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + + +We use `this +example `__ to +demonstrate how to make an inference. You can use ``pipeline`` from +``transformer`` library for this purpose. + +.. code:: ipython3 + + tqa = pipeline(task="table-question-answering", model=model, tokenizer=tokenizer) + result = tqa(table=table, query=question) + print(f"The answer is {result['cells'][0]}") + + +.. parsed-literal:: + + The answer is 53 + + +.. parsed-literal:: + + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:1785: UserWarning: scatter_reduce() is in beta and the API may change at any time. (Triggered internally at ../aten/src/ATen/native/TensorAdvancedIndexing.cpp:1615.) + segment_means = out.scatter_reduce( + + +You can read more about the inference output structure in `this +documentation `__. + +Convert the original model to OpenVINO Intermediate Representation (IR) format +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + + +The original model is a PyTorch module, that can be converted with +``ov.convert_model`` function directly. We also use ``ov.save_model`` +function to serialize the result of conversion. + +.. code:: ipython3 + + import openvino as ov + from pathlib import Path + + + # Define the input shape + batch_size = 1 + sequence_length = 29 + + # Modify the input shape of the dummy_input dictionary + dummy_input = { + "input_ids": torch.zeros((batch_size, sequence_length), dtype=torch.long), + "attention_mask": torch.zeros((batch_size, sequence_length), dtype=torch.long), + "token_type_ids": torch.zeros((batch_size, sequence_length, 7), dtype=torch.long), + } + + + ov_model_xml_path = Path('models/ov_model.xml') + + if not ov_model_xml_path.exists(): + ov_model = ov.convert_model( + model, + example_input=dummy_input + ) + ov.save_model(ov_model, ov_model_xml_path) + + +.. parsed-literal:: + + WARNING:tensorflow:Please fix your imports. Module tensorflow.python.training.tracking.base has been moved to tensorflow.python.trackable.base. The old module will be deleted in version 2.11. + + +.. parsed-literal:: + + [ WARNING ] Please fix your imports. Module %s has been moved to %s. The old module will be deleted in version %s. + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:1600: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect. + self.indices = torch.as_tensor(indices) + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:1601: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect. 
+ self.num_segments = torch.as_tensor(num_segments, device=indices.device) + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:1703: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect. + batch_size = torch.prod(torch.tensor(list(index.batch_shape()))) + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:1779: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect. + [torch.as_tensor([-1], dtype=torch.long), torch.as_tensor(vector_shape, dtype=torch.long)], dim=0 + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:1782: TracerWarning: Converting a tensor to a Python list might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! + flat_values = values.reshape(flattened_shape.tolist()) + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:1784: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! + out = torch.zeros(int(flat_index.num_segments), dtype=torch.float, device=flat_values.device) + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:1792: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect. + torch.as_tensor(index.batch_shape(), dtype=torch.long), + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:1793: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect. 
+ torch.as_tensor([index.num_segments], dtype=torch.long), + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:1794: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect. + torch.as_tensor(vector_shape, dtype=torch.long), + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:1799: TracerWarning: Converting a tensor to a Python list might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! + output_values = segment_means.clone().view(new_shape.tolist()).to(values.dtype) + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:1730: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect. + batch_shape = torch.as_tensor( + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:1734: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect. + num_segments = torch.as_tensor(num_segments) # create a rank 0 tensor (scalar) containing num_segments (e.g. 64) + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:1745: TracerWarning: Converting a tensor to a Python list might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! + new_shape = [int(x) for x in new_tensor.tolist()] + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:1748: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect. + multiples = torch.cat([batch_shape, torch.as_tensor([1])], dim=0) + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:1749: TracerWarning: Converting a tensor to a Python list might cause the trace to be incorrect. 
We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! + indices = indices.repeat(multiples.tolist()) + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:316: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect. + torch.as_tensor(self.config.max_position_embeddings - 1, device=device), position - first_position + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:1260: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect. + indices=torch.min(row_ids, torch.as_tensor(self.config.max_num_rows - 1, device=row_ids.device)), + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:1265: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect. + indices=torch.min(column_ids, torch.as_tensor(self.config.max_num_columns - 1, device=column_ids.device)), + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:1957: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect. + column_logits += CLOSE_ENOUGH_TO_LOG_ZERO * torch.as_tensor( + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:1962: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect. + column_logits += CLOSE_ENOUGH_TO_LOG_ZERO * torch.as_tensor( + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:1998: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. 
In any other case, this might cause the trace to be incorrect. + labels_per_column, _ = reduce_sum(torch.as_tensor(labels, dtype=torch.float32, device=labels.device), col_index) + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:2021: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect. + torch.as_tensor(labels, dtype=torch.long, device=labels.device), cell_index + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:2028: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect. + column_mask = torch.as_tensor( + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:2053: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect. + selected_column_id = torch.as_tensor( + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/transformers/models/tapas/modeling_tapas.py:2058: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect. + selected_column_mask = torch.as_tensor( + + +Run the OpenVINO model +~~~~~~~~~~~~~~~~~~~~~~ + + + +Select a device from dropdown list for running inference using OpenVINO. + +.. code:: ipython3 + + import ipywidgets as widgets + + core = ov.Core() + + device = widgets.Dropdown( + options=core.available_devices + ["AUTO"], + value='AUTO', + description='Device:', + disabled=False, + ) + + device + + + + +.. parsed-literal:: + + Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO') + + + +We use ``ov.compile_model`` to make it ready to use for loading on a +device. To prepare inputs use the original ``tokenizer``. + +.. code:: ipython3 + + inputs = tokenizer(table=table, queries=question, padding="max_length", return_tensors="pt") + + compiled_model = core.compile_model(ov_model_xml_path, device.value) + result = compiled_model((inputs["input_ids"], inputs["attention_mask"], inputs["token_type_ids"])) + +Now we should postprocess results. For this, we can use the appropriate +part of the code from +`postprocess `__ +method of ``TableQuestionAnsweringPipeline``. + +.. 
code:: ipython3 + + logits = result[0] + logits_aggregation = result[1] + + + predictions = tokenizer.convert_logits_to_predictions(inputs, torch.from_numpy(result[0])) + answer_coordinates_batch = predictions[0] + aggregators = {} + aggregators_prefix = {} + answers = [] + for index, coordinates in enumerate(answer_coordinates_batch): + cells = [table.iat[coordinate] for coordinate in coordinates] + aggregator = aggregators.get(index, "") + aggregator_prefix = aggregators_prefix.get(index, "") + answer = { + "answer": aggregator_prefix + ", ".join(cells), + "coordinates": coordinates, + "cells": [table.iat[coordinate] for coordinate in coordinates], + } + if aggregator: + answer["aggregator"] = aggregator + + answers.append(answer) + + print(answers[0]["cells"][0]) + + +.. parsed-literal:: + + 53 + + +Also, we can use the original pipeline. For this, we should create a +wrapper for ``TapasForQuestionAnswering`` class replacing ``forward`` +method to use the OpenVINO model for inference and methods and +attributes of original model class to be integrated into the pipeline. + +.. code:: ipython3 + + from transformers import TapasConfig + + + # get config for pretrained model + config = TapasConfig.from_pretrained('google/tapas-large-finetuned-wtq') + + + + class TapasForQuestionAnswering(TapasForQuestionAnswering): # it is better to keep the class name to avoid warnings + def __init__(self, ov_model_path): + super().__init__(config) # pass config from the pretrained model + self.tqa_model = core.compile_model(ov_model_path, device.value) + + def forward(self, input_ids, *, attention_mask, token_type_ids): + results = self.tqa_model((input_ids, attention_mask, token_type_ids)) + + return torch.from_numpy(results[0]), torch.from_numpy(results[1]) + + + compiled_model = TapasForQuestionAnswering(ov_model_xml_path) + tqa = pipeline(task="table-question-answering", model=compiled_model, tokenizer=tokenizer) + print(tqa(table=table, query=question)["cells"][0]) + + +.. parsed-literal:: + + 53 + + +Interactive inference +~~~~~~~~~~~~~~~~~~~~~ + + + +.. 
code:: ipython3 + + import urllib.request + + import gradio as gr + import pandas as pd + + + urllib.request.urlretrieve( + url="https://github.com/openvinotoolkit/openvino_notebooks/files/13215688/eu_city_population_top10.csv", + filename="eu_city_population_top10.csv" + ) + + + def display_table(csv_file_name): + table = pd.read_csv(csv_file_name.name, delimiter=",") + table = table.astype(str) + + return table + + + def highlight_answers(x, coordinates): + highlighted_table = pd.DataFrame('', index=x.index, columns=x.columns) + for coordinates_i in coordinates: + highlighted_table.iloc[coordinates_i[0], coordinates_i[1]] = "background-color: lightgreen" + + return highlighted_table + + + def infer(query, csv_file_name): + table = pd.read_csv(csv_file_name.name, delimiter=",") + table = table.astype(str) + + result = tqa(table=table, query=query) + table = table.style.apply(highlight_answers, axis=None, coordinates=result["coordinates"]) + + return result["answer"], table + + + with gr.Blocks(title="TAPAS Table Question Answering") as demo: + with gr.Row(): + with gr.Column(): + search_query = gr.Textbox(label="Search query") + csv_file = gr.File(label="CSV file") + infer_button = gr.Button("Submit", variant="primary") + with gr.Column(): + answer = gr.Textbox(label="Result") + result_csv_file = gr.Dataframe(label="All data") + + examples = [ + ["What is the city with the highest population that is not a capital?", "eu_city_population_top10.csv"], + ["In which country is Madrid?", "eu_city_population_top10.csv"], + ["In which cities is the population greater than 2,000,000?", "eu_city_population_top10.csv"], + ] + gr.Examples(examples, inputs=[search_query, csv_file]) + + # Callbacks + csv_file.upload(display_table, inputs=csv_file, outputs=result_csv_file) + csv_file.select(display_table, inputs=csv_file, outputs=result_csv_file) + csv_file.change(display_table, inputs=csv_file, outputs=result_csv_file) + infer_button.click(infer, inputs=[search_query, csv_file], outputs=[answer, result_csv_file]) + + try: + demo.queue().launch(debug=False) + except Exception: + demo.queue().launch(share=True, debug=False) + + +.. parsed-literal:: + + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/gradio/blocks.py:928: UserWarning: api_name display_table already exists, using display_table_1 + warnings.warn(f"api_name {api_name} already exists, using {api_name_}") + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/gradio/blocks.py:928: UserWarning: api_name display_table already exists, using display_table_2 + warnings.warn(f"api_name {api_name} already exists, using {api_name_}") + + +.. parsed-literal:: + + Running on local URL: http://127.0.0.1:7860 + + To create a public link, set `share=True` in `launch()`. + + + +.. .. raw:: html + +..
+ diff --git a/docs/notebooks/269-film-slowmo-with-output.rst b/docs/notebooks/269-film-slowmo-with-output.rst new file mode 100644 index 00000000000000..14841979ccc0fb --- /dev/null +++ b/docs/notebooks/269-film-slowmo-with-output.rst @@ -0,0 +1,521 @@ +Frame interpolation using FILM and OpenVINO +=========================================== + +`Frame +interpolation `__ is +the process of synthesizing in-between images from a given set of +images. The technique is often used for `temporal +up-sampling `__ +to increase the refresh rate of videos or to create slow motion effects. +Nowadays, with digital cameras and smartphones, we often take several +photos within a few seconds to capture the best picture. Interpolating +between these “near-duplicate” photos can lead to engaging videos that +reveal scene motion, often delivering an even more pleasing sense of the +moment than the original photos. + +|image0| + +In `“FILM: Frame Interpolation for Large +Motion” `__, published at ECCV +2022, a method to create high quality slow-motion videos from +near-duplicate photos is presented. FILM is a new neural network +architecture that achieves state-of-the-art results in large motion, +while also handling smaller motions well. + +The FILM model takes two images as input and outputs a middle image. At +inference time, the model is recursively invoked to output in-between +images. FILM has three components: 1. Feature extractor that summarizes +each input image with deep multi-scale (pyramid) features; 2. +Bi-directional motion estimator that computes pixel-wise motion (i.e., +flows) at each pyramid level; 3. Fusion module that outputs the final +interpolated image. + +FILM is trained on regular video frame triplets, with the middle frame +serving as the ground-truth for supervision. + +In this tutorial, we will use `TensorFlow Hub `__ as +a model source. + +**Table of contents:** + +- `Prerequisites <#prerequisites>`__ +- `Prepare images <#prepare-images>`__ +- `Load the model <#load-the-model>`__ +- `Infer the model <#infer-the-model>`__ +- `Single middle frame interpolation <#single-middle-frame-interpolation>`__ +- `Recursive frame generation <#recursive-frame-generation>`__ +- `Convert the model to OpenVINO IR <#convert-the-model-to-openvino-ir>`__ +- `Inference <#inference>`__ +- `Select inference device <#select-inference-device>`__ +- `Single middle frame interpolation <#single-middle-frame-interpolation>`__ +- `Recursive frame generation <#recursive-frame-generation>`__ +- `Interactive inference <#interactive-inference>`__ + +.. |image0| image:: https://github.com/googlestaging/frame-interpolation/raw/main/moment.gif + +Prerequisites +------------- + + + +.. code:: ipython3 + + %pip install -q tensorflow tensorflow_hub numpy "opencv-python" tqdm matplotlib gradio Pillow + %pip uninstall -q -y openvino-dev openvino openvino-nightly + %pip install -q openvino-nightly + + +.. parsed-literal:: + + + [notice] A new release of pip is available: 23.2.1 -> 23.3.1 + [notice] To update, run: pip install --upgrade pip + Note: you may need to restart the kernel to use updated packages. + WARNING: Skipping openvino as it is not installed. + Note: you may need to restart the kernel to use updated packages. + + [notice] A new release of pip is available: 23.2.1 -> 23.3.1 + [notice] To update, run: pip install --upgrade pip + Note: you may need to restart the kernel to use updated packages. + + +.. 
code:: ipython3 + + from pathlib import Path + from urllib.request import urlretrieve + from typing import Optional, Generator + from datetime import datetime + import gc + + + import tensorflow_hub as hub + import tensorflow as tf + import openvino as ov + import ipywidgets + import numpy as np + import cv2 + import matplotlib.pyplot as plt + from tqdm.auto import tqdm + import gradio as gr + import PIL + import IPython + + +.. parsed-literal:: + + 2023-11-02 11:23:42.519606: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. + 2023-11-02 11:23:42.521340: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used. + 2023-11-02 11:23:42.549839: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered + 2023-11-02 11:23:42.549860: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered + 2023-11-02 11:23:42.549882: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered + 2023-11-02 11:23:42.555392: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used. + 2023-11-02 11:23:42.556206: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. + To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. + 2023-11-02 11:23:43.247021: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT + + +.. code:: ipython3 + + MODEL_PATH = Path("models/model.xml") + DATA_PATH = Path("data") + IMAGES = { + "https://raw.githubusercontent.com/google-research/frame-interpolation/main/photos/one.png": Path("data/one.png"), + "https://raw.githubusercontent.com/google-research/frame-interpolation/main/photos/two.png": Path("data/two.png") + } + OUTPUT_VIDEO_PATH = DATA_PATH / "output.webm" + OV_OUTPUT_VIDEO_PATH = DATA_PATH / "ov_output.webm" + TIMES_TO_INTERPOLATE = 5 + DATA_PATH.mkdir(parents=True, exist_ok=True) + + PIL.ImageFile.LOAD_TRUNCATED_IMAGES = True # allows Gradio to read PNG images with large metadata + +Prepare images +-------------- + + + +Download images and cast them to NumPy arrays to provide as model +inputs. + +.. code:: ipython3 + + def preprocess_np_frame(frame): + result = frame.astype(np.float32) / 255 # normalize to [0, 1] + result = result[np.newaxis, ...] # add batch dim + return result + + def prepare_input(img_url: str): + if not IMAGES[img_url].exists(): + urlretrieve(img_url, IMAGES[img_url]) + filename = str(IMAGES[img_url]) + img = cv2.imread(filename) + img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) + img = np.array(img) + img = preprocess_np_frame(img) + + return img + + input_images = [prepare_input(url) for url in IMAGES] + + input = { + "x0": input_images[0], + "x1": input_images[1], + "time": np.array([[0.5]], dtype=np.float32) + } + +.. 
code:: ipython3 + + plt.figure(figsize=(16, 8), layout="tight") + plt.subplot(1, 2, 1) + plt.imshow(input_images[0][0]) + plt.axis("off") + plt.subplot(1, 2, 2) + plt.imshow(input_images[1][0]) + plt.axis("off"); + + + +.. image:: 269-film-slowmo-with-output_files/269-film-slowmo-with-output_7_0.png + + +Load the model +-------------- + + + +Model is loaded using ``tensorflow_hub.KerasLayer`` function. Then, we +specify shapes of input tensors to cast loaded object to +``tf.keras.Model`` class. + +Input tensors are: - ``time`` - value between :math:`[0,1]` that says +where the generated image should be. :math:`0.5` is midway between the +input images. - ``x0`` - initial frame. - ``x1`` - final frame. + +For more details, see `model page on TensorFlow +Hub `__. + +.. code:: ipython3 + + inputs = dict( + x0=tf.keras.layers.Input(shape=(None, None, 3)), + x1=tf.keras.layers.Input(shape=(None, None, 3)), + time=tf.keras.layers.Input(shape=(1)), + ) + film_layer = hub.KerasLayer("https://tfhub.dev/google/film/1")(inputs) + film_model = tf.keras.Model(inputs=inputs, outputs=film_layer) + +Infer the model +--------------- + + + +Single middle frame interpolation +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + + +.. code:: ipython3 + + output = film_model(input) + interpolated_image = output["image"][0] + interpolated_image = np.clip(interpolated_image, 0, 1) + +.. code:: ipython3 + + def draw(img1, mid_img, img2): + title2img = {"First frame": img1, "Interpolated frame": mid_img, "Last frame": img2} + plt.figure(figsize=(16,8), layout="tight") + for i, (title, img) in enumerate(title2img.items()): + ax = plt.subplot(1, 3, i + 1) + ax.set_title(title) + plt.imshow(img) + plt.axis("off") + +.. code:: ipython3 + + draw(input_images[0][0], interpolated_image, input_images[1][0]) + + + +.. image:: 269-film-slowmo-with-output_files/269-film-slowmo-with-output_14_0.png + + +Recursive frame generation +~~~~~~~~~~~~~~~~~~~~~~~~~~ + + + +The process will take as input 2 original frames (first and last) and +generate a midpoint frame. Then, it will repeat itself for pairs “first +- midpoint”, “midpoint - last” to provide midpoints for them, and so on. +Recursion is executed :math:`t=` ``times_to_interpolate`` times +generating :math:`2^t-1` images. + +.. code:: ipython3 + + class Interpolator: + def __init__(self, model): + self._model = model + + def _recursive_generator( + self, + frame1: np.ndarray, + frame2: np.ndarray, + num_recursions: int, + bar: Optional[tqdm] = None, + ) -> Generator[np.ndarray, None, None]: + """Splits halfway to repeatedly generate more frames. + + Args: + frame1: Input image 1. + frame2: Input image 2. + num_recursions: How many times to interpolate the consecutive image pairs. + + Yields: + The interpolated frames, including the first frame (frame1), but excluding + the final frame2. + """ + if num_recursions == 0: + yield frame1 + else: + time = np.array([[0.5]], dtype=np.float32) + mid_frame = self._model({"x0": frame1, "x1": frame2, "time": time})["image"] + if bar is not None: + bar.update(1) + yield from self._recursive_generator(frame1, mid_frame, num_recursions - 1, bar) + yield from self._recursive_generator(mid_frame, frame2, num_recursions - 1, bar) + + def interpolate_recursively( + self, frame1: np.ndarray, frame2: np.ndarray, times_to_interpolate: int + ) -> Generator[np.ndarray, None, None]: + """Generates interpolated frames by repeatedly interpolating the midpoint. + + Args: + frame1: Input image 1. + frame2: Input image 2. 
+ times_to_interpolate: Number of times to do recursive midpoint + interpolation. + + Yields: + The interpolated frames (including the inputs). + """ + num_frames = 2 ** (times_to_interpolate) - 1 + bar = tqdm(total=num_frames) + yield from self._recursive_generator(frame1, frame2, times_to_interpolate, bar) + # Separately yield the final frame. + yield frame2 + +.. code:: ipython3 + + def save_as_video(frames: Generator[np.ndarray, None, None], width: int, height: int, filename: Path): + out = cv2.VideoWriter(str(filename), cv2.VideoWriter_fourcc(*'VP90'), 30, (width, height)) + for frame in frames: + img = frame[0] + img = np.clip(img, 0, 1) + rgb_img = img * 255 + rgb_img = rgb_img.astype(np.uint8) + bgr_img = cv2.cvtColor(rgb_img, cv2.COLOR_RGB2BGR) + out.write(bgr_img) + out.release() + +.. code:: ipython3 + + height, width = input_images[0][0].shape[:2] + interpolator = Interpolator(film_model) + frames = interpolator.interpolate_recursively(input_images[0], input_images[1], TIMES_TO_INTERPOLATE) + save_as_video(frames, width, height, OUTPUT_VIDEO_PATH) + + +.. parsed-literal:: + + OpenCV: FFMPEG: tag 0x30395056/'VP90' is not supported with codec id 167 and format 'webm / WebM' + + + +.. parsed-literal:: + + 0%| | 0/31 [00:00 + + Your browser does not support the video tag. + + + + +Convert the model to OpenVINO IR +-------------------------------- + + + +To convert a TensorFlow Keras Model to OpenVINO Intermediate +Representation (IR), call the ``openvino.convert_model()`` function and +pass the model as the only argument. You can then serialize the model +object to disk using the ``openvino.save_model()`` function. + +.. code:: ipython3 + + if not MODEL_PATH.exists(): + converted_model = ov.convert_model(film_model) + ov.save_model(converted_model, MODEL_PATH) + del converted_model + del film_model + gc.collect() + + + + +.. parsed-literal:: + + 100834 + + + +Inference +--------- + + + +Select inference device +~~~~~~~~~~~~~~~~~~~~~~~ + + + +select device from dropdown list for running inference using OpenVINO + +.. code:: ipython3 + + core = ov.Core() + device = ipywidgets.Dropdown( + options=core.available_devices + ["AUTO"], + value='AUTO', + description='Device:', + disabled=False, + ) + device + + + + +.. parsed-literal:: + + Dropdown(description='Device:', index=4, options=('CPU', 'GPU.0', 'GPU.1', 'GPU.2', 'AUTO'), value='AUTO') + + + +.. code:: ipython3 + + compiled_model = core.compile_model(MODEL_PATH, device.value) + +Single middle frame interpolation +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + + +Model output has multiple tensors, including auxiliary inference data. +The main output tensor - interpolated image - is stored at “image” key. + +.. code:: ipython3 + + result = compiled_model(input)["image"] + image = result[0] + image = np.clip(image, 0, 1) + +Model returned intermediate image. Let’s see what it is. + +.. code:: ipython3 + + draw(input_images[0][0], image, input_images[1][0]) + + + +.. image:: 269-film-slowmo-with-output_files/269-film-slowmo-with-output_29_0.png + + +Recursive frame generation +~~~~~~~~~~~~~~~~~~~~~~~~~~ + + + +Now let’s create a smooth video by recursively generating frames between +initial, middle and final images. + +.. code:: ipython3 + + height, width = input_images[0][0].shape[:2] + ov_interpolator = Interpolator(compiled_model) + frames = ov_interpolator.interpolate_recursively(input_images[0], input_images[1], TIMES_TO_INTERPOLATE) + save_as_video(frames, width, height, OV_OUTPUT_VIDEO_PATH) + + +.. 
parsed-literal:: + + OpenCV: FFMPEG: tag 0x30395056/'VP90' is not supported with codec id 167 and format 'webm / WebM' + + + +.. parsed-literal:: + + 0%| | 0/31 [00:00 + + Your browser does not support the video tag. + + + + +Interactive inference +--------------------- + + + +.. code:: ipython3 + + def generate(frame1, frame2, times_to_interpolate, _=gr.Progress(track_tqdm=True)): + x0, x1 = [preprocess_np_frame(frame) for frame in [frame1, frame2]] + frames = ov_interpolator.interpolate_recursively(x0, x1, times_to_interpolate) + height, width = frame1.shape[:2] + filename = DATA_PATH / f"output_{datetime.now().isoformat()}.webm" + save_as_video(frames, width, height, filename) + return filename + + demo = gr.Interface( + generate, + [ + gr.Image(label="First image"), + gr.Image(label="Last image"), + gr.Slider(1, 8, step=1, label="Times to interpolate", info="""Controls the number of times the frame interpolator is invoked. + The output will be the interpolation video with (2^value + 1) frames, fps of 30.""") + ], + gr.Video(), + examples=[[*IMAGES.values(), 5]], + allow_flagging="never" + ) + try: + demo.queue().launch(debug=False) + except Exception: + demo.queue().launch(share=True, debug=False) + # if you are launching remotely, specify server_name and server_port + # demo.launch(server_name='your server name', server_port='server port in int') + # Read more in the docs: https://gradio.app/docs/ diff --git a/docs/notebooks/269-film-slowmo-with-output_files/269-film-slowmo-with-output_14_0.png b/docs/notebooks/269-film-slowmo-with-output_files/269-film-slowmo-with-output_14_0.png new file mode 100644 index 00000000000000..89955bc7f4657b --- /dev/null +++ b/docs/notebooks/269-film-slowmo-with-output_files/269-film-slowmo-with-output_14_0.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ffe395ace584017e03ac0647bc97aec7c6d7f81b076b40f54ea78cd661cb0d16 +size 550928 diff --git a/docs/notebooks/269-film-slowmo-with-output_files/269-film-slowmo-with-output_29_0.png b/docs/notebooks/269-film-slowmo-with-output_files/269-film-slowmo-with-output_29_0.png new file mode 100644 index 00000000000000..393a5ee509b78e --- /dev/null +++ b/docs/notebooks/269-film-slowmo-with-output_files/269-film-slowmo-with-output_29_0.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:90a0af9b2bbb5e9128745d574203bedcc49f26f947cc7d9e68f996fd8f965083 +size 550615 diff --git a/docs/notebooks/269-film-slowmo-with-output_files/269-film-slowmo-with-output_7_0.png b/docs/notebooks/269-film-slowmo-with-output_files/269-film-slowmo-with-output_7_0.png new file mode 100644 index 00000000000000..90f4f7b294b140 --- /dev/null +++ b/docs/notebooks/269-film-slowmo-with-output_files/269-film-slowmo-with-output_7_0.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a680d786b21952ca28579b4c4a14ff255ce1c4b8132cb83e6a49ac2e93594ec7 +size 794884 diff --git a/docs/notebooks/269-film-slowmo-with-output_files/index.html b/docs/notebooks/269-film-slowmo-with-output_files/index.html new file mode 100644 index 00000000000000..48822d60d20385 --- /dev/null +++ b/docs/notebooks/269-film-slowmo-with-output_files/index.html @@ -0,0 +1,9 @@ + +Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/269-film-slowmo-with-output_files/ + +

Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/269-film-slowmo-with-output_files/


../
+269-film-slowmo-with-output_14_0.png               15-Nov-2023 00:43              550928
+269-film-slowmo-with-output_29_0.png               15-Nov-2023 00:43              550615
+269-film-slowmo-with-output_7_0.png                15-Nov-2023 00:43              794884
+

+ diff --git a/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output.rst b/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output.rst index c0b3fd60c271ef..425e0a521a2ec6 100644 --- a/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output.rst +++ b/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output.rst @@ -3,7 +3,7 @@ Post-Training Quantization with TensorFlow Classification Model This example demonstrates how to quantize the OpenVINO model that was created in `301-tensorflow-training-openvino -notebook <301-tensorflow-training-openvino.ipynb>`__, to improve +notebook <301-tensorflow-training-openvino-with-output.html>`__, to improve inference speed. Quantization is performed with `Post-training Quantization with NNCF `__. @@ -12,18 +12,25 @@ performance will be computed for the original IR model and the quantized model. **Table of contents:** ---- -- `Preparation <#preparation>`__ -- `Imports <#imports>`__ -- `Post-training Quantization with NNCF <#post-training-quantization-with-nncf>`__ -- `Select inference device <#select-inference-device>`__ -- `Compare Metrics <#compare-metrics>`__ -- `Run Inference on Quantized Model <#run-inference-on-quantized-model>`__ -- `Compare Inference Speed <#compare-inference-speed>`__ +- `Preparation <#preparation>`__ + + - `Imports <#imports>`__ + +- `Post-training Quantization with + NNCF <#post-training-quantization-with-nncf>`__ + + - `Select inference device <#select-inference-device>`__ + +- `Compare Metrics <#compare-metrics>`__ +- `Run Inference on Quantized + Model <#run-inference-on-quantized-model>`__ +- `Compare Inference Speed <#compare-inference-speed>`__ + +Preparation +----------- + -Preparation ------------------------------------------------------ The notebook requires that the training notebook has been run and that the Intermediate Representation (IR) models are created. If the IR @@ -49,10 +56,10 @@ notebook. This will take a while. .. parsed-literal:: - 2023-10-31 00:10:31.765486: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. - 2023-10-31 00:10:31.799656: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. + 2023-11-15 00:16:49.889319: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. + 2023-11-15 00:16:49.923536: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. - 2023-10-31 00:10:32.415064: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT + 2023-11-15 00:16:50.514602: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT .. parsed-literal:: @@ -67,8 +74,12 @@ notebook. This will take a while. .. 
parsed-literal:: - 2023-10-31 00:10:36.524645: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1960] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. - Skipping registering GPU devices... + 2023-11-15 00:16:57.452170: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:266] failed call to cuInit: CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE: forward compatibility was attempted on non supported HW + 2023-11-15 00:16:57.452208: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:168] retrieving CUDA diagnostic information for host: iotg-dev-workstation-07 + 2023-11-15 00:16:57.452212: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:175] hostname: iotg-dev-workstation-07 + 2023-11-15 00:16:57.452343: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:199] libcuda reported version is: 470.223.2 + 2023-11-15 00:16:57.452357: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:203] kernel reported version is: 470.182.3 + 2023-11-15 00:16:57.452361: E tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:312] kernel version 470.182.3 does not match DSO version 470.223.2 -- cannot find working devices in this configuration .. parsed-literal:: @@ -78,19 +89,47 @@ notebook. This will take a while. ['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips'] +.. parsed-literal:: + + 2023-11-15 00:16:57.742287: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype string and shape [2936] + [[{{node Placeholder/_0}}]] + 2023-11-15 00:16:57.742572: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype string and shape [2936] + [[{{node Placeholder/_0}}]] -.. image:: 301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_4.png + + +.. image:: 301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_5.png + + +.. 
parsed-literal:: + + 2023-11-15 00:16:58.318525: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype string and shape [2936] + [[{{node Placeholder/_0}}]] + 2023-11-15 00:16:58.318753: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_4' with dtype int32 and shape [2936] + [[{{node Placeholder/_4}}]] + 2023-11-15 00:16:58.449065: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_4' with dtype int32 and shape [2936] + [[{{node Placeholder/_4}}]] + 2023-11-15 00:16:58.449324: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_4' with dtype int32 and shape [2936] + [[{{node Placeholder/_4}}]] .. parsed-literal:: (32, 180, 180, 3) (32,) - 0.0 0.99167764 + 0.02886732 1.0 + +.. parsed-literal:: + + 2023-11-15 00:16:59.254592: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype string and shape [2936] + [[{{node Placeholder/_0}}]] + 2023-11-15 00:16:59.254957: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype string and shape [2936] + [[{{node Placeholder/_0}}]] -.. image:: 301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_6.png + +.. image:: 301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_9.png .. parsed-literal:: @@ -105,18 +144,18 @@ notebook. This will take a while. conv2d_3 (Conv2D) (None, 180, 180, 16) 448 - max_pooling2d_3 (MaxPoolin (None, 90, 90, 16) 0 - g2D) + max_pooling2d_3 (MaxPooling (None, 90, 90, 16) 0 + 2D) conv2d_4 (Conv2D) (None, 90, 90, 32) 4640 - max_pooling2d_4 (MaxPoolin (None, 45, 45, 32) 0 - g2D) + max_pooling2d_4 (MaxPooling (None, 45, 45, 32) 0 + 2D) conv2d_5 (Conv2D) (None, 45, 45, 64) 18496 - max_pooling2d_5 (MaxPoolin (None, 22, 22, 64) 0 - g2D) + max_pooling2d_5 (MaxPooling (None, 22, 22, 64) 0 + 2D) dropout (Dropout) (None, 22, 22, 64) 0 @@ -127,50 +166,119 @@ notebook. This will take a while. 
outputs (Dense) (None, 5) 645 ================================================================= - Total params: 3989285 (15.22 MB) - Trainable params: 3989285 (15.22 MB) - Non-trainable params: 0 (0.00 Byte) + Total params: 3,989,285 + Trainable params: 3,989,285 + Non-trainable params: 0 _________________________________________________________________ Epoch 1/15 - 92/92 [==============================] - 6s 60ms/step - loss: 1.2926 - accuracy: 0.4435 - val_loss: 1.0857 - val_accuracy: 0.5327 + + +.. parsed-literal:: + + 2023-11-15 00:17:00.168317: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype string and shape [2936] + [[{{node Placeholder/_0}}]] + 2023-11-15 00:17:00.168869: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_4' with dtype int32 and shape [2936] + [[{{node Placeholder/_4}}]] + + +.. parsed-literal:: + + 92/92 [==============================] - ETA: 0s - loss: 1.3326 - accuracy: 0.4384 + +.. parsed-literal:: + + 2023-11-15 00:17:06.452362: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_4' with dtype int32 and shape [734] + [[{{node Placeholder/_4}}]] + 2023-11-15 00:17:06.452638: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype string and shape [734] + [[{{node Placeholder/_0}}]] + + +.. 
parsed-literal:: + + 92/92 [==============================] - 7s 66ms/step - loss: 1.3326 - accuracy: 0.4384 - val_loss: 1.0733 - val_accuracy: 0.5463 Epoch 2/15 - 92/92 [==============================] - 5s 57ms/step - loss: 1.0228 - accuracy: 0.5991 - val_loss: 0.9881 - val_accuracy: 0.6226 + 92/92 [==============================] - 6s 63ms/step - loss: 1.0268 - accuracy: 0.5923 - val_loss: 0.9584 - val_accuracy: 0.6294 Epoch 3/15 - 92/92 [==============================] - 5s 57ms/step - loss: 0.9082 - accuracy: 0.6519 - val_loss: 0.8962 - val_accuracy: 0.6526 + 92/92 [==============================] - 6s 63ms/step - loss: 0.9440 - accuracy: 0.6301 - val_loss: 0.9253 - val_accuracy: 0.6526 Epoch 4/15 - 92/92 [==============================] - 5s 57ms/step - loss: 0.8277 - accuracy: 0.6832 - val_loss: 0.9586 - val_accuracy: 0.6540 + 92/92 [==============================] - 6s 63ms/step - loss: 0.8713 - accuracy: 0.6635 - val_loss: 0.8400 - val_accuracy: 0.6730 Epoch 5/15 - 92/92 [==============================] - 5s 57ms/step - loss: 0.7965 - accuracy: 0.6853 - val_loss: 0.8849 - val_accuracy: 0.6689 + 92/92 [==============================] - 6s 63ms/step - loss: 0.8136 - accuracy: 0.6880 - val_loss: 0.8348 - val_accuracy: 0.6921 Epoch 6/15 - 92/92 [==============================] - 5s 57ms/step - loss: 0.7680 - accuracy: 0.7044 - val_loss: 0.7855 - val_accuracy: 0.6962 + 92/92 [==============================] - 6s 64ms/step - loss: 0.7706 - accuracy: 0.7067 - val_loss: 0.8327 - val_accuracy: 0.6662 Epoch 7/15 - 92/92 [==============================] - 5s 57ms/step - loss: 0.7319 - accuracy: 0.7292 - val_loss: 0.7772 - val_accuracy: 0.7016 + 92/92 [==============================] - 6s 63ms/step - loss: 0.7178 - accuracy: 0.7292 - val_loss: 0.8413 - val_accuracy: 0.6635 Epoch 8/15 - 92/92 [==============================] - 5s 57ms/step - loss: 0.6945 - accuracy: 0.7415 - val_loss: 0.7605 - val_accuracy: 0.7071 + 92/92 [==============================] - 6s 63ms/step - loss: 0.6965 - accuracy: 0.7302 - val_loss: 0.8255 - val_accuracy: 0.6689 Epoch 9/15 - 92/92 [==============================] - 5s 57ms/step - loss: 0.6561 - accuracy: 0.7490 - val_loss: 0.7764 - val_accuracy: 0.6948 + 92/92 [==============================] - 6s 64ms/step - loss: 0.6646 - accuracy: 0.7405 - val_loss: 0.7556 - val_accuracy: 0.7057 Epoch 10/15 - 92/92 [==============================] - 5s 57ms/step - loss: 0.6333 - accuracy: 0.7568 - val_loss: 0.7509 - val_accuracy: 0.7207 + 92/92 [==============================] - 6s 63ms/step - loss: 0.6365 - accuracy: 0.7619 - val_loss: 0.8055 - val_accuracy: 0.7030 Epoch 11/15 - 92/92 [==============================] - 5s 57ms/step - loss: 0.5991 - accuracy: 0.7766 - val_loss: 0.7724 - val_accuracy: 0.7153 + 92/92 [==============================] - 6s 63ms/step - loss: 0.6238 - accuracy: 0.7626 - val_loss: 0.7584 - val_accuracy: 0.7057 Epoch 12/15 - 92/92 [==============================] - 5s 57ms/step - loss: 0.5786 - accuracy: 0.7810 - val_loss: 0.7096 - val_accuracy: 0.7275 + 92/92 [==============================] - 6s 64ms/step - loss: 0.5788 - accuracy: 0.7888 - val_loss: 0.6973 - val_accuracy: 0.7425 Epoch 13/15 - 92/92 [==============================] - 5s 57ms/step - loss: 0.5741 - accuracy: 0.7858 - val_loss: 0.6902 - val_accuracy: 0.7384 + 92/92 [==============================] - 6s 63ms/step - loss: 0.5555 - accuracy: 0.7933 - val_loss: 0.7188 - val_accuracy: 0.7221 Epoch 14/15 - 92/92 [==============================] - 5s 57ms/step - loss: 0.5555 - accuracy: 
0.7892 - val_loss: 0.7097 - val_accuracy: 0.7193 + 92/92 [==============================] - 6s 63ms/step - loss: 0.5301 - accuracy: 0.8031 - val_loss: 0.6915 - val_accuracy: 0.7262 Epoch 15/15 - 92/92 [==============================] - 5s 57ms/step - loss: 0.5330 - accuracy: 0.8038 - val_loss: 0.7023 - val_accuracy: 0.7289 + 92/92 [==============================] - 6s 64ms/step - loss: 0.5229 - accuracy: 0.8062 - val_loss: 0.6774 - val_accuracy: 0.7234 -.. image:: 301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_8.png +.. image:: 301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_15.png .. parsed-literal:: 1/1 [==============================] - 0s 71ms/step - This image most likely belongs to sunflowers with a 97.82 percent confidence. + This image most likely belongs to sunflowers with a 61.26 percent confidence. + + +.. parsed-literal:: + + 2023-11-15 00:18:29.410210: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'random_flip_input' with dtype float and shape [?,180,180,3] + [[{{node random_flip_input}}]] + 2023-11-15 00:18:29.495685: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs' with dtype float and shape [?,180,180,3] + [[{{node inputs}}]] + 2023-11-15 00:18:29.505970: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'random_flip_input' with dtype float and shape [?,180,180,3] + [[{{node random_flip_input}}]] + 2023-11-15 00:18:29.516900: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs' with dtype float and shape [?,180,180,3] + [[{{node inputs}}]] + 2023-11-15 00:18:29.523761: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs' with dtype float and shape [?,180,180,3] + [[{{node inputs}}]] + 2023-11-15 00:18:29.530546: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs' with dtype float and shape [?,180,180,3] + [[{{node inputs}}]] + 2023-11-15 00:18:29.541270: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs' with dtype float and shape [?,180,180,3] + [[{{node inputs}}]] + 2023-11-15 00:18:29.580534: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore 
this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'sequential_1_input' with dtype float and shape [?,180,180,3] + [[{{node sequential_1_input}}]] + 2023-11-15 00:18:29.646973: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs' with dtype float and shape [?,180,180,3] + [[{{node inputs}}]] + 2023-11-15 00:18:29.667227: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'sequential_1_input' with dtype float and shape [?,180,180,3] + [[{{node sequential_1_input}}]] + 2023-11-15 00:18:29.706480: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs' with dtype float and shape [?,22,22,64] + [[{{node inputs}}]] + 2023-11-15 00:18:29.731467: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs' with dtype float and shape [?,180,180,3] + [[{{node inputs}}]] + 2023-11-15 00:18:29.804461: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs' with dtype float and shape [?,180,180,3] + [[{{node inputs}}]] + 2023-11-15 00:18:29.947172: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs' with dtype float and shape [?,180,180,3] + [[{{node inputs}}]] + 2023-11-15 00:18:30.084990: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs' with dtype float and shape [?,22,22,64] + [[{{node inputs}}]] + 2023-11-15 00:18:30.118837: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs' with dtype float and shape [?,180,180,3] + [[{{node inputs}}]] + 2023-11-15 00:18:30.147404: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs' with dtype float and shape [?,180,180,3] + [[{{node inputs}}]] + 2023-11-15 00:18:30.195961: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs' with dtype float and shape [?,180,180,3] + [[{{node 
inputs}}]] + WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _update_step_xla while saving (showing 4 of 4). These functions will not be directly callable after loading. + + +.. parsed-literal:: + INFO:tensorflow:Assets written to: model/flower/saved_model/assets @@ -189,15 +297,17 @@ notebook. This will take a while. (1, 180, 180, 3) [1,180,180,3] - This image most likely belongs to dandelion with a 99.80 percent confidence. + This image most likely belongs to dandelion with a 98.99 percent confidence. + +.. image:: 301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_22.png -.. image:: 301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_13.png + +Imports +~~~~~~~ -Imports -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The Post Training Quantization API is implemented in the ``nncf`` library. @@ -223,8 +333,10 @@ library. INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, tensorflow, onnx, openvino -Post-training Quantization with NNCF ------------------------------------------------------------------------------- +Post-training Quantization with NNCF +------------------------------------ + + `NNCF `__ provides a suite of advanced algorithms for Neural Networks inference optimization in @@ -264,6 +376,14 @@ The validation dataset already defined in the training notebook. +.. parsed-literal:: + + 2023-11-15 00:18:32.258873: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_4' with dtype int32 and shape [734] + [[{{node Placeholder/_4}}]] + 2023-11-15 00:18:32.259150: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype string and shape [734] + [[{{node Placeholder/_0}}]] + + The validation dataset can be reused in quantization process. But it returns a tuple (images, labels), whereas calibration_dataset should only return images. The transformation function helps to transform a @@ -307,8 +427,8 @@ control ` .. parsed-literal:: - Statistics collection: 73%|███████▎ | 734/1000 [00:04<00:01, 166.59it/s] - Applying Fast Bias correction: 100%|██████████| 5/5 [00:01<00:00, 3.92it/s] + Statistics collection: 73%|███████▎ | 734/1000 [00:04<00:01, 168.06it/s] + Applying Fast Bias correction: 100%|██████████| 5/5 [00:01<00:00, 3.98it/s] Save quantized model to benchmark. @@ -320,8 +440,10 @@ Save quantized model to benchmark. compressed_model_xml = compressed_model_dir / "flower_ir.xml" serialize(quantized_model, str(compressed_model_xml)) -Select inference device -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Select inference device +~~~~~~~~~~~~~~~~~~~~~~~ + + select device from dropdown list for running inference using OpenVINO @@ -347,8 +469,10 @@ select device from dropdown list for running inference using OpenVINO -Compare Metrics ---------------------------------------------------------- +Compare Metrics +--------------- + + Define a metric to determine the performance of the model. 
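For readers who want to reproduce the Compare Metrics step outside the notebook, below is a minimal sketch of an accuracy check against the validation split. It is not part of the generated notebook output above: the names ``val_dataset`` (the ``tf.data`` validation dataset from the training notebook) and the IR path ``model/flower/flower_ir.xml`` are assumptions for illustration.

.. code:: ipython3

    import numpy as np
    import openvino as ov

    core = ov.Core()
    # Assumed path to the FP16 IR exported by the training notebook.
    compiled = core.compile_model(core.read_model("model/flower/flower_ir.xml"), "CPU")
    output = compiled.output(0)

    correct = total = 0
    # `val_dataset` is assumed to yield (images, labels) batches, as in the notebook.
    for images, labels in val_dataset:
        for image, label in zip(images.numpy(), labels.numpy()):
            # The IR has a fixed [1,180,180,3] input, so run one image at a time.
            probs = compiled([np.expand_dims(image, 0)])[output]
            correct += int(np.argmax(probs) == label)
            total += 1
    print(f"Accuracy: {correct / total:.3f}")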
@@ -398,8 +522,8 @@ Calculate accuracy for the original model and the quantized model. .. parsed-literal:: - Accuracy of the original model: 0.729 - Accuracy of the quantized model: 0.730 + Accuracy of the original model: 0.723 + Accuracy of the quantized model: 0.729 Compare file size of the models. @@ -422,8 +546,10 @@ Compare file size of the models. So, we can see that the original and quantized models have similar accuracy with a much smaller size of the quantized model. -Run Inference on Quantized Model --------------------------------------------------------------------------- +Run Inference on Quantized Model +-------------------------------- + + Copy the preprocess function from the training notebook and run inference on the quantized model with Inference Engine. See the @@ -491,15 +617,17 @@ Python API. 'output/A_Close_Up_Photo_of_a_Dandelion.jpg' already exists. input image shape: (1, 180, 180, 3) input layer shape: [1,180,180,3] - This image most likely belongs to dandelion with a 99.79 percent confidence. + This image most likely belongs to dandelion with a 98.94 percent confidence. .. image:: 301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_26_1.png -Compare Inference Speed ------------------------------------------------------------------ +Compare Inference Speed +----------------------- + + Measure inference speed with the `OpenVINO Benchmark App `__. @@ -565,7 +693,7 @@ measured for CPU+GPU as well. The number of seconds is set to 15. [ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to PerformanceMode.THROUGHPUT. [Step 4/11] Reading model files [ INFO ] Loading model files - [ INFO ] Read model took 12.32 ms + [ INFO ] Read model took 13.80 ms [ INFO ] Original model I/O parameters: [ INFO ] Model inputs: [ INFO ] sequential_1_input (node: sequential_1_input) : f32 / [...] / [1,180,180,3] @@ -579,7 +707,7 @@ measured for CPU+GPU as well. The number of seconds is set to 15. [ INFO ] Model outputs: [ INFO ] outputs (node: sequential_2/outputs/BiasAdd) : f32 / [...] / [1,5] [Step 7/11] Loading the model to the device - [ INFO ] Compile model took 61.75 ms + [ INFO ] Compile model took 63.65 ms [Step 8/11] Querying optimal runtime parameters [ INFO ] Model: [ INFO ] NETWORK_NAME: TensorFlow_Frontend_IR @@ -603,17 +731,17 @@ measured for CPU+GPU as well. The number of seconds is set to 15. [ INFO ] Fill input 'sequential_1_input' with random values [Step 10/11] Measuring performance (Start inference asynchronously, 12 inference requests, limits: 15000 ms duration) [ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop). - [ INFO ] First inference took 10.01 ms + [ INFO ] First inference took 7.17 ms [Step 11/11] Dumping statistics report [ INFO ] Execution Devices:['CPU'] - [ INFO ] Count: 58152 iterations - [ INFO ] Duration: 15004.57 ms + [ INFO ] Count: 57636 iterations + [ INFO ] Duration: 15005.09 ms [ INFO ] Latency: - [ INFO ] Median: 2.90 ms - [ INFO ] Average: 2.91 ms - [ INFO ] Min: 2.15 ms - [ INFO ] Max: 11.75 ms - [ INFO ] Throughput: 3875.62 FPS + [ INFO ] Median: 2.92 ms + [ INFO ] Average: 2.93 ms + [ INFO ] Min: 1.86 ms + [ INFO ] Max: 11.74 ms + [ INFO ] Throughput: 3841.10 FPS .. code:: ipython3 @@ -639,7 +767,7 @@ measured for CPU+GPU as well. The number of seconds is set to 15. [ WARNING ] Performance hint was not explicitly specified in command line. 
Device(CPU) performance hint will be set to PerformanceMode.THROUGHPUT. [Step 4/11] Reading model files [ INFO ] Loading model files - [ INFO ] Read model took 13.50 ms + [ INFO ] Read model took 13.40 ms [ INFO ] Original model I/O parameters: [ INFO ] Model inputs: [ INFO ] sequential_1_input (node: sequential_1_input) : f32 / [...] / [1,180,180,3] @@ -653,7 +781,7 @@ measured for CPU+GPU as well. The number of seconds is set to 15. [ INFO ] Model outputs: [ INFO ] outputs (node: sequential_2/outputs/BiasAdd) : f32 / [...] / [1,5] [Step 7/11] Loading the model to the device - [ INFO ] Compile model took 57.59 ms + [ INFO ] Compile model took 67.51 ms [Step 8/11] Querying optimal runtime parameters [ INFO ] Model: [ INFO ] NETWORK_NAME: TensorFlow_Frontend_IR @@ -677,17 +805,17 @@ measured for CPU+GPU as well. The number of seconds is set to 15. [ INFO ] Fill input 'sequential_1_input' with random values [Step 10/11] Measuring performance (Start inference asynchronously, 12 inference requests, limits: 15000 ms duration) [ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop). - [ INFO ] First inference took 1.99 ms + [ INFO ] First inference took 2.16 ms [Step 11/11] Dumping statistics report [ INFO ] Execution Devices:['CPU'] - [ INFO ] Count: 178968 iterations - [ INFO ] Duration: 15001.10 ms + [ INFO ] Count: 179388 iterations + [ INFO ] Duration: 15001.74 ms [ INFO ] Latency: [ INFO ] Median: 0.92 ms [ INFO ] Average: 0.92 ms - [ INFO ] Min: 0.58 ms - [ INFO ] Max: 6.18 ms - [ INFO ] Throughput: 11930.32 FPS + [ INFO ] Min: 0.57 ms + [ INFO ] Max: 6.92 ms + [ INFO ] Throughput: 11957.81 FPS **Benchmark on MULTI:CPU,GPU** @@ -757,14 +885,14 @@ cached to the ``model_cache`` directory. .. parsed-literal:: - [ INFO ] Count: 58680 iterations - [ INFO ] Duration: 15004.60 ms + [ INFO ] Count: 57144 iterations + [ INFO ] Duration: 15003.56 ms [ INFO ] Latency: - [ INFO ] Median: 2.88 ms - [ INFO ] Average: 2.87 ms - [ INFO ] Min: 2.00 ms - [ INFO ] Max: 11.71 ms - [ INFO ] Throughput: 3910.80 FPS + [ INFO ] Median: 2.94 ms + [ INFO ] Average: 2.95 ms + [ INFO ] Min: 1.48 ms + [ INFO ] Max: 11.21 ms + [ INFO ] Throughput: 3808.70 FPS **Quantized IR model - CPU** @@ -779,14 +907,14 @@ cached to the ``model_cache`` directory. .. 
parsed-literal:: - [ INFO ] Count: 179220 iterations - [ INFO ] Duration: 15000.72 ms + [ INFO ] Count: 178968 iterations + [ INFO ] Duration: 15001.62 ms [ INFO ] Latency: [ INFO ] Median: 0.92 ms [ INFO ] Average: 0.92 ms - [ INFO ] Min: 0.56 ms - [ INFO ] Max: 6.53 ms - [ INFO ] Throughput: 11947.42 FPS + [ INFO ] Min: 0.57 ms + [ INFO ] Max: 6.34 ms + [ INFO ] Throughput: 11929.91 FPS **Original IR model - MULTI:CPU,GPU** diff --git a/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_15.png b/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_15.png new file mode 100644 index 00000000000000..09061d0705b33c --- /dev/null +++ b/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_15.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a41bbc2b0309358b862a4c796fd2954d9ae2f12202b02f4fb6a31f7710364434 +size 56373 diff --git a/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_13.png b/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_22.png similarity index 100% rename from docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_13.png rename to docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_22.png diff --git a/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_4.png b/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_5.png similarity index 100% rename from docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_4.png rename to docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_5.png diff --git a/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_6.png b/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_6.png deleted file mode 100644 index a911275a32ea15..00000000000000 --- a/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_6.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:8302cee83409eb402c9f8c3cb9cf9fbe23040c150441df5bef3dfe53ce1fd2b4 -size 1023105 diff --git a/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_8.png b/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_8.png deleted file mode 100644 index 9fc7aaa2c7d523..00000000000000 --- a/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_8.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:9e712ca63dc0da92137baa0d0f1e414932bccd22cfe332c0864c16ec2a58c034 -size 56298 diff --git 
a/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_9.png b/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_9.png new file mode 100644 index 00000000000000..46b1acbeb283ff --- /dev/null +++ b/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/301-tensorflow-training-openvino-nncf-with-output_2_9.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:80161c74ba6c2ea386afa3780898f843a83bda54e5461744765c0044ebed1db2 +size 1042692 diff --git a/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/index.html b/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/index.html index eb2637e5c8ce5c..5860170090f90f 100644 --- a/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/index.html +++ b/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output_files/index.html @@ -1,11 +1,11 @@ -Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/301-tensorflow-training-openvino-nncf-with-output_files/ +Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/301-tensorflow-training-openvino-nncf-with-output_files/ -

Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/301-tensorflow-training-openvino-nncf-with-output_files/


../
-301-tensorflow-training-openvino-nncf-with-outp..> 31-Oct-2023 00:35              143412
-301-tensorflow-training-openvino-nncf-with-outp..> 31-Oct-2023 00:35              143412
-301-tensorflow-training-openvino-nncf-with-outp..> 31-Oct-2023 00:35              941151
-301-tensorflow-training-openvino-nncf-with-outp..> 31-Oct-2023 00:35             1023105
-301-tensorflow-training-openvino-nncf-with-outp..> 31-Oct-2023 00:35               56298
+

Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/301-tensorflow-training-openvino-nncf-with-output_files/


../
+301-tensorflow-training-openvino-nncf-with-outp..> 15-Nov-2023 00:43              143412
+301-tensorflow-training-openvino-nncf-with-outp..> 15-Nov-2023 00:43               56373
+301-tensorflow-training-openvino-nncf-with-outp..> 15-Nov-2023 00:43              143412
+301-tensorflow-training-openvino-nncf-with-outp..> 15-Nov-2023 00:43              941151
+301-tensorflow-training-openvino-nncf-with-outp..> 15-Nov-2023 00:43             1042692
 

diff --git a/docs/notebooks/301-tensorflow-training-openvino-with-output.rst b/docs/notebooks/301-tensorflow-training-openvino-with-output.rst deleted file mode 100644 index 657b6a1cd7bc48..00000000000000 --- a/docs/notebooks/301-tensorflow-training-openvino-with-output.rst +++ /dev/null @@ -1,941 +0,0 @@ -From Training to Deployment with TensorFlow and OpenVINO™ -========================================================= - -**Table of contents:** - - -- `TensorFlow Image Classification - Training <#tensorflow-image-classification-training>`__ -- `Import TensorFlow and Other - Libraries <#import-tensorflow-and-other-libraries>`__ -- `Download and Explore the - Dataset <#download-and-explore-the-dataset>`__ -- `Load Using - keras.preprocessing <#load-using-keraspreprocessing>`__ -- `Create a Dataset <#create-a-dataset>`__ -- `Visualize the Data <#visualize-the-data>`__ -- `Configure the Dataset for - Performance <#configure-the-dataset-for-performance>`__ -- `Standardize the Data <#standardize-the-data>`__ -- `Create the Model <#create-the-model>`__ -- `Compile the Model <#compile-the-model>`__ -- `Model Summary <#model-summary>`__ -- `Train the Model <#train-the-model>`__ -- `Visualize Training Results <#visualize-training-results>`__ -- `Overfitting <#overfitting>`__ -- `Data Augmentation <#data-augmentation>`__ -- `Dropout <#dropout>`__ -- `Compile and Train the - Model <#compile-and-train-the-model>`__ -- `Visualize Training Results <#visualize-training-results>`__ -- `Predict on New Data <#predict-on-new-data>`__ -- `Save the TensorFlow Model <#save-the-tensorflow-model>`__ -- `Convert the TensorFlow model with OpenVINO Model Conversion - API <#convert-the-tensorflow-model-with-openvino-model-conversion-api>`__ -- `Preprocessing Image - Function <#preprocessing-image-function>`__ -- `OpenVINO Runtime Setup <#openvino-runtime-setup>`__ - - - `Select inference device <#select-inference-device>`__ - -- `Run the Inference Step <#run-the-inference-step>`__ -- `The Next Steps <#the-next-steps>`__ - -.. code:: ipython3 - - # @title Licensed under the Apache License, Version 2.0 (the "License"); - # you may not use this file except in compliance with the License. - # You may obtain a copy of the License at - # - # https://www.apache.org/licenses/LICENSE-2.0 - # - # Unless required by applicable law or agreed to in writing, software - # distributed under the License is distributed on an "AS IS" BASIS, - # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - # See the License for the specific language governing permissions and - # limitations under the License. - - # Copyright 2018 The TensorFlow Authors - # - # Modified for OpenVINO Notebooks - -This tutorial demonstrates how to train, convert, and deploy an image -classification model with TensorFlow and OpenVINO. This particular -notebook shows the process where we perform the inference step on the -freshly trained model that is converted to OpenVINO IR with model -conversion API. For faster inference speed on the model created in this -notebook, check out the `Post-Training Quantization with TensorFlow -Classification Model <./301-tensorflow-training-openvino-nncf.ipynb>`__ -notebook. - -This training code comprises the official `TensorFlow Image -Classification -Tutorial `__ -in its entirety. - -The ``flower_ir.bin`` and ``flower_ir.xml`` (pre-trained models) can be -obtained by executing the code with ‘Runtime->Run All’ or the -``Ctrl+F9`` command. - -.. code:: ipython3 - - %pip install -q "openvino>=2023.1.0" - - -.. 
parsed-literal:: - - DEPRECATION: pytorch-lightning 1.6.5 has a non-standard dependency specifier torch>=1.8.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pytorch-lightning or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063 - Note: you may need to restart the kernel to use updated packages. - - -TensorFlow Image Classification Training ----------------------------------------------------------------------------------- - -The first part of the tutorial shows how to classify images of flowers -(based on the TensorFlow’s official tutorial). It creates an image -classifier using a ``keras.Sequential`` model, and loads data using -``preprocessing.image_dataset_from_directory``. You will gain practical -experience with the following concepts: - -- Efficiently loading a dataset off disk. -- Identifying overfitting and applying techniques to mitigate it, - including data augmentation and Dropout. - -This tutorial follows a basic machine learning workflow: - -1. Examine and understand data -2. Build an input pipeline -3. Build the model -4. Train the model -5. Test the model - -Import TensorFlow and Other Libraries -------------------------------------------------------------------------------- - -.. code:: ipython3 - - import os - import sys - from pathlib import Path - - import PIL - import matplotlib.pyplot as plt - import numpy as np - import tensorflow as tf - from PIL import Image - import openvino as ov - from tensorflow import keras - from tensorflow.keras import layers - from tensorflow.keras.models import Sequential - - sys.path.append("../utils") - from notebook_utils import download_file - - -.. parsed-literal:: - - 2023-10-31 00:13:25.408072: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. - 2023-10-31 00:13:25.442949: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. - To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. - 2023-10-31 00:13:25.953408: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT - - -Download and Explore the Dataset --------------------------------------------------------------------------- - -This tutorial uses a dataset of about 3,700 photos of flowers. The -dataset contains 5 sub-directories, one per class: - -:: - - flower_photo/ - daisy/ - dandelion/ - roses/ - sunflowers/ - tulips/ - -.. code:: ipython3 - - import pathlib - dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz" - data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True) - data_dir = pathlib.Path(data_dir) - -After downloading, you should now have a copy of the dataset available. -There are 3,670 total images: - -.. code:: ipython3 - - image_count = len(list(data_dir.glob('*/*.jpg'))) - print(image_count) - - -.. parsed-literal:: - - 3670 - - -Here are some roses: - -.. code:: ipython3 - - roses = list(data_dir.glob('roses/*')) - PIL.Image.open(str(roses[0])) - - - - -.. 
image:: 301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_14_0.png - - - -.. code:: ipython3 - - PIL.Image.open(str(roses[1])) - - - - -.. image:: 301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_15_0.png - - - -And some tulips: - -.. code:: ipython3 - - tulips = list(data_dir.glob('tulips/*')) - PIL.Image.open(str(tulips[0])) - - - - -.. image:: 301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_17_0.png - - - -.. code:: ipython3 - - PIL.Image.open(str(tulips[1])) - - - - -.. image:: 301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_18_0.png - - - -Load Using keras.preprocessing ------------------------------------------------------------------------- - -Let’s load these images off disk using the helpful -`image_dataset_from_directory `__ -utility. This will take you from a directory of images on disk to a -``tf.data.Dataset`` in just a couple lines of code. If you like, you can -also write your own data loading code from scratch by visiting the `load -images `__ -tutorial. - -Create a Dataset ----------------------------------------------------------- - -Define some parameters for the loader: - -.. code:: ipython3 - - batch_size = 32 - img_height = 180 - img_width = 180 - -It’s good practice to use a validation split when developing your model. -Let’s use 80% of the images for training, and 20% for validation. - -.. code:: ipython3 - - train_ds = tf.keras.preprocessing.image_dataset_from_directory( - data_dir, - validation_split=0.2, - subset="training", - seed=123, - image_size=(img_height, img_width), - batch_size=batch_size) - - -.. parsed-literal:: - - Found 3670 files belonging to 5 classes. - Using 2936 files for training. - - -.. parsed-literal:: - - 2023-10-31 00:13:27.260838: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1960] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. - Skipping registering GPU devices... - - -.. code:: ipython3 - - val_ds = tf.keras.preprocessing.image_dataset_from_directory( - data_dir, - validation_split=0.2, - subset="validation", - seed=123, - image_size=(img_height, img_width), - batch_size=batch_size) - - -.. parsed-literal:: - - Found 3670 files belonging to 5 classes. - Using 734 files for validation. - - -You can find the class names in the ``class_names`` attribute on these -datasets. These correspond to the directory names in alphabetical order. - -.. code:: ipython3 - - class_names = train_ds.class_names - print(class_names) - - -.. parsed-literal:: - - ['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips'] - - -Visualize the Data ------------------------------------------------------------- - -Here are the first 9 images from the training dataset. - -.. code:: ipython3 - - plt.figure(figsize=(10, 10)) - for images, labels in train_ds.take(1): - for i in range(9): - ax = plt.subplot(3, 3, i + 1) - plt.imshow(images[i].numpy().astype("uint8")) - plt.title(class_names[labels[i]]) - plt.axis("off") - - - -.. image:: 301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_29_0.png - - -You will train a model using these datasets by passing them to -``model.fit`` in a moment. 
If you like, you can also manually iterate -over the dataset and retrieve batches of images: - -.. code:: ipython3 - - for image_batch, labels_batch in train_ds: - print(image_batch.shape) - print(labels_batch.shape) - break - - -.. parsed-literal:: - - (32, 180, 180, 3) - (32,) - - -The ``image_batch`` is a tensor of the shape ``(32, 180, 180, 3)``. This -is a batch of 32 images of shape ``180x180x3`` (the last dimension -refers to color channels RGB). The ``label_batch`` is a tensor of the -shape ``(32,)``, these are corresponding labels to the 32 images. - -You can call ``.numpy()`` on the ``image_batch`` and ``labels_batch`` -tensors to convert them to a ``numpy.ndarray``. - -Configure the Dataset for Performance -------------------------------------------------------------------------------- - -Let’s make sure to use buffered prefetching so you can yield data from -disk without having I/O become blocking. These are two important methods -you should use when loading data. - -``Dataset.cache()`` keeps the images in memory after they’re loaded off -disk during the first epoch. This will ensure the dataset does not -become a bottleneck while training your model. If your dataset is too -large to fit into memory, you can also use this method to create a -performant on-disk cache. - -``Dataset.prefetch()`` overlaps data preprocessing and model execution -while training. - -Interested readers can learn more about both methods, as well as how to -cache data to disk in the `data performance -guide `__. - -.. code:: ipython3 - - AUTOTUNE = tf.data.AUTOTUNE - train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE) - val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE) - -Standardize the Data --------------------------------------------------------------- - -The RGB channel values are in the ``[0, 255]`` range. This is not ideal -for a neural network; in general you should seek to make your input -values small. Here, you will standardize values to be in the ``[0, 1]`` -range by using a Rescaling layer. - -.. code:: ipython3 - - normalization_layer = layers.Rescaling(1./255) - -Note: The Keras Preprocessing utilities and layers introduced in this -section are currently experimental and may change. - -There are two ways to use this layer. You can apply it to the dataset by -calling map: - -.. code:: ipython3 - - normalized_ds = train_ds.map(lambda x, y: (normalization_layer(x), y)) - image_batch, labels_batch = next(iter(normalized_ds)) - first_image = image_batch[0] - # Notice the pixels values are now in `[0,1]`. - print(np.min(first_image), np.max(first_image)) - - -.. parsed-literal:: - - 0.0 1.0 - - -Or, you can include the layer inside your model definition, which can -simplify deployment. Let’s use the second approach here. - -Note: you previously resized images using the ``image_size`` argument of -``image_dataset_from_directory``. If you want to include the resizing -logic in your model as well, you can use the -`Resizing `__ -layer. - -Create the Model ----------------------------------------------------------- - -The model consists of three convolution blocks with a max pool layer in -each of them. There’s a fully connected layer with 128 units on top of -it that is activated by a ``relu`` activation function. This model has -not been tuned for high accuracy, the goal of this tutorial is to show a -standard approach. - -.. 
code:: ipython3 - - num_classes = 5 - - model = Sequential([ - layers.experimental.preprocessing.Rescaling(1./255, input_shape=(img_height, img_width, 3)), - layers.Conv2D(16, 3, padding='same', activation='relu'), - layers.MaxPooling2D(), - layers.Conv2D(32, 3, padding='same', activation='relu'), - layers.MaxPooling2D(), - layers.Conv2D(64, 3, padding='same', activation='relu'), - layers.MaxPooling2D(), - layers.Flatten(), - layers.Dense(128, activation='relu'), - layers.Dense(num_classes) - ]) - -Compile the Model ------------------------------------------------------------ - -For this tutorial, choose the ``optimizers.Adam`` optimizer and -``losses.SparseCategoricalCrossentropy`` loss function. To view training -and validation accuracy for each training epoch, pass the ``metrics`` -argument. - -.. code:: ipython3 - - model.compile(optimizer='adam', - loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), - metrics=['accuracy']) - -Model Summary -------------------------------------------------------- - -View all the layers of the network using the model’s ``summary`` method. - - **NOTE:** This section is commented out for performance reasons. - Please feel free to uncomment these to compare the results. - -.. code:: ipython3 - - # model.summary() - -Train the Model ---------------------------------------------------------- - -.. code:: ipython3 - - # epochs=10 - # history = model.fit( - # train_ds, - # validation_data=val_ds, - # epochs=epochs - # ) - -Visualize Training Results --------------------------------------------------------------------- - -Create plots of loss and accuracy on the training and validation sets. - -.. code:: ipython3 - - # acc = history.history['accuracy'] - # val_acc = history.history['val_accuracy'] - - # loss = history.history['loss'] - # val_loss = history.history['val_loss'] - - # epochs_range = range(epochs) - - # plt.figure(figsize=(8, 8)) - # plt.subplot(1, 2, 1) - # plt.plot(epochs_range, acc, label='Training Accuracy') - # plt.plot(epochs_range, val_acc, label='Validation Accuracy') - # plt.legend(loc='lower right') - # plt.title('Training and Validation Accuracy') - - # plt.subplot(1, 2, 2) - # plt.plot(epochs_range, loss, label='Training Loss') - # plt.plot(epochs_range, val_loss, label='Validation Loss') - # plt.legend(loc='upper right') - # plt.title('Training and Validation Loss') - # plt.show() - -As you can see from the plots, training accuracy and validation accuracy -are off by large margin and the model has achieved only around 60% -accuracy on the validation set. - -Let’s look at what went wrong and try to increase the overall -performance of the model. - -Overfitting ------------------------------------------------------ - -In the plots above, the training accuracy is increasing linearly over -time, whereas validation accuracy stalls around 60% in the training -process. Also, the difference in accuracy between training and -validation accuracy is noticeable — a sign of -`overfitting `__. - -When there are a small number of training examples, the model sometimes -learns from noises or unwanted details from training examples—to an -extent that it negatively impacts the performance of the model on new -examples. This phenomenon is known as overfitting. It means that the -model will have a difficult time generalizing on a new dataset. - -There are multiple ways to fight overfitting in the training process. In -this tutorial, you’ll use *data augmentation* and add *Dropout* to your -model. 
- -Data Augmentation ------------------------------------------------------------ - -Overfitting generally occurs when there are a small number of training -examples. `Data -augmentation `__ -takes the approach of generating additional training data from your -existing examples by augmenting them using random transformations that -yield believable-looking images. This helps expose the model to more -aspects of the data and generalize better. - -You will implement data augmentation using the layers from -``tf.keras.layers.experimental.preprocessing``. These can be included -inside your model like other layers, and run on the GPU. - -.. code:: ipython3 - - data_augmentation = keras.Sequential( - [ - layers.RandomFlip("horizontal", - input_shape=(img_height, - img_width, - 3)), - layers.RandomRotation(0.1), - layers.RandomZoom(0.1), - ] - ) - -Let’s visualize what a few augmented examples look like by applying data -augmentation to the same image several times: - -.. code:: ipython3 - - plt.figure(figsize=(10, 10)) - for images, _ in train_ds.take(1): - for i in range(9): - augmented_images = data_augmentation(images) - ax = plt.subplot(3, 3, i + 1) - plt.imshow(augmented_images[0].numpy().astype("uint8")) - plt.axis("off") - - - -.. image:: 301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_57_0.png - - -You will use data augmentation to train a model in a moment. - -Dropout -------------------------------------------------- - -Another technique to reduce overfitting is to introduce -`Dropout `__ -to the network, a form of *regularization*. - -When you apply Dropout to a layer it randomly drops out (by setting the -activation to zero) a number of output units from the layer during the -training process. Dropout takes a fractional number as its input value, -in the form such as 0.1, 0.2, 0.4, etc. This means dropping out 10%, 20% -or 40% of the output units randomly from the applied layer. - -Let’s create a new neural network using ``layers.Dropout``, then train -it using augmented images. - -.. code:: ipython3 - - model = Sequential([ - data_augmentation, - layers.Rescaling(1./255), - layers.Conv2D(16, 3, padding='same', activation='relu'), - layers.MaxPooling2D(), - layers.Conv2D(32, 3, padding='same', activation='relu'), - layers.MaxPooling2D(), - layers.Conv2D(64, 3, padding='same', activation='relu'), - layers.MaxPooling2D(), - layers.Dropout(0.2), - layers.Flatten(), - layers.Dense(128, activation='relu'), - layers.Dense(num_classes, name="outputs") - ]) - -Compile and Train the Model ---------------------------------------------------------------------- - -.. code:: ipython3 - - model.compile(optimizer='adam', - loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), - metrics=['accuracy']) - -.. code:: ipython3 - - model.summary() - - -.. 
parsed-literal:: - - Model: "sequential_2" - _________________________________________________________________ - Layer (type) Output Shape Param # - ================================================================= - sequential_1 (Sequential) (None, 180, 180, 3) 0 - - rescaling_2 (Rescaling) (None, 180, 180, 3) 0 - - conv2d_3 (Conv2D) (None, 180, 180, 16) 448 - - max_pooling2d_3 (MaxPoolin (None, 90, 90, 16) 0 - g2D) - - conv2d_4 (Conv2D) (None, 90, 90, 32) 4640 - - max_pooling2d_4 (MaxPoolin (None, 45, 45, 32) 0 - g2D) - - conv2d_5 (Conv2D) (None, 45, 45, 64) 18496 - - max_pooling2d_5 (MaxPoolin (None, 22, 22, 64) 0 - g2D) - - dropout (Dropout) (None, 22, 22, 64) 0 - - flatten_1 (Flatten) (None, 30976) 0 - - dense_2 (Dense) (None, 128) 3965056 - - outputs (Dense) (None, 5) 645 - - ================================================================= - Total params: 3989285 (15.22 MB) - Trainable params: 3989285 (15.22 MB) - Non-trainable params: 0 (0.00 Byte) - _________________________________________________________________ - - -.. code:: ipython3 - - epochs = 15 - history = model.fit( - train_ds, - validation_data=val_ds, - epochs=epochs - ) - - -.. parsed-literal:: - - Epoch 1/15 - 92/92 [==============================] - 6s 60ms/step - loss: 1.2433 - accuracy: 0.4673 - val_loss: 1.1335 - val_accuracy: 0.5627 - Epoch 2/15 - 92/92 [==============================] - 5s 57ms/step - loss: 1.0251 - accuracy: 0.5974 - val_loss: 0.9890 - val_accuracy: 0.5995 - Epoch 3/15 - 92/92 [==============================] - 5s 57ms/step - loss: 0.9141 - accuracy: 0.6451 - val_loss: 0.8673 - val_accuracy: 0.6580 - Epoch 4/15 - 92/92 [==============================] - 5s 58ms/step - loss: 0.8439 - accuracy: 0.6829 - val_loss: 0.8107 - val_accuracy: 0.6798 - Epoch 5/15 - 92/92 [==============================] - 5s 57ms/step - loss: 0.7845 - accuracy: 0.6962 - val_loss: 0.8639 - val_accuracy: 0.6798 - Epoch 6/15 - 92/92 [==============================] - 5s 58ms/step - loss: 0.7458 - accuracy: 0.7231 - val_loss: 0.7516 - val_accuracy: 0.7125 - Epoch 7/15 - 92/92 [==============================] - 5s 57ms/step - loss: 0.7045 - accuracy: 0.7299 - val_loss: 0.7731 - val_accuracy: 0.7016 - Epoch 8/15 - 92/92 [==============================] - 5s 58ms/step - loss: 0.6876 - accuracy: 0.7265 - val_loss: 0.7341 - val_accuracy: 0.7153 - Epoch 9/15 - 92/92 [==============================] - 5s 57ms/step - loss: 0.6440 - accuracy: 0.7514 - val_loss: 0.7189 - val_accuracy: 0.7289 - Epoch 10/15 - 92/92 [==============================] - 5s 57ms/step - loss: 0.6063 - accuracy: 0.7660 - val_loss: 0.8212 - val_accuracy: 0.6975 - Epoch 11/15 - 92/92 [==============================] - 5s 57ms/step - loss: 0.5727 - accuracy: 0.7830 - val_loss: 0.7362 - val_accuracy: 0.7330 - Epoch 12/15 - 92/92 [==============================] - 5s 57ms/step - loss: 0.5634 - accuracy: 0.7888 - val_loss: 0.7458 - val_accuracy: 0.7153 - Epoch 13/15 - 92/92 [==============================] - 5s 58ms/step - loss: 0.5492 - accuracy: 0.7922 - val_loss: 0.7176 - val_accuracy: 0.7439 - Epoch 14/15 - 92/92 [==============================] - 5s 58ms/step - loss: 0.5193 - accuracy: 0.8025 - val_loss: 0.7529 - val_accuracy: 0.7371 - Epoch 15/15 - 92/92 [==============================] - 5s 57ms/step - loss: 0.4890 - accuracy: 0.8123 - val_loss: 0.7434 - val_accuracy: 0.7302 - - -Visualize Training Results --------------------------------------------------------------------- - -After applying data augmentation and Dropout, there is less overfitting -than 
before, and training and validation accuracy are closer aligned. - -.. code:: ipython3 - - acc = history.history['accuracy'] - val_acc = history.history['val_accuracy'] - - loss = history.history['loss'] - val_loss = history.history['val_loss'] - - epochs_range = range(epochs) - - plt.figure(figsize=(8, 8)) - plt.subplot(1, 2, 1) - plt.plot(epochs_range, acc, label='Training Accuracy') - plt.plot(epochs_range, val_acc, label='Validation Accuracy') - plt.legend(loc='lower right') - plt.title('Training and Validation Accuracy') - - plt.subplot(1, 2, 2) - plt.plot(epochs_range, loss, label='Training Loss') - plt.plot(epochs_range, val_loss, label='Validation Loss') - plt.legend(loc='upper right') - plt.title('Training and Validation Loss') - plt.show() - - - -.. image:: 301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_66_0.png - - -Predict on New Data -------------------------------------------------------------- - -Finally, let us use the model to classify an image that was not included -in the training or validation sets. - - **Note**: Data augmentation and Dropout layers are inactive at - inference time. - -.. code:: ipython3 - - sunflower_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/592px-Red_sunflower.jpg" - sunflower_path = tf.keras.utils.get_file('Red_sunflower', origin=sunflower_url) - - img = keras.preprocessing.image.load_img( - sunflower_path, target_size=(img_height, img_width) - ) - img_array = keras.preprocessing.image.img_to_array(img) - img_array = tf.expand_dims(img_array, 0) # Create a batch - - predictions = model.predict(img_array) - score = tf.nn.softmax(predictions[0]) - - print( - "This image most likely belongs to {} with a {:.2f} percent confidence." - .format(class_names[np.argmax(score)], 100 * np.max(score)) - ) - - -.. parsed-literal:: - - 1/1 [==============================] - 0s 72ms/step - This image most likely belongs to sunflowers with a 99.00 percent confidence. - - -Save the TensorFlow Model -------------------------------------------------------------------- - -.. code:: ipython3 - - #save the trained model - a new folder flower will be created - #and the file "saved_model.pb" is the pre-trained model - model_dir = "model" - saved_model_dir = f"{model_dir}/flower/saved_model" - model.save(saved_model_dir) - - -.. parsed-literal:: - - INFO:tensorflow:Assets written to: model/flower/saved_model/assets - - -.. parsed-literal:: - - INFO:tensorflow:Assets written to: model/flower/saved_model/assets - - -Convert the TensorFlow model with OpenVINO Model Conversion API ---------------------------------------------------------------------------------------------------------- - -To convert the model to OpenVINO IR with ``FP16`` precision, use model -conversion Python API. - -.. code:: ipython3 - - # Convert the model to ir model format and save it. - ir_model_path = Path("model/flower") - ir_model_path.mkdir(parents=True, exist_ok=True) - ir_model = ov.convert_model(saved_model_dir, input=[1,180,180,3]) - ov.save_model(ir_model, ir_model_path / "flower_ir.xml") - -Preprocessing Image Function ----------------------------------------------------------------------- - -.. 
code:: ipython3 - - def pre_process_image(imagePath, img_height=180): - # Model input format - n, h, w, c = [1, img_height, img_height, 3] - image = Image.open(imagePath) - image = image.resize((h, w), resample=Image.BILINEAR) - - # Convert to array and change data layout from HWC to CHW - image = np.array(image) - input_image = image.reshape((n, h, w, c)) - - return input_image - -OpenVINO Runtime Setup ----------------------------------------------------------------- - -Select inference device -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -select device from dropdown list for running inference using OpenVINO - -.. code:: ipython3 - - import ipywidgets as widgets - - # Initialize OpenVINO runtime - core = ov.Core() - device = widgets.Dropdown( - options=core.available_devices + ["AUTO"], - value='AUTO', - description='Device:', - disabled=False, - ) - - device - - - - -.. parsed-literal:: - - Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO') - - - -.. code:: ipython3 - - class_names=["daisy", "dandelion", "roses", "sunflowers", "tulips"] - - compiled_model = core.compile_model(model=ir_model, device_name=device.value) - - del ir_model - - input_layer = compiled_model.input(0) - output_layer = compiled_model.output(0) - -Run the Inference Step ----------------------------------------------------------------- - -.. code:: ipython3 - - # Run inference on the input image... - inp_img_url = "https://upload.wikimedia.org/wikipedia/commons/4/48/A_Close_Up_Photo_of_a_Dandelion.jpg" - OUTPUT_DIR = "output" - inp_file_name = f"A_Close_Up_Photo_of_a_Dandelion.jpg" - file_path = Path(OUTPUT_DIR)/Path(inp_file_name) - - os.makedirs(OUTPUT_DIR, exist_ok=True) - - # Download the image - download_file(inp_img_url, inp_file_name, directory=OUTPUT_DIR) - - # Pre-process the image and get it ready for inference. - input_image = pre_process_image(file_path) - - print(input_image.shape) - print(input_layer.shape) - res = compiled_model([input_image])[output_layer] - - score = tf.nn.softmax(res[0]) - - # Show the results - image = Image.open(file_path) - plt.imshow(image) - print( - "This image most likely belongs to {} with a {:.2f} percent confidence." - .format(class_names[np.argmax(score)], 100 * np.max(score)) - ) - - -.. parsed-literal:: - - 'output/A_Close_Up_Photo_of_a_Dandelion.jpg' already exists. - (1, 180, 180, 3) - [1,180,180,3] - This image most likely belongs to dandelion with a 99.82 percent confidence. - - - -.. image:: 301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_79_1.png - - -The Next Steps --------------------------------------------------------- - -This tutorial showed how to train a TensorFlow model, how to convert -that model to OpenVINO’s IR format, and how to do inference on the -converted model. For faster inference speed, you can quantize the IR -model. To see how to quantize this model with OpenVINO’s `Post-training -Quantization with NNCF -Tool `__, -check out the `Post-Training Quantization with TensorFlow Classification -Model <./301-tensorflow-training-openvino-nncf.ipynb>`__ notebook. 
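As a preview of that next step, the sketch below shows roughly how the ``flower_ir.xml`` model saved above could be quantized with NNCF's post-training API. It is an illustration only, not the linked notebook's exact recipe: it reuses ``core``, ``ov``, ``ir_model_path`` and ``val_ds`` from earlier cells, the INT8 file name is made up, and the calibration-set size and accuracy validation are left to the notebook.

.. code:: ipython3

    import nncf

    # Reload the IR saved earlier (the in-memory `ir_model` was deleted above).
    ir_model = core.read_model(ir_model_path / "flower_ir.xml")

    # NNCF only needs model inputs for calibration. The IR was converted with a
    # fixed batch of 1, so take one image from each validation batch.
    def transform_fn(data_item):
        images, _ = data_item
        return images.numpy()[:1]

    calibration_dataset = nncf.Dataset(val_ds, transform_fn)

    # Default 8-bit post-training quantization; save the result next to the FP16 IR
    # (output file name chosen here for illustration only).
    quantized_model = nncf.quantize(ir_model, calibration_dataset)
    ov.save_model(quantized_model, ir_model_path / "flower_ir_int8.xml")

Whether the INT8 model keeps acceptable accuracy still needs to be checked against the validation set, which is exactly what the notebook referenced above walks through.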
diff --git a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_14_0.jpg b/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_14_0.jpg deleted file mode 100644 index 532ff55c1d94fc..00000000000000 --- a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_14_0.jpg +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:43949a084d6557772310458a3ff6a6921a4752faf0d74ac20fc81204efaf9434 -size 7042 diff --git a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_14_0.png b/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_14_0.png deleted file mode 100644 index 3ea370c52289f6..00000000000000 --- a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_14_0.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:a969ea86bf49ca484394adedc3bfc631e125c1c54472a37089ef3b094651e1cf -size 64525 diff --git a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_15_0.jpg b/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_15_0.jpg deleted file mode 100644 index 87ae42741c0fc7..00000000000000 --- a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_15_0.jpg +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:e5c4ddb54a36fe095f708f3da0093643f629c4c45f80df539a24db47b849def9 -size 20653 diff --git a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_15_0.png b/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_15_0.png deleted file mode 100644 index b60e204aeb1f95..00000000000000 --- a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_15_0.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:70ae783b513ce08e778cf53b0f0daea47c6032a737fa07330e66bcfd6f742943 -size 167334 diff --git a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_17_0.jpg b/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_17_0.jpg deleted file mode 100644 index c398ad4d168401..00000000000000 --- a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_17_0.jpg +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:744d21693b7566bd1dfeef72f10f6e208b40fed0778b08c031860d41324e6eb1 -size 15872 diff --git a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_17_0.png b/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_17_0.png deleted file mode 100644 index 7a1e3c16793d57..00000000000000 --- a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_17_0.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid 
sha256:5213720caf7341165b7dc44f2f492d93f14a1542d2cd39ffbb807a8938adfdf8 -size 225545 diff --git a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_18_0.jpg b/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_18_0.jpg deleted file mode 100644 index 4e33a6cc9f0f1c..00000000000000 --- a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_18_0.jpg +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:d9fe90ab463c6396fe1a5cdc3e42168b4be4e3454a7695b93ac624b78fa2c5a8 -size 23154 diff --git a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_18_0.png b/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_18_0.png deleted file mode 100644 index f43f12f10342a0..00000000000000 --- a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_18_0.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:bd366a9bac123cbd99f91c71c19ebd2f23816b6a48f93b34e185163f7bd52cb9 -size 154227 diff --git a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_29_0.png b/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_29_0.png deleted file mode 100644 index bcafec97ae4947..00000000000000 --- a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_29_0.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:2f5c735c29db167e115ad07c4088ba6fa3b7ea2466f79bc3f47ed1a5d7772b17 -size 941151 diff --git a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_57_0.png b/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_57_0.png deleted file mode 100644 index 164d702a3f1841..00000000000000 --- a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_57_0.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:c6740b40acf1f7bb69f36d2628bf714cec62f93554bc756f9ffabc321867d179 -size 530251 diff --git a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_66_0.png b/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_66_0.png deleted file mode 100644 index 46b74805f68f73..00000000000000 --- a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_66_0.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:2105df5b195b2a9257d53c0dac7350af567654ada65c25d535ffced47e9ede16 -size 57480 diff --git a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_79_1.png b/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_79_1.png deleted file mode 100644 index cc0469a39df0ea..00000000000000 --- 
a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/301-tensorflow-training-openvino-with-output_79_1.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:fac9751b1b6d0d05f499bb346765698d4d1ed3d3de8831dffdc312e07405e24a -size 143412 diff --git a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/index.html b/docs/notebooks/301-tensorflow-training-openvino-with-output_files/index.html deleted file mode 100644 index 880a6d8aaddb53..00000000000000 --- a/docs/notebooks/301-tensorflow-training-openvino-with-output_files/index.html +++ /dev/null @@ -1,18 +0,0 @@ - -Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/301-tensorflow-training-openvino-with-output_files/ - -

Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/301-tensorflow-training-openvino-with-output_files/


../
-301-tensorflow-training-openvino-with-output_14..> 31-Oct-2023 00:35                7042
-301-tensorflow-training-openvino-with-output_14..> 31-Oct-2023 00:35               64525
-301-tensorflow-training-openvino-with-output_15..> 31-Oct-2023 00:35               20653
-301-tensorflow-training-openvino-with-output_15..> 31-Oct-2023 00:35              167334
-301-tensorflow-training-openvino-with-output_17..> 31-Oct-2023 00:35               15872
-301-tensorflow-training-openvino-with-output_17..> 31-Oct-2023 00:35              225545
-301-tensorflow-training-openvino-with-output_18..> 31-Oct-2023 00:35               23154
-301-tensorflow-training-openvino-with-output_18..> 31-Oct-2023 00:35              154227
-301-tensorflow-training-openvino-with-output_29..> 31-Oct-2023 00:35              941151
-301-tensorflow-training-openvino-with-output_57..> 31-Oct-2023 00:35              530251
-301-tensorflow-training-openvino-with-output_66..> 31-Oct-2023 00:35               57480
-301-tensorflow-training-openvino-with-output_79..> 31-Oct-2023 00:35              143412
-

- diff --git a/docs/notebooks/305-tensorflow-quantization-aware-training-with-output.rst b/docs/notebooks/305-tensorflow-quantization-aware-training-with-output.rst index 05108191ba577b..f8e575bdcf3b80 100644 --- a/docs/notebooks/305-tensorflow-quantization-aware-training-with-output.rst +++ b/docs/notebooks/305-tensorflow-quantization-aware-training-with-output.rst @@ -23,24 +23,23 @@ download time. **Table of contents:** - - `Imports and Settings <#imports-and-settings>`__ - `Dataset Preprocessing <#dataset-preprocessing>`__ -- `Define a Floating-Point - Model <#define-a-floating-point-model>`__ +- `Define a Floating-Point Model <#define-a-floating-point-model>`__ - `Pre-train a Floating-Point Model <#pre-train-a-floating-point-model>`__ - `Create and Initialize Quantization <#create-and-initialize-quantization>`__ -- `Fine-tune the Compressed - Model <#fine-tune-the-compressed-model>`__ +- `Fine-tune the Compressed Model <#fine-tune-the-compressed-model>`__ - `Export Models to OpenVINO Intermediate Representation (IR) <#export-models-to-openvino-intermediate-representation-ir>`__ - `Benchmark Model Performance by Computing Inference Time <#benchmark-model-performance-by-computing-inference-time>`__ -Imports and Settings --------------------------------------------------------------- +Imports and Settings +-------------------- + + Import NNCF and all auxiliary packages from your Python code. Set a name for the model, input image size, used batch size, and the learning rate. @@ -74,7 +73,6 @@ models will be stored. ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. onnxconverter-common 1.14.0 requires protobuf==3.20.2, but you have protobuf 3.20.3 which is incompatible. pytorch-lightning 1.6.5 requires protobuf<=3.20.1, but you have protobuf 3.20.3 which is incompatible. - tensorflow 2.13.1 requires typing-extensions<4.6.0,>=3.6.6, but you have typing-extensions 4.8.0 which is incompatible. Note: you may need to restart the kernel to use updated packages. @@ -124,24 +122,25 @@ models will be stored. .. parsed-literal:: - 2023-10-31 00:22:02.092134: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. - 2023-10-31 00:22:02.126560: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. + 2023-11-15 00:29:06.329749: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. + 2023-11-15 00:29:06.363853: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. - 2023-10-31 00:22:02.723114: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT + 2023-11-15 00:29:06.956739: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT .. 
parsed-literal:: INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, tensorflow, onnx, openvino - WARNING:nncf:NNCF provides best results with tensorflow==2.12.*, while current tensorflow version is 2.13.1. If you encounter issues, consider switching to tensorflow==2.12.* Downloading data from https://storage.openvinotoolkit.org/repositories/nncf/openvino_notebook_ckpts/305_resnet18_imagenette_fp32_v1.h5 - 134604992/134604992 [==============================] - 36s 0us/step + 134604992/134604992 [==============================] - 38s 0us/step Absolute path where the model weights are saved: - /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-534/.workspace/scm/ov-notebook/notebooks/305-tensorflow-quantization-aware-training/model/ResNet-18_fp32.h5 + /opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-545/.workspace/scm/ov-notebook/notebooks/305-tensorflow-quantization-aware-training/model/ResNet-18_fp32.h5 + + +Dataset Preprocessing +--------------------- -Dataset Preprocessing ---------------------------------------------------------------- Download and prepare Imagenette 160px dataset. - Number of classes: 10 - Download size: 94.18 MiB @@ -163,9 +162,17 @@ Download size: 94.18 MiB .. parsed-literal:: - 2023-10-31 00:22:41.251776: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1960] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. - Skipping registering GPU devices... - 2023-10-31 00:22:41.423281: W tensorflow/core/kernels/data/cache_dataset_ops.cc:854] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead. 
+ 2023-11-15 00:29:49.433840: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:266] failed call to cuInit: CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE: forward compatibility was attempted on non supported HW + 2023-11-15 00:29:49.433872: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:168] retrieving CUDA diagnostic information for host: iotg-dev-workstation-07 + 2023-11-15 00:29:49.433876: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:175] hostname: iotg-dev-workstation-07 + 2023-11-15 00:29:49.434026: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:199] libcuda reported version is: 470.223.2 + 2023-11-15 00:29:49.434042: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:203] kernel reported version is: 470.182.3 + 2023-11-15 00:29:49.434046: E tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:312] kernel version 470.182.3 does not match DSO version 470.223.2 -- cannot find working devices in this configuration + 2023-11-15 00:29:49.527173: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_4' with dtype int64 and shape [1] + [[{{node Placeholder/_4}}]] + 2023-11-15 00:29:49.527491: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype string and shape [1] + [[{{node Placeholder/_0}}]] + 2023-11-15 00:29:49.604302: W tensorflow/core/kernels/data/cache_dataset_ops.cc:856] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead. @@ -190,8 +197,10 @@ Download size: 94.18 MiB .batch(BATCH_SIZE) .prefetch(tf.data.experimental.AUTOTUNE)) -Define a Floating-Point Model ------------------------------------------------------------------------ +Define a Floating-Point Model +----------------------------- + + .. code:: ipython3 @@ -265,8 +274,10 @@ Define a Floating-Point Model IMG_SHAPE = IMG_SIZE + (3,) fp32_model = ResNet18(input_shape=IMG_SHAPE) -Pre-train a Floating-Point Model --------------------------------------------------------------------------- +Pre-train a Floating-Point Model +-------------------------------- + + Using NNCF for model compression assumes that the user has a pre-trained model and a training pipeline. @@ -296,13 +307,23 @@ model and a training pipeline. .. 
parsed-literal:: - 4/4 [==============================] - 1s 161ms/sample - loss: 0.9807 - acc@1: 0.8220 + 2023-11-15 00:29:50.670388: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype string and shape [1] + [[{{node Placeholder/_1}}]] + 2023-11-15 00:29:50.670801: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype string and shape [1] + [[{{node Placeholder/_2}}]] + + +.. parsed-literal:: + + 4/4 [==============================] - 1s 235ms/sample - loss: 0.9807 - acc@1: 0.8220 Accuracy of FP32 model: 0.822 -Create and Initialize Quantization ----------------------------------------------------------------------------- +Create and Initialize Quantization +---------------------------------- + + NNCF enables compression-aware training by integrating into regular training pipelines. The framework is designed so that modifications to @@ -342,9 +363,13 @@ scenario and requires only 3 modifications. .. parsed-literal:: - 2023-10-31 00:22:45.577314: W tensorflow/core/kernels/data/cache_dataset_ops.cc:854] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead. - 2023-10-31 00:22:46.107962: W tensorflow/core/kernels/data/cache_dataset_ops.cc:854] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead. - 2023-10-31 00:22:52.452611: W tensorflow/core/kernels/data/cache_dataset_ops.cc:854] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead. + 2023-11-15 00:29:53.164614: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_1' with dtype string and shape [1] + [[{{node Placeholder/_1}}]] + 2023-11-15 00:29:53.164992: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_4' with dtype int64 and shape [1] + [[{{node Placeholder/_4}}]] + 2023-11-15 00:29:54.320146: W tensorflow/core/kernels/data/cache_dataset_ops.cc:856] The calling iterator did not fully read the dataset being cached. 
In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead. + 2023-11-15 00:29:54.969869: W tensorflow/core/kernels/data/cache_dataset_ops.cc:856] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead. + 2023-11-15 00:30:03.554536: W tensorflow/core/kernels/data/cache_dataset_ops.cc:856] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead. Evaluate the new model on the validation set after initialization of @@ -370,11 +395,13 @@ demonstrated here. .. parsed-literal:: - 4/4 [==============================] - 1s 254ms/sample - loss: 0.9773 - acc@1: 0.8060 + 4/4 [==============================] - 1s 308ms/sample - loss: 0.9766 - acc@1: 0.8120 + + +Fine-tune the Compressed Model +------------------------------ -Fine-tune the Compressed Model ------------------------------------------------------------------------- At this step, a regular fine-tuning process is applied to further improve quantized model accuracy. Normally, several epochs of tuning are @@ -400,20 +427,22 @@ training pipeline are required. Here is a simple example. .. parsed-literal:: - Accuracy of INT8 model after initialization: 0.806 + Accuracy of INT8 model after initialization: 0.812 Epoch 1/2 - 101/101 [==============================] - 41s 341ms/step - loss: 0.7136 - acc@1: 0.9297 + 101/101 [==============================] - 48s 408ms/step - loss: 0.7134 - acc@1: 0.9299 Epoch 2/2 - 101/101 [==============================] - 33s 327ms/step - loss: 0.6803 - acc@1: 0.9500 - 4/4 [==============================] - 0s 92ms/sample - loss: 0.9780 - acc@1: 0.8220 + 101/101 [==============================] - 42s 416ms/step - loss: 0.6807 - acc@1: 0.9489 + 4/4 [==============================] - 1s 146ms/sample - loss: 0.9760 - acc@1: 0.8160 - Accuracy of INT8 model after fine-tuning: 0.822 + Accuracy of INT8 model after fine-tuning: 0.816 - Accuracy drop of tuned INT8 model over pre-trained FP32 model: 0.000 + Accuracy drop of tuned INT8 model over pre-trained FP32 model: 0.006 + + +Export Models to OpenVINO Intermediate Representation (IR) +---------------------------------------------------------- -Export Models to OpenVINO Intermediate Representation (IR) ----------------------------------------------------------------------------------------------------- Use model conversion Python API to convert the models to OpenVINO IR. @@ -441,8 +470,10 @@ Executing this command may take a while. 
model_ir_int8 = ov.convert_model(int8_model) -Benchmark Model Performance by Computing Inference Time -------------------------------------------------------------------------------------------------- +Benchmark Model Performance by Computing Inference Time +------------------------------------------------------- + + Finally, measure the inference performance of the ``FP32`` and ``INT8`` models, using `Benchmark @@ -484,10 +515,10 @@ throughput (frames per second) values. .. parsed-literal:: Benchmark FP32 model (IR) - [ INFO ] Throughput: 2851.18 FPS + [ INFO ] Throughput: 2859.55 FPS Benchmark INT8 model (IR) - [ INFO ] Throughput: 11461.97 FPS + [ INFO ] Throughput: 11646.57 FPS Show CPU Information for reference. diff --git a/docs/notebooks/305-tensorflow-quantization-aware-training-with-output_files/index.html b/docs/notebooks/305-tensorflow-quantization-aware-training-with-output_files/index.html index 7cc6dbfbfb7b98..4359e988d975b1 100644 --- a/docs/notebooks/305-tensorflow-quantization-aware-training-with-output_files/index.html +++ b/docs/notebooks/305-tensorflow-quantization-aware-training-with-output_files/index.html @@ -1,7 +1,7 @@ -Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/305-tensorflow-quantization-aware-training-with-output_files/ +Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/305-tensorflow-quantization-aware-training-with-output_files/ -

Index of /projects/ov-notebook/0.1.0-latest/20231030220807/dist/rst_files/305-tensorflow-quantization-aware-training-with-output_files/


../
-305-tensorflow-quantization-aware-training-with..> 31-Oct-2023 00:35              519560
+

Index of /projects/ov-notebook/0.1.0-latest/20231114220808/dist/rst_files/305-tensorflow-quantization-aware-training-with-output_files/


../
+305-tensorflow-quantization-aware-training-with..> 15-Nov-2023 00:43              519560
 

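The throughput figures in the notebook above come from OpenVINO's ``benchmark_app``, which drives the model with asynchronous requests across multiple streams. Purely as an illustration of what that FP32-versus-INT8 comparison involves, a minimal synchronous timing loop might look like the sketch below; the IR paths are placeholders, a static input shape is assumed, and the absolute numbers will not match ``benchmark_app``.

.. code:: ipython3

    import time

    import numpy as np
    import openvino as ov

    core = ov.Core()

    # Placeholder paths: point these at the FP32 and INT8 IRs exported by the notebook.
    for name, xml_path in [("FP32", "model_fp32.xml"), ("INT8", "model_int8.xml")]:
        compiled = core.compile_model(xml_path, "CPU")
        request = compiled.create_infer_request()

        # Random input matching the model's (assumed static) input shape.
        shape = list(compiled.input(0).shape)
        dummy = np.random.rand(*shape).astype(np.float32)

        # Warm up, then time a fixed number of synchronous inferences.
        for _ in range(10):
            request.infer([dummy])
        runs = 200
        start = time.perf_counter()
        for _ in range(runs):
            request.infer([dummy])
        fps = runs / (time.perf_counter() - start)
        print(f"{name}: ~{fps:.1f} FPS (single synchronous stream)")

``benchmark_app`` additionally reports latency percentiles and lets you control the device, number of streams, and run duration, so it remains the right tool for the numbers quoted here.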
diff --git a/docs/notebooks/402-pose-estimation-with-output.rst b/docs/notebooks/402-pose-estimation-with-output.rst index 05c0c65e80db80..4f7a44ebc4b00c 100644 --- a/docs/notebooks/402-pose-estimation-with-output.rst +++ b/docs/notebooks/402-pose-estimation-with-output.rst @@ -16,7 +16,6 @@ Additionally, you can also upload a video file. **Table of contents:** ---- - `Imports <#imports>`__ - `The model <#the-model>`__ diff --git a/docs/notebooks/404-style-transfer-with-output.rst b/docs/notebooks/404-style-transfer-with-output.rst index 6c719a0f1b8070..192590c000649e 100644 --- a/docs/notebooks/404-style-transfer-with-output.rst +++ b/docs/notebooks/404-style-transfer-with-output.rst @@ -29,7 +29,6 @@ Additionally, you can also upload a video file. video file. **Table of contents:** ---- - `Preparation <#preparation>`__ - `Install requirements <#install-requirements>`__ diff --git a/docs/notebooks/406-3D-pose-estimation-with-output.rst b/docs/notebooks/406-3D-pose-estimation-with-output.rst index 1d99bdf996b7de..851fbbbc926006 100644 --- a/docs/notebooks/406-3D-pose-estimation-with-output.rst +++ b/docs/notebooks/406-3D-pose-estimation-with-output.rst @@ -8,11 +8,9 @@ from `Open Model Zoo `__. At the end of this notebook, you will see live inference results from your webcam (if available). Alternatively, you can also upload a video file to test -out the algorithms. **Make sure you have properly installed -the**\ `Jupyter -extension `__\ **and -been using JupyterLab to run the demo as suggested in the -``README.md``** +out the algorithms. **Make sure you have properly installed the** +`Jupyter extension `__ +**and been using JupyterLab to run the demo as suggested in the README.md** **NOTE**: *To use a webcam, you must run this Jupyter notebook on a computer with a webcam. If you run on a remote server, the webcam @@ -54,7 +52,7 @@ Windows: Chrome* *macOS: Safari* Prerequisites ------------------------------------------------------- -**The ``pythreejs`` extension may not display properly when using the +**The "pythreejs" extension may not display properly when using the latest Jupyter Notebook release (2.4.1). 
Therefore, it is recommended to use Jupyter Lab instead.** diff --git a/docs/notebooks/all_notebooks_paths.txt b/docs/notebooks/all_notebooks_paths.txt new file mode 100644 index 00000000000000..f5932dc140a303 --- /dev/null +++ b/docs/notebooks/all_notebooks_paths.txt @@ -0,0 +1,131 @@ +notebooks/001-hello-world/001-hello-world.ipynb +notebooks/002-openvino-api/002-openvino-api.ipynb +notebooks/003-hello-segmentation/003-hello-segmentation.ipynb +notebooks/004-hello-detection/004-hello-detection.ipynb +notebooks/101-tensorflow-classification-to-openvino/101-tensorflow-classification-to-openvino.ipynb +notebooks/102-pytorch-to-openvino/102-pytorch-onnx-to-openvino.ipynb +notebooks/102-pytorch-to-openvino/102-pytorch-to-openvino.ipynb +notebooks/103-paddle-to-openvino/103-paddle-to-openvino-classification.ipynb +notebooks/104-model-tools/104-model-tools.ipynb +notebooks/105-language-quantize-bert/105-language-quantize-bert.ipynb +notebooks/106-auto-device/106-auto-device.ipynb +notebooks/107-speech-recognition-quantization/107-speech-recognition-quantization-data2vec.ipynb +notebooks/107-speech-recognition-quantization/107-speech-recognition-quantization-wav2vec2.ipynb +notebooks/108-gpu-device/108-gpu-device.ipynb +notebooks/109-performance-tricks/109-latency-tricks.ipynb +notebooks/109-performance-tricks/109-throughput-tricks.ipynb +notebooks/110-ct-segmentation-quantize/110-ct-scan-live-inference.ipynb +notebooks/110-ct-segmentation-quantize/110-ct-segmentation-quantize-nncf.ipynb +notebooks/110-ct-segmentation-quantize/data-preparation-ct-scan.ipynb +notebooks/110-ct-segmentation-quantize/pytorch-monai-training.ipynb +notebooks/111-yolov5-quantization-migration/111-yolov5-quantization-migration.ipynb +notebooks/112-pytorch-post-training-quantization-nncf/112-pytorch-post-training-quantization-nncf.ipynb +notebooks/113-image-classification-quantization/113-image-classification-quantization.ipynb +notebooks/115-async-api/115-async-api.ipynb +notebooks/116-sparsity-optimization/116-sparsity-optimization.ipynb +notebooks/117-model-server/117-model-server.ipynb +notebooks/118-optimize-preprocessing/118-optimize-preprocessing.ipynb +notebooks/119-tflite-to-openvino/119-tflite-to-openvino.ipynb +notebooks/120-tensorflow-object-detection-to-openvino/120-tensorflow-instance-segmentation-to-openvino.ipynb +notebooks/120-tensorflow-object-detection-to-openvino/120-tensorflow-object-detection-to-openvino.ipynb +notebooks/121-convert-to-openvino/121-convert-to-openvino.ipynb +notebooks/122-quantizing-model-with-accuracy-control/122-speech-recognition-quantization-wav2vec2.ipynb +notebooks/122-quantizing-model-with-accuracy-control/122-yolov8-quantization-with-accuracy-control.ipynb +notebooks/123-detectron2-to-openvino/123-detectron2-to-openvino.ipynb +notebooks/124-hugging-face-hub/124-hugging-face-hub.ipynb +notebooks/125-torchvision-zoo-to-openvino/125-convnext-classification.ipynb +notebooks/125-torchvision-zoo-to-openvino/125-lraspp-segmentation.ipynb +notebooks/126-tensorflow-hub/126-tensorflow-hub.ipynb +notebooks/201-vision-monodepth/201-vision-monodepth.ipynb +notebooks/202-vision-superresolution/202-vision-superresolution-image.ipynb +notebooks/202-vision-superresolution/202-vision-superresolution-video.ipynb +notebooks/203-meter-reader/203-meter-reader.ipynb +notebooks/204-segmenter-semantic-segmentation/204-segmenter-semantic-segmentation.ipynb +notebooks/205-vision-background-removal/205-vision-background-removal.ipynb 
+notebooks/206-vision-paddlegan-anime/206-vision-paddlegan-anime.ipynb +notebooks/207-vision-paddlegan-superresolution/207-vision-paddlegan-superresolution.ipynb +notebooks/208-optical-character-recognition/208-optical-character-recognition.ipynb +notebooks/209-handwritten-ocr/209-handwritten-ocr.ipynb +notebooks/210-slowfast-video-recognition/210-slowfast-video-recognition.ipynb +notebooks/211-speech-to-text/211-speech-to-text.ipynb +notebooks/212-pyannote-speaker-diarization/212-pyannote-speaker-diarization.ipynb +notebooks/213-question-answering/213-question-answering.ipynb +notebooks/214-grammar-correction/214-grammar-correction.ipynb +notebooks/215-image-inpainting/215-image-inpainting.ipynb +notebooks/216-attention-center/216-attention-center.ipynb +notebooks/217-vision-deblur/217-vision-deblur.ipynb +notebooks/218-vehicle-detection-and-recognition/218-vehicle-detection-and-recognition.ipynb +notebooks/219-knowledge-graphs-conve/219-knowledge-graphs-conve.ipynb +notebooks/220-cross-lingual-books-alignment/220-cross-lingual-books-alignment.ipynb +notebooks/221-machine-translation/221-machine-translation.ipynb +notebooks/222-vision-image-colorization/222-vision-image-colorization.ipynb +notebooks/223-text-prediction/223-text-prediction.ipynb +notebooks/224-3D-segmentation-point-clouds/224-3D-segmentation-point-clouds.ipynb +notebooks/225-stable-diffusion-text-to-image/225-stable-diffusion-text-to-image.ipynb +notebooks/226-yolov7-optimization/226-yolov7-optimization.ipynb +notebooks/227-whisper-subtitles-generation/227-whisper-convert.ipynb +notebooks/227-whisper-subtitles-generation/227-whisper-nncf-quantize.ipynb +notebooks/228-clip-zero-shot-image-classification/228-clip-zero-shot-convert.ipynb +notebooks/228-clip-zero-shot-image-classification/228-clip-zero-shot-quantize.ipynb +notebooks/229-distilbert-sequence-classification/229-distilbert-sequence-classification.ipynb +notebooks/230-yolov8-optimization/230-yolov8-instance-segmentation.ipynb +notebooks/230-yolov8-optimization/230-yolov8-keypoint-detection.ipynb +notebooks/230-yolov8-optimization/230-yolov8-object-detection.ipynb +notebooks/231-instruct-pix2pix-image-editing/231-instruct-pix2pix-image-editing.ipynb +notebooks/232-clip-language-saliency-map/232-clip-language-saliency-map.ipynb +notebooks/233-blip-visual-language-processing/233-blip-convert.ipynb +notebooks/233-blip-visual-language-processing/233-blip-optimize.ipynb +notebooks/234-encodec-audio-compression/234-encodec-audio-compression.ipynb +notebooks/235-controlnet-stable-diffusion/235-controlnet-stable-diffusion.ipynb +notebooks/236-stable-diffusion-v2/236-stable-diffusion-v2-infinite-zoom.ipynb +notebooks/236-stable-diffusion-v2/236-stable-diffusion-v2-optimum-demo-comparison.ipynb +notebooks/236-stable-diffusion-v2/236-stable-diffusion-v2-optimum-demo.ipynb +notebooks/236-stable-diffusion-v2/236-stable-diffusion-v2-text-to-image-demo.ipynb +notebooks/236-stable-diffusion-v2/236-stable-diffusion-v2-text-to-image.ipynb +notebooks/237-segment-anything/237-segment-anything.ipynb +notebooks/238-deepfloyd-if/238-deep-floyd-if-convert.ipynb +notebooks/238-deepfloyd-if/238-deep-floyd-if-optimize.ipynb +notebooks/239-image-bind/239-image-bind-convert.ipynb +notebooks/239-image-bind/239-image-bind-quantize.ipynb +notebooks/240-dolly-2-instruction-following/240-dolly-2-instruction-following.ipynb +notebooks/241-riffusion-text-to-music/241-riffusion-text-to-music.ipynb +notebooks/242-freevc-voice-conversion/242-freevc-voice-conversion.ipynb 
+notebooks/243-tflite-selfie-segmentation/243-tflite-selfie-segmentation.ipynb +notebooks/244-named-entity-recognition/244-named-entity-recognition.ipynb +notebooks/245-typo-detector/245-typo-detector.ipynb +notebooks/246-depth-estimation-videpth/246-depth-estimation-videpth.ipynb +notebooks/247-code-language-id/247-code-language-id.ipynb +notebooks/248-stable-diffusion-xl/248-stable-diffusion-xl.ipynb +notebooks/249-oneformer-segmentation/249-oneformer-segmentation.ipynb +notebooks/250-music-generation/250-music-generation.ipynb +notebooks/251-tiny-sd-image-generation/251-tiny-sd-image-generation.ipynb +notebooks/252-fastcomposer-image-generation/252-fastcomposer-image-generation.ipynb +notebooks/253-zeroscope-text2video/253-zeroscope-text2video.ipynb +notebooks/254-llm-chatbot/254-llm-chatbot.ipynb +notebooks/255-mms-massively-multilingual-speech/255-mms-massively-multilingual-speech.ipynb +notebooks/256-bark-text-to-audio/256-bark-text-to-audio.ipynb +notebooks/257-llava-multimodal-chatbot/257-llava-multimodal-chatbot.ipynb +notebooks/258-blip-diffusion-subject-generation/258-blip-diffusion-subject-generation.ipynb +notebooks/259-decidiffusion-image-generation/259-decidiffusion-image-generation.ipynb +notebooks/260-pix2struct-docvqa/260-pix2struct-docvqa.ipynb +notebooks/261-fast-segment-anything/261-fast-segment-anything.ipynb +notebooks/262-softvc-voice-conversion/262-softvc-voice-conversion.ipynb +notebooks/263-latent-consistency-models-image-generation/263-latent-consistency-models-image-generation.ipynb +notebooks/264-qrcode-monster/264-qrcode-monster.ipynb +notebooks/265-wuerstchen-image-generation/265-wuerstchen-image-generation.ipynb +notebooks/266-speculative-sampling/266-speculative-sampling.ipynb +notebooks/267-distil-whisper-asr/267-distil-whisper-asr.ipynb +notebooks/268-table-question-answering/268-table-question-answering.ipynb +notebooks/269-film-slowmo/269-film-slowmo.ipynb +notebooks/301-tensorflow-training-openvino/301-tensorflow-training-openvino.ipynb +notebooks/301-tensorflow-training-openvino/301-tensorflow-training-openvino-nncf.ipynb +notebooks/302-pytorch-quantization-aware-training/302-pytorch-quantization-aware-training.ipynb +notebooks/305-tensorflow-quantization-aware-training/305-tensorflow-quantization-aware-training.ipynb +notebooks/401-object-detection-webcam/401-object-detection.ipynb +notebooks/402-pose-estimation-webcam/402-pose-estimation.ipynb +notebooks/403-action-recognition-webcam/403-action-recognition-webcam.ipynb +notebooks/404-style-transfer-webcam/404-style-transfer.ipynb +notebooks/405-paddle-ocr-webcam/405-paddle-ocr-webcam.ipynb +notebooks/406-3D-pose-estimation-webcam/406-3D-pose-estimation.ipynb +notebooks/407-person-tracking-webcam/407-person-tracking.ipynb +notebooks/utils/notebook_utils.ipynb diff --git a/docs/notebooks/notebooks_tags.json b/docs/notebooks/notebooks_tags.json index 1a102761264322..3979ba34021132 100644 --- a/docs/notebooks/notebooks_tags.json +++ b/docs/notebooks/notebooks_tags.json @@ -420,6 +420,7 @@ "Transformers" ], "261-fast-segment-anything": [ + "NNCF", "ONNX", "Pytorch" ], @@ -427,9 +428,34 @@ "Pytorch" ], "263-latent-consistency-models-image-generation": [ + "Benchmark Model", + "NNCF", + "Pytorch", + "Transformers" + ], + "264-qrcode-monster": [ + "Pytorch", + "Transformers" + ], + "265-wuerstchen-image-generation": [ + "Pytorch" + ], + "266-speculative-sampling": [ + "Pytorch", + "Transformers" + ], + "267-distil-whisper-asr": [ + "Async Inference", + "NNCF", + "Transformers" + ], + 
"268-table-question-answering": [ "Pytorch", "Transformers" ], + "269-film-slowmo": [ + "Tensorflow" + ], "301-tensorflow-training-openvino-nncf": [ "Benchmark Model", "NNCF", diff --git a/docs/notebooks/notebooks_with_colab_buttons.txt b/docs/notebooks/notebooks_with_colab_buttons.txt index 5f9967df05dd15..bb87d012079e6e 100644 --- a/docs/notebooks/notebooks_with_colab_buttons.txt +++ b/docs/notebooks/notebooks_with_colab_buttons.txt @@ -51,6 +51,7 @@ 251-tiny-sd-image-generation 260-pix2struct-docvqa 261-fast-segment-anything +268-table-question-answering 305-tensorflow-quantization-aware-training 401-object-detection 404-style-transfer diff --git a/docs/openvino_sphinx_theme/openvino_sphinx_theme/__init__.py b/docs/openvino_sphinx_theme/openvino_sphinx_theme/__init__.py index 59da0bba5241b8..eed2e43c047ab6 100644 --- a/docs/openvino_sphinx_theme/openvino_sphinx_theme/__init__.py +++ b/docs/openvino_sphinx_theme/openvino_sphinx_theme/__init__.py @@ -9,7 +9,7 @@ from bs4 import BeautifulSoup from sphinx.util import logging from pydata_sphinx_theme import index_toctree -from .directives.code import DoxygenSnippet, Scrollbox, Nodescrollbox, visit_scrollbox, depart_scrollbox +from .directives.code import DoxygenSnippet, Scrollbox, Nodescrollbox, visit_scrollbox, depart_scrollbox, Showcase, Nodeshowcase, visit_showcase, depart_showcase SPHINX_LOGGER = logging.getLogger(__name__) @@ -220,9 +220,15 @@ def setup(app): app.add_html_theme('openvino_sphinx_theme', theme_path) rst.directives.register_directive('doxygensnippet', DoxygenSnippet) rst.directives.register_directive('scrollbox', Scrollbox) + rst.directives.register_directive('showcase', Showcase) app.add_node( Nodescrollbox, html=(visit_scrollbox, depart_scrollbox), latex=(visit_scrollbox, depart_scrollbox) ) + app.add_node( + Nodeshowcase, + html=(visit_showcase, depart_showcase), + latex=(visit_showcase, depart_showcase) + ) return {'parallel_read_safe': True, 'parallel_write_safe': True} diff --git a/docs/openvino_sphinx_theme/openvino_sphinx_theme/directives/code.py b/docs/openvino_sphinx_theme/openvino_sphinx_theme/directives/code.py index 5ef2c5ff6675d5..8fe35b75f2d587 100644 --- a/docs/openvino_sphinx_theme/openvino_sphinx_theme/directives/code.py +++ b/docs/openvino_sphinx_theme/openvino_sphinx_theme/directives/code.py @@ -1,5 +1,6 @@ import os.path - +from pathlib import Path +import sys from sphinx.directives.code import LiteralInclude, LiteralIncludeReader, container_wrapper from sphinx.util import logging from docutils.parsers.rst import Directive, directives @@ -7,6 +8,9 @@ from docutils.nodes import Node from docutils import nodes from sphinx.util import parselinenos +import requests +import re +import json logger = logging.getLogger(__name__) @@ -132,3 +136,117 @@ def run(self): if self.content: self.state.nested_parse(self.content, self.content_offset, node) return [node] + +def visit_showcase(self, node): + attrs = {} + notebook_file = ("notebooks/" + node["title"] + "-with-output.html") if 'title' in node is not None else "" + link_title = (node["title"]) if 'title' in node is not None else "OpenVINO Interactive Tutorial" + + if "height" or "width" in node: + attrs["style"] = ( + (("height:" + "".join(c for c in str(node["height"]) if c.isdigit()) + "px!important; " ) if "height" in node is not None else "") + + (("width:" + "".join(c for c in str(node["width"]) if c.isdigit()) ) if "width" in node is not None else "") + + (("px; " if node["width"].find("px") != -1 else "%;") if "width" in node is not None else "") + ) + 
self.body.append("
") + self.body.append(self.starttag(node, "div", **attrs)) + self.body.append("
") + self.body.append("") + self.body.append(""+os.path.basename(node["img"]) + "
") if "img" in node is not None else "" + + self.body.append("
") + + +def depart_showcase(self, node): + notebooks_repo = "https://github.com/openvinotoolkit/openvino_notebooks/blob/main/" + notebooks_binder = "https://mybinder.org/v2/gh/openvinotoolkit/openvino_notebooks/HEAD?filepath=" + notebooks_colab = "https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/main/" + git_badge = "Github" + binder_badge = "Binder" + colab_badge = "Colab" + binder_list_file = Path('../../../docs/notebooks/notebooks_with_binder_buttons.txt').resolve(strict=True) + colab_list_file = Path('../../../docs/notebooks/notebooks_with_colab_buttons.txt').resolve(strict=True) + openvino_notebooks_repo_listing = Path('../../../docs/notebooks/all_notebooks_paths.txt').resolve(strict=True) + with open(binder_list_file, 'r+', encoding='cp437') as file: + binder_buttons_list = file.read().splitlines() + with open(colab_list_file, 'r+', encoding='cp437') as file: + colab_buttons_list = file.read().splitlines() + if not os.path.exists(openvino_notebooks_repo_listing): + raise FileNotFoundError("all_notebooks_paths.txt is not found") + else: + paths_list = open(openvino_notebooks_repo_listing, 'r').readlines() + ipynb_list = [x for x in paths_list if re.match("notebooks/[0-9]{3}.*\.ipynb$", x)] + notebook_with_ext = node["title"] + ".ipynb" + matched_notebook = [match for match in ipynb_list if notebook_with_ext in match] + + + notebook_file = ("notebooks/" + node["title"] + "-with-output.html") if 'title' in node is not None else "" + link_title = (node["title"]) if 'title' in node is not None else "OpenVINO Interactive Tutorial" + + self.body.append("") + self.body.append("

" + node["title"] + "

") if 'title' in node is not None else "" + + if matched_notebook is not None: + for n in matched_notebook: + self.body.append(("" + git_badge + "") if 'title' in node is not None else "") + if node["title"] in binder_buttons_list: + self.body.append(("" + binder_badge + "" + ) if 'title' in node is not None else "") + if node["title"] in colab_buttons_list: + self.body.append(("" + colab_badge + "" + ) if 'title' in node is not None else "") + + self.body.append("
\n") + + +class Nodeshowcase(nodes.container): + def create_showcase_component( + rawtext: str = "", + **attributes, + ) -> nodes.container: + node = nodes.container(rawtext, is_div=True, **attributes) + return node + + +class Showcase(Directive): + has_content = True + required_arguments = 0 + optional_arguments = 1 + final_argument_whitespace = True + option_spec = { + 'class': directives.class_option, + 'name': directives.unchanged, + 'width': directives.length_or_percentage_or_unitless, + 'height': directives.length_or_percentage_or_unitless, + 'style': directives.unchanged, + 'img': directives.unchanged, + 'img-class': directives.unchanged, + 'title': directives.unchanged, + 'git': directives.unchanged, + } + + has_content = True + + def run(self): + + classes = ['showcase'] + node = Nodeshowcase("div", rawtext="\n".join(self.content), classes=classes) + if 'height' in self.options: + node['height'] = self.options['height'] + if 'width' in self.options: + node['width'] = self.options['width'] + if 'img' in self.options: + node['img'] = self.options['img'] + if 'img-class' in self.options: + node['img-class'] = self.options['img-class'] + if 'title' in self.options: + node['title'] = self.options['title'] + if 'git' in self.options: + node['git'] = self.options['git'] + node['classes'] += self.options.get('class', []) + self.add_name(node) + if self.content: + self.state.nested_parse(self.content, self.content_offset, node) + return [node] diff --git a/docs/openvino_sphinx_theme/openvino_sphinx_theme/static/css/openvino_sphinx_theme.css b/docs/openvino_sphinx_theme/openvino_sphinx_theme/static/css/openvino_sphinx_theme.css index a9678d9a73249a..89311f190436d7 100644 --- a/docs/openvino_sphinx_theme/openvino_sphinx_theme/static/css/openvino_sphinx_theme.css +++ b/docs/openvino_sphinx_theme/openvino_sphinx_theme/static/css/openvino_sphinx_theme.css @@ -55,6 +55,138 @@ body { border-color: rgb(var(--ost-color-primary)); } +/* Showcase Extension */ + +:root { + /* Showcase - Colors */ + --ov-color: hsl(261, 87.2%, 54.1%); + --black: hsl(0, 0%, 0%); + --white: hsl(0, 0%, 100%); + --ov-light-blue: rgba(229, 242, 255, 0.95); + --ov-blue: rgba(0, 104, 181, 0.95); + --ov-dark-blue: rgba(0, 74, 134, 0.95); + } + + .showcase-wrap { + width: 100%; + display: inline-block; + margin: 10px 10px 30px 10px; + + } + + .showcase { + width: 100%; + min-height: 150px; + vertical-align: top; + } + + .showcase-content { + min-height: inherit; + width: -webkit-calc(100% - 160px); + width: -moz-calc(100% - 160px); + width: calc(100% - 160px); + padding: 0.5rem 1.2rem 0.5rem; + display: inline-block; + position: relative; + + } + .showcase-content::after { + content: ""; + position: absolute; + height: 0.15rem; + width: 100%; + bottom: 0px; + left: 0px; + background-color: var(--ov-light-blue); + } + + .showcase-img { + object-fit: cover; + width: 150px; + height: 150px; + position: absolute; + border-radius: 0.25rem; + box-shadow: 0.25rem 0.25rem 0.5rem rgba(0,0,0,0.25); + border: 1px solid rgba(0, 0, 0, 0.25); + } + + .showcase-img-placeholder { + min-height: 150px; + width: 150px; + text-align: left; + vertical-align: top; + display: inline-block; + position: relative; + margin-right: 10px; + } + + .showcase-title { + font-size: 0.90rem; + line-height: 1; + margin: 0px 0px 10px 0px; + color: var(--black); + position: relative; + width: fit-content; + width: -moz-fit-content; + } + + .showcase-title::after { + content: ""; + position: absolute; + height: 0.15rem; + width: 100%; + bottom: -5px; + left: 0px; + 
background-color: var(--ov-light-blue); + } + + .showcase-content-container p { + font-size: 1.2rem; + line-height: 1.2; + color: var(--black); + position: relative; + } + + .showcase-content-container a > h2 { + display: block!important; + width: fit-content; + width: -moz-fit-content; + } + + .showcase-badge { + margin-right: 3px; + } + + .showcase-button { + width: 100px; + float: right; + border: none; + font-size: 1rem; + color: #00A3F6; + background-color: #FFF; + position: absolute; + right: 0px; + bottom: 0px; + } + + .showcase-button:focus { + outline: none!important; + } + + .showcase-button:hover { + text-decoration: underline; + } + + .showcase-badge:hover, .showcase-img:hover { + transform: scale(1.05); + } + + .showcase-img:hover { + box-shadow: 0 0 10px 2px rgba(108,36,240,0.3) !important; + } + + + /* Scrollbox Extension */ .scrollbox {