diff --git a/examples/triton_gpt2/AzureSetup.ipynb b/examples/triton_gpt2/AzureSetup.ipynb new file mode 100644 index 0000000000..1f458b015f --- /dev/null +++ b/examples/triton_gpt2/AzureSetup.ipynb @@ -0,0 +1,695 @@ +{ + "metadata": { + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + }, + "orig_nbformat": 2, + "kernelspec": { + "name": "python3", + "display_name": "Python 3.8.5 64-bit" + }, + "interpreter": { + "hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1" + } + }, + "nbformat": 4, + "nbformat_minor": 2, + "cells": [ + { + "source": [ + "# Setup Azure Kubernetes Infrastructure\n", + "In this notebook we will:\n", + "- Log in to an Azure account\n", + "- [Create an AKS cluster](#aks) with\n", + " - a **GPU enabled Spot VM nodepool** for running ML elastic training\n", + " - a **CPU VM nodepool** for running typical workloads \n", + "- [Create an Azure Storage Account for hosting model data](#storageaccount)\n", + "- Deploy Kubernetes components\n", + " - [Install the **Azure Blob CSI Driver**](#csidriver) to map Blob storage into containers as persistent volumes\n", + " - [Create a Kubernetes **PersistentVolume** and PersistentVolumeClaim](#pv)" + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "source": [ + "## Define Variables\n", + "Set the variables required for the project." + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "subscription_id = \"\" # fill in\n", + "resource_group = \"seldon\" # feel free to replace or use this default\n", + "region = \"eastus2\" # feel free to replace or use this default\n", + "\n", + "storage_account_name = \"modeltestsgpt\" # fill in\n", + "storage_container_name = \"gpt2tf\" \n", + "\n", + "aks_name = \"modeltests\" # 
feel free to replace or use this default\n", + "aks_gpupool = \"gpunodes\" # feel free to replace or use this default\n", + "aks_cpupool = \"cpunodes\" # feel free to replace or use this default\n", + "aks_gpu_sku = \"Standard_NC6s_v3\" # feel free to replace or use this default \n", + "aks_cpu_sku = \"Standard_F8s_v2\"" + ] + }, + { + "source": [ + "## Azure account login\n", + "If you are not already logged in to an Azure account, the command below will initiate a login. This will pop up a browser where you can select your login. (If no web browser is available, or if it fails to open, use the device code flow with `az login --use-device-code`, or log in from a WSL command prompt, and then proceed with the notebook.)" + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "az login -o table\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!az account set --subscription \"$subscription_id\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!az account show" + ] + }, + { + "source": [ + "## Create Resource Group\n", + "Azure encourages the use of groups to organize all the Azure components you deploy. That way it is easier to find them, and we can also delete a number of resources simply by deleting the group." + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!az group create -l {region} -n {resource_group}" + ] + }, + { + "source": [ + "## Create AKS Cluster and NodePools \n", + "Below, we create the AKS cluster with a single default system node (to save time; in production use more nodes as per best practices) in the resource group we created earlier. 
This step can take 5 or more minutes.\n" + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "!az aks create --resource-group {resource_group} \\\n", + " --name {aks_name} \\\n", + " --node-vm-size Standard_D8s_v3 \\\n", + " --node-count 1 \\\n", + " --location {region} \\\n", + " --kubernetes-version 1.18.17 \\\n", + " --node-osdisk-type Ephemeral \\\n", + " --generate-ssh-keys" + ] + }, + { + "source": [ + "## Connect to AKS Cluster\n", + "To configure kubectl to connect to the Kubernetes cluster, run the following command" + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!az aks get-credentials --resource-group {resource_group} --name {aks_name}" + ] + }, + { + "source": [ + "Let's verify the connection by listing the nodes." + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "NAME STATUS ROLES AGE VERSION\naks-agentpool-28613018-vmss000000 Ready agent 28d v1.19.9\naks-agentpool-28613018-vmss000001 Ready agent 28d v1.19.9\naks-agentpool-28613018-vmss000002 Ready agent 28d v1.19.9\naks-cpunodes-28613018-vmss000000 Ready agent 28d v1.19.9\naks-cpunodes-28613018-vmss000001 Ready agent 28d v1.19.9\naks-gpunodes-28613018-vmss000001 Ready agent 5h27m v1.19.9\n" + ] + } + ], + "source": [ + "!kubectl get nodes" + ] + }, + { + "source": [ + "Taint the system node with the `CriticalAddonsOnly` taint so it is available only for system workloads" + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!kubectl taint nodes -l kubernetes.azure.com/mode=system CriticalAddonsOnly=true:NoSchedule --overwrite\n" + 
] + }, + { + "source": [ + "## Create GPU enabled and CPU Node Pools\n", + "To create a GPU enabled nodepool, we will use the fully configured AKS image that contains the NVIDIA device plugin for Kubernetes; see [Use the AKS specialized GPU image (preview)](https://docs.microsoft.com/en-us/azure/aks/gpu-cluster#use-the-aks-specialized-gpu-image-preview). Creating nodepools can take five or more minutes." + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "!az feature register --name GPUDedicatedVHDPreview --namespace Microsoft.ContainerService\n", + "!az feature list -o table --query \"[?contains(name, 'Microsoft.ContainerService/GPUDedicatedVHDPreview')].{Name:name,State:properties.state}\"\n", + "!az provider register --namespace Microsoft.ContainerService\n", + "!az extension add --name aks-preview\n" + ] + }, + { + "source": [ + "## Create GPU NodePool with GPU taint\n", + "For more information on Azure nodepools, see https://docs.microsoft.com/en-us/azure/aks/use-multiple-node-pools " + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "{\n", + "\u001b[33mThe behavior of this command has been altered by the following extension: aks-preview\u001b[0m\n", + "\u001b[91mNode pool gpunodes already exists, please try a different name, use 'aks nodepool list' to get current list of node pool\u001b[0m\n", + "\u001b[0mCPU times: user 275 ms, sys: 79 ms, total: 354 ms\n", + "Wall time: 5.38 s\n" + ] + } + ], + "source": [ + "%%time\n", + "print(aks_gpu_sku)\n", + "!az aks nodepool add \\\n", + " --resource-group {resource_group} \\\n", + " --cluster-name {aks_name} \\\n", + " --name {aks_gpupool} \\\n", + " --node-taints nvidia.com=gpu:NoSchedule \\\n", + " --node-count 1 \\\n", + " --node-vm-size {aks_gpu_sku} 
\\\n", + " --aks-custom-headers UseGPUDedicatedVHD=true,usegen2vm=true" + ] + }, + { + "source": [ + "## Verify GPU is available on Kubernetes Node\n", + "Now use the kubectl describe node command to confirm that the GPUs are schedulable. Under the Capacity section, for the Standard_NC12 SKU the GPUs should be listed as `nvidia.com/gpu: 2`" + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Name: aks-gpunodes-28613018-vmss000001\nRoles: agent\nLabels: accelerator=nvidia\n agentpool=gpunodes\n beta.kubernetes.io/arch=amd64\n beta.kubernetes.io/instance-type=Standard_NC12\n beta.kubernetes.io/os=linux\n failure-domain.beta.kubernetes.io/region=eastus2\n--\n cpu: 12\n ephemeral-storage: 129900528Ki\n hugepages-1Gi: 0\n hugepages-2Mi: 0\n memory: 115387540Ki\n nvidia.com/gpu: 2\n pods: 30\nAllocatable:\n attachable-volumes-azure-disk: 48\n cpu: 11780m\n ephemeral-storage: 119716326407\n hugepages-1Gi: 0\n hugepages-2Mi: 0\n memory: 105854100Ki\n nvidia.com/gpu: 2\n pods: 30\nSystem Info:\n Machine ID: db67bd967e1441febad873ba49d35adc\n System UUID: f39ce4bc-11c6-8643-8a8a-dfb4998a0524\n Boot ID: eb926e42-d4e7-4760-b124-9b09c0e56c57\n--\n memory 275Mi (0%) 850Mi (0%)\n ephemeral-storage 0 (0%) 0 (0%)\n hugepages-1Gi 0 (0%) 0 (0%)\n hugepages-2Mi 0 (0%) 0 (0%)\n attachable-volumes-azure-disk 0 0\n nvidia.com/gpu 0 0\nEvents: \n" + ] + } + ], + "source": [ + "!kubectl describe node -l accelerator=nvidia | grep nvidia -A 5 -B 5" + ] + }, + { + "source": [ + "## Create CPU NodePool for running regular workloads" + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\u001b[33mThe behavior of this command has been altered by the following extension: aks-preview\u001b[0m\n", + "{\n", + " 
\"agentPoolType\": \"VirtualMachineScaleSets\",\n", + " \"availabilityZones\": null,\n", + " \"count\": 3,\n", + " \"enableAutoScaling\": true,\n", + " \"enableEncryptionAtHost\": false,\n", + " \"enableFips\": false,\n", + " \"enableNodePublicIp\": false,\n", + " \"gpuInstanceProfile\": null,\n", + " \"id\": \"/subscriptions/xxxx-xxxx-xxxx-xxxx-xxxxxx/resourcegroups/seldon/providers/Microsoft.ContainerService/managedClusters/modeltests/agentPools/cpunodes\",\n", + " \"kubeletConfig\": null,\n", + " \"kubeletDiskType\": \"OS\",\n", + " \"linuxOsConfig\": null,\n", + " \"maxCount\": 3,\n", + " \"maxPods\": 30,\n", + " \"minCount\": 1,\n", + " \"mode\": \"User\",\n", + " \"name\": \"cpunodes\",\n", + " \"nodeImageVersion\": \"AKSUbuntu-1804gen2containerd-2021.05.08\",\n", + " \"nodeLabels\": null,\n", + " \"nodePublicIpPrefixId\": null,\n", + " \"nodeTaints\": null,\n", + " \"orchestratorVersion\": \"1.19.9\",\n", + " \"osDiskSizeGb\": 128,\n", + " \"osDiskType\": \"Ephemeral\",\n", + " \"osSku\": \"Ubuntu\",\n", + " \"osType\": \"Linux\",\n", + " \"podSubnetId\": null,\n", + " \"powerState\": {\n", + " \"code\": \"Running\"\n", + " },\n", + " \"provisioningState\": \"Succeeded\",\n", + " \"proximityPlacementGroupId\": null,\n", + " \"resourceGroup\": \"seldon\",\n", + " \"scaleSetEvictionPolicy\": null,\n", + " \"scaleSetPriority\": null,\n", + " \"spotMaxPrice\": null,\n", + " \"tags\": null,\n", + " \"type\": \"Microsoft.ContainerService/managedClusters/agentPools\",\n", + " \"upgradeSettings\": {\n", + " \"maxSurge\": null\n", + " },\n", + " \"vmSize\": \"Standard_F8s_v2\",\n", + " \"vnetSubnetId\": \"/subscriptions/xxxxx-xxxx-xxxx-xxxxx-xxxxxx/resourceGroups/seldon/providers/Microsoft.Network/virtualNetworks/seldon-vnet/subnets/default\"\n", + "}\n", + "\u001b[K\u001b[0mCPU times: user 4.17 s, sys: 1.51 s, total: 5.68 s\n", + "Wall time: 2min 36s\n" + ] + } + ], + "source": [ + "%%time \n", + "!az aks nodepool add \\\n", + " --resource-group {resource_group} 
\\\n", + " --cluster-name {aks_name} \\\n", + " --name {aks_cpupool} \\\n", + " --enable-cluster-autoscaler \\\n", + " --node-osdisk-type Ephemeral \\\n", + " --min-count 1 \\\n", + " --max-count 3 \\\n", + " --node-vm-size {aks_cpu_sku} \\\n", + " --node-osdisk-size 128 " + ] + }, + { + "source": [ + "## Verify Taints on the Kubernetes nodes\n", + "Verify that the system pool and the GPU pool have the taints `CriticalAddonsOnly` and `sku=gpu` respectively. \n" + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"effect\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NoSchedule\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"key\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"CriticalAddonsOnly\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"value\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"true\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n\u001b[1;39m]\u001b[0m\n\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"effect\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NoSchedule\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"key\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"CriticalAddonsOnly\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"value\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"true\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n\u001b[1;39m]\u001b[0m\n\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"effect\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NoSchedule\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"key\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"CriticalAddonsOnly\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"value\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"true\"\u001b[0m\u001b[1;39m\n 
\u001b[1;39m}\u001b[0m\u001b[1;39m\n\u001b[1;39m]\u001b[0m\n\u001b[1;30mnull\u001b[0m\n\u001b[1;30mnull\u001b[0m\n\u001b[1;39m[\n \u001b[1;39m{\n \u001b[0m\u001b[34;1m\"effect\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"NoSchedule\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"key\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"sku\"\u001b[0m\u001b[1;39m,\n \u001b[0m\u001b[34;1m\"value\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"gpu\"\u001b[0m\u001b[1;39m\n \u001b[1;39m}\u001b[0m\u001b[1;39m\n\u001b[1;39m]\u001b[0m\n" + ] + } + ], + "source": [ + "!kubectl get nodes -o json | jq '.items[].spec.taints'" + ] + }, + { + "source": [ + "# Create Storage Account for training data \n", + "In this section of the notebook, we'll create an Azure Blob Storage account that we'll use throughout the tutorial. This object store will be used to store model data and save checkpoints. Use `az cli` to create the account." + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\"Succeeded\"\n", + "\u001b[K\u001b[0mCPU times: user 674 ms, sys: 214 ms, total: 888 ms\n", + "Wall time: 22 s\n" + ] + } + ], + "source": [ + "%%time\n", + "!az storage account create -n {storage_account_name} -g {resource_group} --query 'provisioningState'\n" + ] + }, + { + "source": [ + "Grab the keys of the storage account that was just created. We will need them for binding the Kubernetes PersistentVolume. The `--query '[0].value'` part of the command simply selects the value of the first (zero-indexed) key in the set of keys." 
+ ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "key = !az storage account keys list --account-name {storage_account_name} -g {resource_group} --query '[0].value' -o tsv" + ] + }, + { + "source": [ + "\n", + "The stdout from the command above is stored in a one-element string array, so select the first element." + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "storage_account_key = key[0] " + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "{\n", + " \"created\": true\n", + "}\n", + "\u001b[0m" + ] + } + ], + "source": [ + "# create storage container\n", + "\n", + "!az storage container create \\\n", + " --account-name {storage_account_name} \\\n", + " --account-key {storage_account_key} \\\n", + " --name {storage_container_name}" + ] + }, + { + "source": [ + "# Install Kubernetes Blob CSI Driver \n", + "[Azure Blob Storage CSI driver for Kubernetes](https://github.com/kubernetes-sigs/blob-csi-driver) allows Kubernetes to access Azure Storage. 
We will deploy it using the Helm 3 package manager, as described in the docs: https://github.com/kubernetes-sigs/blob-csi-driver/tree/master/charts" + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!az aks get-credentials --resource-group {resource_group} --name {aks_name}" + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\"blob-csi-driver\" already exists with the same configuration, skipping\n", + "W0527 23:11:20.183604 13719 warnings.go:70] storage.k8s.io/v1beta1 CSIDriver is deprecated in v1.19+, unavailable in v1.22+; use storage.k8s.io/v1 CSIDriver\n", + "W0527 23:11:20.506450 13719 warnings.go:70] storage.k8s.io/v1beta1 CSIDriver is deprecated in v1.19+, unavailable in v1.22+; use storage.k8s.io/v1 CSIDriver\n", + "NAME: blob-csi-driver\n", + "LAST DEPLOYED: Thu May 27 23:11:19 2021\n", + "NAMESPACE: kube-system\n", + "STATUS: deployed\n", + "REVISION: 1\n", + "TEST SUITE: None\n", + "NOTES:\n", + "The Azure Blob Storage CSI driver is getting deployed to your cluster.\n", + "\n", + "To check Azure Blob Storage CSI driver pods status, please run:\n", + "\n", + " kubectl --namespace=kube-system get pods --selector=\"release=blob-csi-driver\" --watch\n" + ] + } + ], + "source": [ + "!helm repo add blob-csi-driver https://raw.githubusercontent.com/kubernetes-sigs/blob-csi-driver/master/charts\n", + "!helm install blob-csi-driver blob-csi-driver/blob-csi-driver --namespace kube-system --version v1.1.0\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "NAME READY STATUS RESTARTS AGE\ncsi-blob-controller-7b9db4967c-fbsm2 4/4 Running 0 22s\ncsi-blob-controller-7b9db4967c-hdglw 4/4 Running 0 22s\ncsi-blob-node-7tgl8 3/3 Running 0 
22s\ncsi-blob-node-89rkn 3/3 Running 0 22s\ncsi-blob-node-nnhfh 3/3 Running 0 22s\ncsi-blob-node-pb584 3/3 Running 0 22s\ncsi-blob-node-q6z6t 3/3 Running 0 22s\ncsi-blob-node-tq4mh 3/3 Running 0 22s\n" + ] + } + ], + "source": [ + "!kubectl -n kube-system get pods -l \"app.kubernetes.io/instance=blob-csi-driver\"" + ] + }, + { + "source": [ + "## Create Persistent Volume for Azure Blob \n", + "For more details on creating a `PersistentVolume` using the CSI driver, refer to https://github.com/kubernetes-sigs/blob-csi-driver/blob/master/deploy/example/e2e_usage.md" + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "secret/azure-blobsecret created\n" + ] + } + ], + "source": [ + "\n", + "# Create secret to access storage account\n", + "!kubectl create secret generic azure-blobsecret --from-literal azurestorageaccountname={storage_account_name} --from-literal azurestorageaccountkey=\"{storage_account_key}\" --type=Opaque " + ] + }, + { + "source": [ + "The PersistentVolume YAML definition is in `azure-blobfuse-pv.yaml`, with fields pointing to the secret created above and the container name we created in the storage account:\n", + "```yaml\n", + " csi:\n", + " driver: blob.csi.azure.com\n", + " readOnly: false\n", + " volumeHandle: trainingdata # make sure this volumeid is unique in the cluster\n", + " volumeAttributes:\n", + " containerName: workerdata # !! 
Modify if changed in Notebook\n", + " nodeStageSecretRef:\n", + " name: azure-blobsecret\n", + " \n", + "```" + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Overwriting azure-blobfuse-pv.yaml\n" + ] + } + ], + "source": [ + "%%writefile azure-blobfuse-pv.yaml\n", + "apiVersion: v1\n", + "kind: PersistentVolume\n", + "metadata:\n", + " name: pv-gptblob\n", + " \n", + "spec:\n", + " capacity:\n", + " storage: 10Gi\n", + " accessModes:\n", + " - ReadWriteMany\n", + " persistentVolumeReclaimPolicy: Retain # \"Delete\" is not supported in static provisioning\n", + " csi:\n", + " driver: blob.csi.azure.com\n", + " readOnly: false\n", + " volumeHandle: trainingdata # make sure this volumeid is unique in the cluster\n", + " volumeAttributes:\n", + " containerName: gpt2onnx # Modify if changed in Notebook\n", + " nodeStageSecretRef:\n", + " name: azure-blobsecret\n", + " namespace: default\n", + " mountOptions:\n", + " - -o uid=8888 # user in Pod security context\n", + " - -o allow_other \n", + " \n", + "---\n", + "kind: PersistentVolumeClaim\n", + "apiVersion: v1\n", + "metadata:\n", + " name: pvc-gptblob\n", + " \n", + "spec:\n", + " accessModes:\n", + " - ReadWriteMany\n", + " resources:\n", + " requests:\n", + " storage: 10Gi\n", + " volumeName: pv-gptblob\n", + " storageClassName: \"\"\n" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "persistentvolume/pv-gptblob created\n", + "persistentvolumeclaim/pvc-gptblob created\n" + ] + } + ], + "source": [ + "\n", + "# Create PersistentVolume and PersistentVolumeClaim for container mounts\n", + "!kubectl apply -f azure-blobfuse-pv.yaml" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + 
"name": "stdout", + "text": [ + "NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE\npersistentvolume/pv-blob 10Gi RWX Retain Terminating default/pvc-blob 113m\npersistentvolume/pv-gptblob 10Gi RWX Retain Bound default/pvc-gptblob 18s\n\nNAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE\npersistentvolumeclaim/pvc-blob Terminating pv-blob 10Gi RWX 113m\npersistentvolumeclaim/pvc-gptblob Bound pv-gptblob 10Gi RWX 17s\n" + ] + } + ], + "source": [ + "# Verify PVC is bound\n", + "!kubectl get pv,pvc " + ] + }, + { + "source": [ + "At the end of this step you will have an AKS cluster and a storage account in the resource group. The AKS cluster will have CPU and GPU nodepools in addition to the system nodepool.\n" + ], + "cell_type": "markdown", + "metadata": {} + } + ] +} \ No newline at end of file diff --git a/examples/triton_gpt2/GPT2-ONNX-Azure.ipynb b/examples/triton_gpt2/GPT2-ONNX-Azure.ipynb new file mode 100644 index 0000000000..e94ae07d9e --- /dev/null +++ b/examples/triton_gpt2/GPT2-ONNX-Azure.ipynb @@ -0,0 +1,715 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "liked-toronto", + "metadata": {}, + "source": [ + "# Pretrained GPT2 Model Deployment Example\n", + "\n", + "In this notebook, we will run an example of text generation using a GPT2 model exported from HuggingFace and deployed with Seldon's Triton pre-packaged server. The example also covers converting the model to ONNX format.\n", + "The example below implements the greedy approach to next-token prediction.\n", + "More info: https://huggingface.co/transformers/model_doc/gpt2.html?highlight=gpt2\n", + "\n", + "After we have the model deployed to Kubernetes, we will run a simple load test to evaluate the model's inference performance.\n", + "\n", + "\n", + "## Steps:\n", + "1. Download the pretrained GPT2 model from HuggingFace\n", + "2. Convert the model to ONNX\n", + "3. Store it in an Azure Blob Storage container\n", + "4. Set up Seldon-Core in your Kubernetes cluster\n", + "5. 
Deploy the ONNX model with Seldon’s prepackaged Triton server.\n", + "6. Interact with the model: run a greedy algorithm example (generate a sentence completion)\n", + "7. Run a load test using vegeta\n", + "8. Clean up\n", + "\n", + "## Basic requirements\n", + "* Helm v3.0.0+\n", + "* A Kubernetes cluster running v1.13 or above (minikube / docker-for-windows work well if given enough RAM)\n", + "* kubectl v1.14+\n", + "* Python 3.6+ " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "korean-reporter", + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile requirements.txt\n", + "transformers==4.5.1\n", + "torch==1.8.1\n", + "tokenizers<0.11,>=0.10.1\n", + "tensorflow==2.4.1\n", + "tf2onnx" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "assigned-diesel", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "!pip install --trusted-host=pypi.python.org --trusted-host=pypi.org --trusted-host=files.pythonhosted.org -r requirements.txt\n" + ] + }, + { + "cell_type": "markdown", + "id": "completed-evaluation", + "metadata": {}, + "source": [ + "### Export HuggingFace TFGPT2LMHeadModel pre-trained model and save it locally" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "iraqi-million", + "metadata": {}, + "outputs": [], + "source": [ + "from transformers import TFGPT2LMHeadModel, GPT2Tokenizer\n", + "tokenizer = GPT2Tokenizer.from_pretrained(\"gpt2\")\n", + "model = TFGPT2LMHeadModel.from_pretrained(\"gpt2\", from_pt=True, pad_token_id=tokenizer.eos_token_id)\n", + "model.save_pretrained(\"./tfgpt2model\", saved_model=True)" + ] + }, + { + "cell_type": "markdown", + "id": "further-tribute", + "metadata": {}, + "source": [ + "### Convert the TensorFlow saved model to ONNX" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "irish-mountain", + "metadata": {}, + "outputs": [], + "source": [ + "!python -m tf2onnx.convert --saved-model ./tfgpt2model/saved_model/1 --opset 13 
--output model.onnx" + ] + }, + { + "source": [ + "## Azure Setup\n", + "We have provided an [Azure Setup Notebook](AzureSetup.ipynb) that deploys an AKS cluster and an Azure storage account, and installs the Azure Blob CSI driver. If the AKS cluster already exists, skip to the Blob Storage creation and CSI driver installation steps." + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "resource_group = \"seldon\" # feel free to replace or use this default\n", + "aks_name = \"modeltests\" \n", + "\n", + "storage_account_name = \"modeltestsgpt\" # fill in\n", + "storage_container_name = \"gpt2onnx\" " + ] + }, + { + "cell_type": "markdown", + "id": "sunset-pantyhose", + "metadata": {}, + "source": [ + "### Copy your model to Azure Blob\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "lasting-performance", + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "# Copy model file\n", + "!az extension add --name storage-preview\n", + "!az storage azcopy blob upload --container {storage_container_name} \\\n", + " --account-name {storage_account_name} \\\n", + " --source ./model.onnx \\\n", + " --destination gpt2/1/model.onnx " + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\u001b[33mThis command has been deprecated and will be removed in future release. Use 'az storage fs file list' instead. 
For more information go to https://github.com/Azure/azure-cli/blob/dev/src/azure-cli/azure/cli/command_modules/storage/docs/ADLS%20Gen2.md\u001b[39m\n", + "\u001b[33mThe behavior of this command has been altered by the following extension: storage-preview\u001b[0m\n", + "Name IsDirectory Blob Type Blob Tier Length Content Type Last Modified Snapshot\n", + "----------------- ------------- ----------- ----------- --------- ------------------------ ------------------------- ----------\n", + "gpt2/1/model.onnx BlockBlob Hot 652535462 application/octet-stream 2021-05-28T04:37:11+00:00\n", + "\u001b[0m" + ] + } + ], + "source": [ + "#Verify Uploaded file\n", + "!az storage blob list \\\n", + " --account-name {storage_account_name}\\\n", + " --container-name {storage_container_name} \\\n", + " --output table \n", + " " + ] + }, + { + "source": [ + "## Add Azure PersistentVolume and Claim \n", + "For more details on creating a PersistentVolume using the CSI driver, refer to https://github.com/kubernetes-sigs/blob-csi-driver/blob/master/deploy/example/e2e_usage.md\n", + " - Create a secret\n", + " - Create a PersistentVolume pointing to the secret and blob container name, with `mountOptions` specifying the user id for non-root containers \n", + " - Create a PersistentVolumeClaim to bind to the volume" + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "key = !az storage account keys list --account-name {storage_account_name} -g {resource_group} --query '[0].value' -o tsv\n", + "storage_account_key = key[0] " + ] + }, + { + "source": [], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create secret to access storage account\n", + "!kubectl create secret generic azure-blobsecret --from-literal azurestorageaccountname={storage_account_name} --from-literal 
azurestorageaccountkey=\"{storage_account_key}\" --type=Opaque " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile azure-blobfuse-pv.yaml\n", + "apiVersion: v1\n", + "kind: PersistentVolume\n", + "metadata:\n", + " name: pv-gpt2blob\n", + " \n", + "spec:\n", + " capacity:\n", + " storage: 10Gi\n", + " accessModes:\n", + " - ReadWriteMany\n", + " persistentVolumeReclaimPolicy: Retain # \"Delete\" is not supported in static provisioning\n", + " csi:\n", + " driver: blob.csi.azure.com\n", + " readOnly: false\n", + " volumeHandle: trainingdata # make sure this volumeid is unique in the cluster\n", + " volumeAttributes:\n", + " containerName: gpt2onnx # Modify if changed in Notebook\n", + " nodeStageSecretRef:\n", + " name: azure-blobsecret\n", + " namespace: default\n", + " mountOptions: # Use same user id that is used by POD security context\n", + " - -o uid=8888 \n", + " - -o allow_other\n", + "---\n", + "kind: PersistentVolumeClaim\n", + "apiVersion: v1\n", + "metadata:\n", + " name: pvc-gpt2blob\n", + " \n", + "spec:\n", + " accessModes:\n", + " - ReadWriteMany\n", + " resources:\n", + " requests:\n", + " storage: 10Gi\n", + " volumeName: pv-gpt2blob\n", + " storageClassName: \"\"\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "persistentvolume/pv-gptblob configured\n", + "persistentvolumeclaim/pvc-gptblob unchanged\n" + ] + } + ], + "source": [ + "!kubectl apply -f azure-blobfuse-pv.yaml" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE\npersistentvolume/pv-gpt2blob 10Gi RWX Retain Bound default/pvc-gpt2blob 4h54m\n\nNAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS 
AGE\npersistentvolumeclaim/pvc-gpt2blob Bound pv-gpt2blob 10Gi RWX 4h54m\n" + ] + } + ], + "source": [ + "# Verify PVC is bound\n", + "!kubectl get pv,pvc " + ] + }, + { + "cell_type": "markdown", + "id": "convinced-syracuse", + "metadata": {}, + "source": [ + "### Run Seldon in your Kubernetes cluster\n", + "\n", + "Follow the [Seldon-Core Setup notebook](https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_core_setup.html) to set up a cluster with Istio Ingress and install Seldon Core" + ] + }, + { + "cell_type": "markdown", + "id": "backed-outreach", + "metadata": {}, + "source": [ + "### Deploy your model with Seldon pre-packaged Triton server" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "beneficial-anime", + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Overwriting gpt2-deploy.yaml\n" + ] + } + ], + "source": [ + "%%writefile gpt2-deploy.yaml\n", + "apiVersion: machinelearning.seldon.io/v1alpha2\n", + "kind: SeldonDeployment\n", + "metadata:\n", + " name: gpt2\n", + "spec:\n", + " predictors:\n", + " - componentSpecs:\n", + " - spec:\n", + " containers:\n", + " - name: gpt2\n", + " resources:\n", + " requests:\n", + " memory: 750Mi\n", + " cpu: 2\n", + " #nvidia.com/gpu: 1 \n", + " limits:\n", + " memory: 2Gi\n", + " cpu: 2\n", + " #nvidia.com/gpu: 1 \n", + " graph:\n", + " implementation: TRITON_SERVER\n", + " logger:\n", + " mode: all\n", + " modelUri: pvc://pvc-gpt2blob/\n", + " name: gpt2\n", + " type: MODEL \n", + " name: default\n", + " replicas: 1\n", + " protocol: kfserving" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "subjective-involvement", + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "seldondeployment.machinelearning.seldon.io/gpt2 created\n" + ] + } + ], + "source": [ + "\n", + "!kubectl apply -f gpt2-deploy.yaml -n default" + ] + }, + { + "cell_type": "code", + "execution_count": 
10, + "id": "demanding-thesaurus", + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Waiting for deployment \"gpt2-default-0-gpt2\" rollout to finish: 0 of 1 updated replicas are available...\n", + "error: deployment \"gpt2-default-0-gpt2\" exceeded its progress deadline\n" + ] + } + ], + "source": [ + "!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=gpt2 -o jsonpath='{.items[0].metadata.name}')" + ] + }, + { + "cell_type": "markdown", + "id": "digital-supervisor", + "metadata": {}, + "source": [ + "#### Interact with the model: get model metadata (a \"test\" request to make sure our model is available and loaded correctly)" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "married-roller", + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "* Trying 20.75.117.145:80...\n", + "* TCP_NODELAY set\n", + "* Connected to 20.75.117.145 (20.75.117.145) port 80 (#0)\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "* Mark bundle as not supporting multiuse\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "* Connection #0 to host 20.75.117.145 left intact\n", + "{\"name\":\"gpt2\",\"versions\":[\"1\"],\"platform\":\"onnxruntime_onnx\",\"inputs\":[{\"name\":\"input_ids:0\",\"datatype\":\"INT32\",\"shape\":[-1,-1]},{\"name\":\"attention_mask:0\",\"datatype\":\"INT32\",\"shape\":[-1,-1]}],\"outputs\":[{\"name\":\"past_key_values\",\"datatype\":\"FP32\",\"shape\":[12,2,-1,12,-1,64]},{\"name\":\"logits\",\"datatype\":\"FP32\",\"shape\":[-1,-1,50257]}]}" + ] + } + ], + "source": [ + "ingress_ip=!(kubectl get svc --namespace istio-system istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')\n", + "ingress_ip = ingress_ip[0]\n", + "\n", + "#!curl -v http://{ingress_ip}:80/seldon/default/gpt2/v2/models/gpt2\n", + "\n", + "!curl -v 
http://{ingress_ip}:80/seldon/default/gpt2gpu/v2/models/gpt2" + ] + }, + { + "cell_type": "markdown", + "id": "anonymous-resource", + "metadata": {}, + "source": [ + "### Run a prediction test: generate a sentence completion with the GPT-2 model (greedy approach)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "modified-termination", + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "sending request to http://20.75.117.145/seldon/default/gpt2gpu/v2/models/gpt2/infer\n", + "Sentence: I love Artificial Intelligence .\n", + "sending request to http://20.75.117.145/seldon/default/gpt2gpu/v2/models/gpt2/infer\n", + "Sentence: I love Artificial Intelligence . I\n", + "sending request to http://20.75.117.145/seldon/default/gpt2gpu/v2/models/gpt2/infer\n", + "Sentence: I love Artificial Intelligence . I love\n", + "sending request to http://20.75.117.145/seldon/default/gpt2gpu/v2/models/gpt2/infer\n", + "Sentence: I love Artificial Intelligence . I love the\n", + "sending request to http://20.75.117.145/seldon/default/gpt2gpu/v2/models/gpt2/infer\n", + "Sentence: I love Artificial Intelligence . I love the way\n", + "sending request to http://20.75.117.145/seldon/default/gpt2gpu/v2/models/gpt2/infer\n", + "Sentence: I love Artificial Intelligence . I love the way it\n", + "sending request to http://20.75.117.145/seldon/default/gpt2gpu/v2/models/gpt2/infer\n", + "Sentence: I love Artificial Intelligence . I love the way it 's\n", + "sending request to http://20.75.117.145/seldon/default/gpt2gpu/v2/models/gpt2/infer\n", + "Sentence: I love Artificial Intelligence . I love the way it 's designed\n", + "Input: I love Artificial Intelligence\n", + "Output: I love Artificial Intelligence . 
I love the way it 's designed\n" + ] + } + ], + "source": [ + "import requests\n", + "import json\n", + "import numpy as np\n", + "from transformers import GPT2Tokenizer\n", + "\n", + "tokenizer = GPT2Tokenizer.from_pretrained(\"gpt2\")\n", + "input_text = 'I love Artificial Intelligence'\n", + "count = 0\n", + "max_gen_len = 8\n", + "gen_sentence = input_text\n", + "while count < max_gen_len:\n", + " input_ids = tokenizer.encode(gen_sentence, return_tensors='tf')\n", + " shape = input_ids.shape.as_list()\n", + " payload = {\n", + " \"inputs\": [\n", + " {\"name\": \"input_ids:0\",\n", + " \"datatype\": \"INT32\",\n", + " \"shape\": shape,\n", + " \"data\": input_ids.numpy().tolist()\n", + " },\n", + " {\"name\": \"attention_mask:0\",\n", + " \"datatype\": \"INT32\",\n", + " \"shape\": shape,\n", + " \"data\": np.ones(shape, dtype=np.int32).tolist()\n", + " }\n", + " ]\n", + " }\n", + "\n", + " tfserving_url = \"http://\" + str(ingress_ip) + \"/seldon/default/gpt2gpu/v2/models/gpt2/infer\"\n", + " print(f'sending request to {tfserving_url}')\n", + " \n", + " with requests.post(tfserving_url, json=payload) as ret: \n", + " try:\n", + " res = ret.json()\n", + " except ValueError: # stop instead of retrying forever on a malformed response\n", + " print(f'request failed with status {ret.status_code}')\n", + " break\n", + "\n", + " # extract logits\n", + " logits = np.array(res[\"outputs\"][1][\"data\"])\n", + " logits = logits.reshape(res[\"outputs\"][1][\"shape\"])\n", + "\n", + " # greedily take the most likely next token at the last input position\n", + " next_token = logits.argmax(axis=2)[0]\n", + " next_token_str = tokenizer.decode(next_token[-1:], skip_special_tokens=True,\n", + " clean_up_tokenization_spaces=True).strip()\n", + " gen_sentence += ' ' + next_token_str\n", + " print(f'Sentence: {gen_sentence}')\n", + "\n", + " count += 1\n", + "\n", + "print(f'Input: {input_text}\\nOutput: {gen_sentence}')" + ] + }, + { + "cell_type": "markdown", + "id": "colored-status", + "metadata": {}, + "source": [ + "### Run Load Test / Performance Test using vegeta" + 
] + }, + { + "cell_type": "markdown", + "id": "exempt-discovery", + "metadata": {}, + "source": [ + "#### Install vegeta; for more details, see the official [vegeta](https://github.com/tsenart/vegeta#install) documentation" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "interesting-laptop", + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "--2021-05-28 18:40:27-- https://github.com/tsenart/vegeta/releases/download/v12.8.3/vegeta-12.8.3-linux-arm64.tar.gz\n", + "Resolving github.com (github.com)... 140.82.114.4\n", + "Connecting to github.com (github.com)|140.82.114.4|:443... connected.\n", + "HTTP request sent, awaiting response... 302 Found\n", + "Location: https://github-releases.githubusercontent.com/12080551/ba68d580-6e90-11ea-8bd2-3f43f5c08b3c?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20210528%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210528T224014Z&X-Amz-Expires=300&X-Amz-Signature=2efad77c33f1663eea17d366986bfad1cd081128d45012c9b6e6659c4c80eff6&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=12080551&response-content-disposition=attachment%3B%20filename%3Dvegeta-12.8.3-linux-arm64.tar.gz&response-content-type=application%2Foctet-stream [following]\n", + "--2021-05-28 18:40:27-- https://github-releases.githubusercontent.com/12080551/ba68d580-6e90-11ea-8bd2-3f43f5c08b3c?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20210528%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210528T224014Z&X-Amz-Expires=300&X-Amz-Signature=2efad77c33f1663eea17d366986bfad1cd081128d45012c9b6e6659c4c80eff6&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=12080551&response-content-disposition=attachment%3B%20filename%3Dvegeta-12.8.3-linux-arm64.tar.gz&response-content-type=application%2Foctet-stream\n", + "Resolving github-releases.githubusercontent.com (github-releases.githubusercontent.com)... 
185.199.108.154, 185.199.109.154, 185.199.110.154, ...\n", + "Connecting to github-releases.githubusercontent.com (github-releases.githubusercontent.com)|185.199.108.154|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 3281900 (3.1M) [application/octet-stream]\n", + "Saving to: ‘vegeta-12.8.3-linux-arm64.tar.gz.2’\n", + "\n", + "vegeta-12.8.3-linux 100%[===================>] 3.13M 2.95MB/s in 1.1s \n", + "\n", + "2021-05-28 18:40:28 (2.95 MB/s) - ‘vegeta-12.8.3-linux-arm64.tar.gz.2’ saved [3281900/3281900]\n", + "\n", + "CHANGELOG\n", + "LICENSE\n", + "README.md\n", + "vegeta\n" + ] + } + ], + "source": [ + "!wget https://github.com/tsenart/vegeta/releases/download/v12.8.3/vegeta-12.8.3-linux-arm64.tar.gz\n", + "!tar -zxvf vegeta-12.8.3-linux-arm64.tar.gz\n", + "!chmod +x vegeta" + ] + }, + { + "cell_type": "markdown", + "id": "friendly-lying", + "metadata": {}, + "source": [ + "#### Generate a vegeta [target file](https://github.com/tsenart/vegeta#-targets) containing a POST request with the payload in the required structure" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "reliable-croatia", + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "preparing request to http://20.75.117.145/seldon/default/gpt2gpu/v2/models/gpt2/infer\n" + ] + } + ], + "source": [ + "import json\n", + "import numpy as np\n", + "from transformers import GPT2Tokenizer\n", + "import base64\n", + "\n", + "tokenizer = GPT2Tokenizer.from_pretrained(\"gpt2\")\n", + "input_text = 'I enjoy working in Seldon'\n", + "input_ids = tokenizer.encode(input_text, return_tensors='tf')\n", + "shape = input_ids.shape.as_list()\n", + "payload = {\n", + "\t\t\"inputs\": [\n", + "\t\t\t{\"name\": \"input_ids:0\",\n", + "\t\t\t \"datatype\": \"INT32\",\n", + "\t\t\t \"shape\": shape,\n", + "\t\t\t \"data\": input_ids.numpy().tolist()\n", + "\t\t\t 
},\n", + "\t\t\t{\"name\": \"attention_mask:0\",\n", + "\t\t\t \"datatype\": \"INT32\",\n", + "\t\t\t \"shape\": shape,\n", + "\t\t\t \"data\": np.ones(shape, dtype=np.int32).tolist()\n", + "\t\t\t }\n", + "\t\t\t]\n", + "\t\t}\n", + "tfserving_url = \"http://\" + str(ingress_ip) + \"/seldon/default/gpt2gpu/v2/models/gpt2/infer\"\n", + "print(f'preparing request to {tfserving_url}')\n", + "\n", + "cmd= {\"method\": \"POST\",\n", + "\t\t\"header\": {\"Content-Type\": [\"application/json\"] },\n", + "\t\t\"url\": tfserving_url,\n", + "\t\t\"body\": base64.b64encode(bytes(json.dumps(payload), \"utf-8\")).decode(\"utf-8\")}\n", + "\n", + "with open(\"vegeta_target.json\", mode=\"w\") as file:\n", + "\tjson.dump(cmd, file)\n", + "\tfile.write('\\n\\n')" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "tribal-statistics", + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Requests [total, rate, throughput] 60, 1.02, 0.95\nDuration [total, attack, wait] 1m3s, 58.994s, 4.445s\nLatencies [min, mean, 50, 90, 95, 99, max] 1.45s, 4.003s, 3.983s, 5.249s, 6.329s, 7.876s, 7.97s\nBytes In [total, mean] 475803960, 7930066.00\nBytes Out [total, mean] 13140, 219.00\nSuccess [ratio] 100.00%\nStatus Codes [code:count] 200:60 \nError Set:\n" + ] + } + ], + "source": [ + "!./vegeta attack -targets=vegeta_target.json -rate=1 -duration=60s -format=json | ./vegeta report -type=text" + ] + }, + { + "cell_type": "markdown", + "id": "patient-suite", + "metadata": {}, + "source": [ + "### Clean-up" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "pacific-collectible", + "metadata": {}, + "outputs": [], + "source": [ + "!kubectl delete -f gpt2-deploy.yaml -n default" + ] + } + ], + "metadata": { + "kernelspec": { + "name": "python3", + "display_name": "Python 3.8.5 64-bit" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": 
"text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + }, + "interpreter": { + "hash": "e7370f93d1d0cde622a1f8e1c04877d8463912d04d973331ad4851f04de6915a" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file