Add Azure Monitoring section in GPT2 on Azure notebook #3351

Merged · 5 commits · Jul 9, 2021
170 changes: 138 additions & 32 deletions examples/triton_gpt2/GPT2-ONNX-Azure.ipynb
@@ -15,14 +15,16 @@
"\n",
"\n",
"## Steps:\n",
"1. Download pretrained GPT2 model from hugging face\n",
"2. Convert the model to ONNX\n",
"3. Store it in MinIo bucket\n",
"4. Setup Seldon-Core in your kubernetes cluster\n",
"5. Deploy the ONNX model with Seldon’s prepackaged Triton server.\n",
"6. Interact with the model, run a greedy alg example (generate sentence completion)\n",
"7. Run load test using vegeta\n",
"8. Clean-up\n",
"- [Download pretrained GPT2 model from hugging face](#hf)\n",
"- [Convert the model to ONNX](#onnx)\n",
"- [Store model in Azure Storage Blob](#blob)\n",
"- [Create PersistentVolume and PVC](#pv) mounting Azure Storage Blob\n",
"- [Setup Seldon-Core](#seldon) in your kubernetes cluster\n",
"- [Deploy the ONNX model](#sd) with Seldon’s prepackaged Triton server.\n",
"- [Run model inference](#infer), run a greedy alg example (generate sentence completion)\n",
"- [Monitor model with Azure Monitor](#azuremonitor)\n",
"- [Run load test using vegeta](#vegeta)\n",
"- [Clean-up](#cleanup)\n",
"\n",
"## Basic requirements\n",
"* Helm v3.0.0+\n",
@@ -63,7 +65,7 @@
"id": "completed-evaluation",
"metadata": {},
"source": [
"### Export HuggingFace TFGPT2LMHeadModel pre-trained model and save it locally"
"### Export HuggingFace TFGPT2LMHeadModel pre-trained model and save it locally <a id=\"hf\"/>"
]
},
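The export cell itself is collapsed in this diff. A minimal sketch of what it does, assuming the `transformers` and `tensorflow` packages are installed (the `./tfgpt2model` path is an assumption carried through the later steps):

```python
# Minimal sketch: pull pretrained GPT-2 weights and save a TF SavedModel.
from transformers import GPT2Tokenizer, TFGPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = TFGPT2LMHeadModel.from_pretrained(
    "gpt2", from_pt=True, pad_token_id=tokenizer.eos_token_id
)
# saved_model=True writes a TF SavedModel under ./tfgpt2model/saved_model/1
model.save_pretrained("./tfgpt2model", saved_model=True)
```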
{
@@ -84,7 +86,7 @@
"id": "further-tribute",
"metadata": {},
"source": [
"### Convert the TensorFlow saved model to ONNX"
"### Convert the TensorFlow saved model to ONNX <a id=\"onnx\"/>"
]
},
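The conversion cell is likewise collapsed. A hedged sketch using the `tf2onnx` CLI, where the SavedModel path follows from the export sketch above and the opset is an assumption:

```bash
# Convert the TF SavedModel produced above to ONNX.
python -m tf2onnx.convert --saved-model ./tfgpt2model/saved_model/1 \
    --opset 11 --output model.onnx
```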
{
@@ -100,7 +102,8 @@
{
"source": [
"## Azure Setup\n",
"We have provided Azure Setup Notebook that deploys AKS cluster, Azure storage account and installs Azure Blob CSI driver. If AKS cluster already exists skip to creation of Blob Storage and CSI driver installtion steps."
"We have provided [Azure Setup Notebook](./AzureSetup.ipynb) that deploys AKS cluster, Azure storage account and installs Azure Blob CSI driver. If AKS cluster already exists skip to creation of Blob Storage and CSI driver installtion steps. Upon completion of Azure setup following infrastructure will be created:\n",
"![Azure](./azure.jpg)"
],
"cell_type": "markdown",
"metadata": {}
Expand All @@ -123,7 +126,7 @@
"id": "sunset-pantyhose",
"metadata": {},
"source": [
"### Copy your model to Azure Blob\n"
"### Copy your model to Azure Blob <a id=\"blob\"/>\n"
]
},
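The copy cell is collapsed in this diff. A hedged sketch with the Azure CLI, assuming a `gpt2onnx` container and Triton's expected model-repository layout (`<model>/<version>/model.onnx`):

```bash
# Upload the converted model into the blob container in Triton's layout.
az storage blob upload \
    --account-name seldonmodelstore \
    --container-name gpt2onnx \
    --name gpt2/1/model.onnx \
    --file ./model.onnx
```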
{
@@ -171,7 +174,7 @@
},
{
"source": [
"## Add Azure PersistentVolume and Claim <a id=\"pvc\">\n",
"## Add Azure PersistentVolume and Claim <a id=\"pv\">\n",
"For more details on creating PersistentVolume using CSI driver refer to https://github.com/kubernetes-sigs/blob-csi-driver/blob/master/deploy/example/e2e_usage.md\n",
" - Create secret\n",
" - Create PersistentVolume pointing to secret and Blob Container Name and `mountOptions` specifying user id for non-root containers \n",
@@ -292,7 +295,7 @@
"id": "convinced-syracuse",
"metadata": {},
"source": [
"### Run Seldon in your kubernetes cluster\n",
"### Run Seldon in your kubernetes cluster <a id=\"seldon\"/>\n",
"\n",
"Follow the [Seldon-Core Setup notebook](https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_core_setup.html) to Setup a cluster with Istio Ingress and install Seldon Core"
]
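A condensed sketch of the usual install from that notebook (chart repo and flags as documented by Seldon; the Istio gateway configuration is omitted here):

```bash
kubectl create namespace seldon-system
helm install seldon-core seldon-core-operator \
    --repo https://storage.googleapis.com/seldon-charts \
    --namespace seldon-system \
    --set istio.enabled=true
```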
@@ -302,7 +305,7 @@
"id": "backed-outreach",
"metadata": {},
"source": [
"### Deploy your model with Seldon pre-packaged Triton server"
"### Deploy your model with Seldon pre-packaged Triton server <a id=\"sd\"/>"
]
},
{
@@ -324,22 +327,30 @@
"apiVersion: machinelearning.seldon.io/v1alpha2\n",
"kind: SeldonDeployment\n",
"metadata:\n",
" name: gpt2\n",
" name: gpt2gpu\n",
"spec:\n",
" annotations:\n",
" prometheus.io/port: \"8002\" # we will explain below in Monitoring section\n",
" prometheus.io/path: \"/metrics\"\n",
" predictors:\n",
" - componentSpecs:\n",
" - spec:\n",
" containers:\n",
" - name: gpt2\n",
" resources:\n",
" requests:\n",
" memory: 750Mi\n",
" cpu: 2\n",
" #nvidia.com/gpu: 1 \n",
" limits:\n",
" memory: 2Gi\n",
" cpu: 2\n",
" #nvidia.com/gpu: 1 \n",
" nvidia.com/gpu: 1 \n",
" limits:\n",
" memory: 4Gi\n",
" cpu: 4\n",
" nvidia.com/gpu: 1 \n",
" tolerations:\n",
" - key: \"nvidia.com\" # to be able to run in GPU Nodepool\n",
" operator: \"Equal\"\n",
" value: \"gpu\"\n",
" effect: \"NoSchedule\" \n",
" graph:\n",
" implementation: TRITON_SERVER\n",
" logger:\n",
@@ -373,21 +384,20 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 3,
"id": "demanding-thesaurus",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Waiting for deployment \"gpt2-default-0-gpt2\" rollout to finish: 0 of 1 updated replicas are available...\n",
"error: deployment \"gpt2-default-0-gpt2\" exceeded its progress deadline\n"
"deployment \"gpt2gpu-default-0-gpt2\" successfully rolled out\n"
]
}
],
"source": [
"!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=gpt2 -o jsonpath='{.items[0].metadata.name}')"
"!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=gpt2gpu -o jsonpath='{.items[0].metadata.name}')"
]
},
{
@@ -438,8 +448,6 @@
"ingress_ip=!(kubectl get svc --namespace istio-system istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')\n",
"ingress_ip = ingress_ip[0]\n",
"\n",
"#!curl -v http://{ingress_ip}:80/seldon/default/gpt2/v2/models/gpt2\n",
"\n",
"!curl -v http://{ingress_ip}:80/seldon/default/gpt2gpu/v2/models/gpt2"
]
},
@@ -448,12 +456,12 @@
"id": "anonymous-resource",
"metadata": {},
"source": [
"### Run prediction test: generate a sentence completion using GPT2 model - Greedy approach\n"
"### Run prediction test: generate a sentence completion using GPT2 model - Greedy approach <a id=\"infer\"/>\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 11,
"id": "modified-termination",
"metadata": {},
"outputs": [
@@ -537,12 +545,110 @@
"print(f'Input: {input_text}\\nOutput: {gen_sentence}')"
]
},
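The full cell is collapsed above. A hedged sketch of a single greedy-decoding step against the endpoint; it reuses `ingress_ip` from the curl cell earlier, and the tensor names (`input_ids:0`, `attention_mask:0`) are assumptions that depend on how the TF model was exported:

```python
# One greedy step: send tokens to the Triton v2 REST endpoint, then pick the
# highest-probability next token from the returned logits.
import numpy as np
import requests
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
input_text = "I enjoy working in Seldon"
enc = tokenizer(input_text, return_tensors="np")

payload = {
    "inputs": [
        {
            "name": "input_ids:0",
            "datatype": "INT32",
            "shape": list(enc["input_ids"].shape),
            "data": enc["input_ids"].astype(np.int32).flatten().tolist(),
        },
        {
            "name": "attention_mask:0",
            "datatype": "INT32",
            "shape": list(enc["attention_mask"].shape),
            "data": enc["attention_mask"].astype(np.int32).flatten().tolist(),
        },
    ]
}
resp = requests.post(
    f"http://{ingress_ip}/seldon/default/gpt2gpu/v2/models/gpt2/infer",
    json=payload,
)
out = resp.json()["outputs"][0]                      # logits tensor
logits = np.array(out["data"]).reshape(out["shape"])
next_token = int(np.argmax(logits[0, -1]))           # greedy choice
print(input_text + tokenizer.decode([next_token]))
```

Looping this step, appending each chosen token to the input, yields the sentence completion shown in the cell output.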
{
"source": [
"## Configure Model Monitoring with Azure Monitor <a id=\"azuremonitor\"/> \n",
"The Azure Monitor Containers Insights provides functionality to allow collecting data from any Prometheus endpoints. It removes the need to install and operate Prometheus server and manage the monitoring data as Azure Monitor provides centralized point for collecting, displaying and alerting on monitoring data. To turn on Azure Monitor Container Insights follow steps described [here](https://docs.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-onboard) and you should that you have an “omsagent” pod running."
],
"cell_type": "markdown",
"metadata": {}
},
{
"source": [
"!kubectl get pods -n kube-system | grep omsagent"
],
"cell_type": "code",
"metadata": {},
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"omsagent-27lk7 1/1 Running 3 12d\nomsagent-7q49d 1/1 Running 3 12d\nomsagent-9slf6 1/1 Running 3 12d\nomsagent-kzbkr 1/1 Running 3 12d\nomsagent-q85hk 1/1 Running 3 12d\nomsagent-rs-5976fbdc8b-rgxs4 1/1 Running 0 8d\nomsagent-tpkq2 1/1 Running 3 12d\n"
]
}
]
},
{
"source": [
"### Configure Prometheus Metrics scraping\n",
"Once `omsagent` is running we need to configure it to collect metrics from Prometheus endpoints. Azure Monitor Containers Insights allows configuration to be applied on a cluster or node-wide scope and configure endpoints for monitoring on one of the following ways:\n",
"- Provide an array of URLs \n",
"- Provide an Array of Kubernetes services\n",
"- Enable monitoring of any pods with Prometheus annotations\n",
"For more details on how to configure the scraping endpoints and query collected data refer to [MS Docs on Configure scraping of Prometheus metrics with Container insights](https://docs.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-prometheus-integration)\n",
"\n",
"Our deployed model metrics are availble from couple infrasture layers - [Seldon model orchestrator metrics](https://docs.seldon.io/projects/seldon-core/en/latest/analytics/analytics.html) and [Nvidia Triton Server Metrics](https://github.com/triton-inference-server/server/blob/main/docs/metrics.md). To enable scraping for both endpoints we updated Microsoft provided default `ConfigMap` that configures `omsagent` [azure-metrics-cm.yaml](./azure-metrics-cm.yaml):\n",
"- **Triton Server:** update `monitor_kubernetes_pods = true` to enable scrapting for Pods with `prometheus.io` annotations\n",
" In SeldonDeployment shown above `prometheus.io/path` and `prometheus.io/port` point to default Triton metrics endpoint\n",
"- **Seldon Orchestrator:** add our deployed model seldon service endpoint to list of Kubernetes services to be scraped: \n",
" ```yaml\n",
" kubernetes_services = [\"http://gpt2gpu-default.default:8000/prometheus\"]\n",
" ``` "
],
"cell_type": "markdown",
"metadata": {}
},
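For reference, the relevant fragment of that ConfigMap might look like the following; this is a hedged excerpt, and the full Microsoft template (`container-azm-ms-agentconfig` in `kube-system`) has many more settings:

```yaml
# Excerpt assumed from azure-metrics-cm.yaml
prometheus-data-collection-settings: |-
  [prometheus_data_collection_settings.cluster]
    interval = "1m"
    # scrape pods carrying prometheus.io annotations (Triton metrics)
    monitor_kubernetes_pods = true
    # scrape the Seldon orchestrator service endpoint
    kubernetes_services = ["http://gpt2gpu-default.default:8000/prometheus"]
```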
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!kubectl apply -f azure-metrics-cm.yaml"
]
},
{
"source": [
"## Query and Visualize collected data\n",
"Collected metrics are available in Logs blade of Azure Monitor in a table **InsightsMetrics**, you could see all metrics gathered by running query\n",
"\n",
"```yaml\n",
"InsightsMetrics\n",
"| where Namespace == \"prometheus\" \n",
"```\n",
"\n",
"To get Model Inference Requests per minute from Seldon Metrics run the following query and pin it to Dashboard or add to Azure Monitor Workbook:\n",
"\n",
"```yaml\n",
"InsightsMetrics \n",
"| where Namespace == \"prometheus\"\n",
"| where Name == \"seldon_api_executor_server_requests_seconds_count\"\n",
"| extend Model = parse_json(Tags).deployment_name\n",
"| where parse_json(Tags).service == \"predictions\" \n",
"| order by TimeGenerated asc \n",
"| extend RequestsPerMin = Val - prev(Val,1)\n",
"| project TimeGenerated, RequestsPerMin\n",
"| render areachart \n",
"```\n",
"\n",
"\n",
"To get Inference Duration from Triton Metrics:\n",
"\n",
"```yaml\n",
"InsightsMetrics \n",
"| where Namespace == \"prometheus\"\n",
"| where Name in (\"nv_inference_request_duration_us\")\n",
"| order by TimeGenerated asc\n",
"| extend QueueDurationSec = (Val - prev(Val, 1)) / 1000\n",
"| project TimeGenerated, Name, QueueDurationSec\n",
"| render areachart \n",
"```\n",
"\n",
"Here is example dashboard we created using queries above\n",
"\n",
"![dashboard](./azuredashboard.jpg) \n"
],
"cell_type": "markdown",
"metadata": {}
},
{
"cell_type": "markdown",
"id": "colored-status",
"metadata": {},
"source": [
"### Run Load Test / Performance Test using vegeta"
"### Run Load Test / Performance Test using vegeta <a id=\"vegeta\"/>"
]
},
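The load-test cell is collapsed below. A hedged sketch of a direct vegeta invocation, assuming `payload.json` holds the v2 infer request body from the inference step, `INGRESS_IP` is exported, and GNU `base64` is available:

```bash
# Build a single JSON target (vegeta expects the body base64-encoded) and attack.
BODY=$(base64 -w0 payload.json)
echo '{"method":"POST","url":"http://'"${INGRESS_IP}"'/seldon/default/gpt2gpu/v2/models/gpt2/infer","body":"'"${BODY}"'","header":{"Content-Type":["application/json"]}}' \
  | vegeta attack -format=json -duration=60s -rate=10 \
  | vegeta report
```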
{
@@ -675,7 +781,7 @@
"id": "patient-suite",
"metadata": {},
"source": [
"### Clean-up"
"### Clean-up <a id=\"cleanup\"/>"
]
},
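A hedged sketch of the clean-up, using the resource names from the manifests and sketches above:

```bash
kubectl delete sdep gpt2gpu          # the SeldonDeployment (shortname sdep)
kubectl delete pvc pvc-gpt2
kubectl delete pv pv-gpt2
kubectl delete secret azure-blob-secret
```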
{
@@ -712,4 +818,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}