Add Azure Monitoring section in GPT2 on Azure notebook #3351

Merged · 5 commits · Jul 9, 2021
170 changes: 138 additions & 32 deletions examples/triton_gpt2/GPT2-ONNX-Azure.ipynb
@@ -15,14 +15,16 @@
"\n",
"\n",
"## Steps:\n",
"1. Download pretrained GPT2 model from hugging face\n",
"2. Convert the model to ONNX\n",
"3. Store it in MinIo bucket\n",
"4. Setup Seldon-Core in your kubernetes cluster\n",
"5. Deploy the ONNX model with Seldon’s prepackaged Triton server.\n",
"6. Interact with the model, run a greedy alg example (generate sentence completion)\n",
"7. Run load test using vegeta\n",
"8. Clean-up\n",
"- [Download pretrained GPT2 model from hugging face](#hf)\n",
"- [Convert the model to ONNX](#onnx)\n",
"- [Store model in Azure Storage Blob](#blob)\n",
"- [Create PersistentVolume and PVC](#pv) mounting Azure Storage Blob\n",
"- [Setup Seldon-Core](#seldon) in your kubernetes cluster\n",
"- [Deploy the ONNX model](#sd) with Seldon’s prepackaged Triton server.\n",
"- [Run model inference](#infer), run a greedy alg example (generate sentence completion)\n",
"- [Monitor model with Azure Monitor](#azuremonitor)\n",
"- [Run load test using vegeta](#vegeta)\n",
"- [Clean-up](#cleanup)\n",
"\n",
"## Basic requirements\n",
"* Helm v3.0.0+\n",
@@ -63,7 +65,7 @@
"id": "completed-evaluation",
"metadata": {},
"source": [
"### Export HuggingFace TFGPT2LMHeadModel pre-trained model and save it locally"
"### Export HuggingFace TFGPT2LMHeadModel pre-trained model and save it locally <a id=\"hf\"/>"
]
},
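The export cell itself is collapsed in this diff. A minimal sketch of what it does, assuming the `transformers` and `tensorflow` packages are installed (the `./tfgpt2model` path is an assumption carried through the later steps):

```python
# Minimal sketch: pull pretrained GPT-2 weights and save a TF SavedModel.
from transformers import GPT2Tokenizer, TFGPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = TFGPT2LMHeadModel.from_pretrained(
    "gpt2", from_pt=True, pad_token_id=tokenizer.eos_token_id
)
# saved_model=True writes a TF SavedModel under ./tfgpt2model/saved_model/1
model.save_pretrained("./tfgpt2model", saved_model=True)
```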
{
@@ -84,7 +86,7 @@
"id": "further-tribute",
"metadata": {},
"source": [
"### Convert the TensorFlow saved model to ONNX"
"### Convert the TensorFlow saved model to ONNX <a id=\"onnx\"/>"
]
},
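The conversion cell is likewise collapsed. A hedged sketch using the `tf2onnx` CLI, where the SavedModel path follows from the export sketch above and the opset is an assumption:

```bash
# Convert the TF SavedModel produced above to ONNX.
python -m tf2onnx.convert --saved-model ./tfgpt2model/saved_model/1 \
    --opset 11 --output model.onnx
```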
{
@@ -100,7 +102,8 @@
{
"source": [
"## Azure Setup\n",
"We have provided Azure Setup Notebook that deploys AKS cluster, Azure storage account and installs Azure Blob CSI driver. If AKS cluster already exists skip to creation of Blob Storage and CSI driver installtion steps."
"We have provided [Azure Setup Notebook](./AzureSetup.ipynb) that deploys AKS cluster, Azure storage account and installs Azure Blob CSI driver. If AKS cluster already exists skip to creation of Blob Storage and CSI driver installtion steps. Upon completion of Azure setup following infrastructure will be created:\n",
"![Azure](./azure.jpg)"
],
"cell_type": "markdown",
"metadata": {}
Expand All @@ -123,7 +126,7 @@
"id": "sunset-pantyhose",
"metadata": {},
"source": [
"### Copy your model to Azure Blob\n"
"### Copy your model to Azure Blob <a id=\"blob\"/>\n"
]
},
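The copy cell is collapsed in this diff. A hedged sketch with the Azure CLI, assuming a `gpt2onnx` container and Triton's expected model-repository layout (`<model>/<version>/model.onnx`):

```bash
# Upload the converted model into the blob container in Triton's layout.
az storage blob upload \
    --account-name seldonmodelstore \
    --container-name gpt2onnx \
    --name gpt2/1/model.onnx \
    --file ./model.onnx
```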
{
@@ -171,7 +174,7 @@
},
{
"source": [
"## Add Azure PersistentVolume and Claim <a id=\"pvc\">\n",
"## Add Azure PersistentVolume and Claim <a id=\"pv\">\n",
"For more details on creating PersistentVolume using CSI driver refer to https://github.com/kubernetes-sigs/blob-csi-driver/blob/master/deploy/example/e2e_usage.md\n",
" - Create secret\n",
" - Create PersistentVolume pointing to secret and Blob Container Name and `mountOptions` specifying user id for non-root containers \n",
@@ -292,7 +295,7 @@
"id": "convinced-syracuse",
"metadata": {},
"source": [
"### Run Seldon in your kubernetes cluster\n",
"### Run Seldon in your kubernetes cluster <a id=\"seldon\"/>\n",
"\n",
"Follow the [Seldon-Core Setup notebook](https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_core_setup.html) to Setup a cluster with Istio Ingress and install Seldon Core"
]
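A condensed sketch of the usual install from that notebook (chart repo and flags as documented by Seldon; the Istio gateway configuration is omitted here):

```bash
kubectl create namespace seldon-system
helm install seldon-core seldon-core-operator \
    --repo https://storage.googleapis.com/seldon-charts \
    --namespace seldon-system \
    --set istio.enabled=true
```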
@@ -302,7 +305,7 @@
"id": "backed-outreach",
"metadata": {},
"source": [
"### Deploy your model with Seldon pre-packaged Triton server"
"### Deploy your model with Seldon pre-packaged Triton server <a id=\"sd\"/>"
]
},
{
@@ -324,22 +327,30 @@
"apiVersion: machinelearning.seldon.io/v1alpha2\n",
"kind: SeldonDeployment\n",
"metadata:\n",
" name: gpt2\n",
" name: gpt2gpu\n",
"spec:\n",
" annotations:\n",
" prometheus.io/port: \"8002\" # we will explain below in Monitoring section\n",
" prometheus.io/path: \"/metrics\"\n",
" predictors:\n",
" - componentSpecs:\n",
" - spec:\n",
" containers:\n",
" - name: gpt2\n",
" resources:\n",
" requests:\n",
" memory: 750Mi\n",
" cpu: 2\n",
" #nvidia.com/gpu: 1 \n",
" limits:\n",
" memory: 2Gi\n",
" cpu: 2\n",
" #nvidia.com/gpu: 1 \n",
" nvidia.com/gpu: 1 \n",
" limits:\n",
" memory: 4Gi\n",
" cpu: 4\n",
" nvidia.com/gpu: 1 \n",
" tolerations:\n",
" - key: \"nvidia.com\" # to be able to run in GPU Nodepool\n",
" operator: \"Equal\"\n",
" value: \"gpu\"\n",
" effect: \"NoSchedule\" \n",
" graph:\n",
" implementation: TRITON_SERVER\n",
" logger:\n",
@@ -373,21 +384,20 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 3,
"id": "demanding-thesaurus",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Waiting for deployment \"gpt2-default-0-gpt2\" rollout to finish: 0 of 1 updated replicas are available...\n",
"error: deployment \"gpt2-default-0-gpt2\" exceeded its progress deadline\n"
"deployment \"gpt2gpu-default-0-gpt2\" successfully rolled out\n"
]
}
],
"source": [
"!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=gpt2 -o jsonpath='{.items[0].metadata.name}')"
"!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=gpt2gpu -o jsonpath='{.items[0].metadata.name}')"
]
},
{
@@ -438,8 +448,6 @@
"ingress_ip=!(kubectl get svc --namespace istio-system istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')\n",
"ingress_ip = ingress_ip[0]\n",
"\n",
"#!curl -v http://{ingress_ip}:80/seldon/default/gpt2/v2/models/gpt2\n",
"\n",
"!curl -v http://{ingress_ip}:80/seldon/default/gpt2gpu/v2/models/gpt2"
]
},
@@ -448,12 +456,12 @@
"id": "anonymous-resource",
"metadata": {},
"source": [
"### Run prediction test: generate a sentence completion using GPT2 model - Greedy approach\n"
"### Run prediction test: generate a sentence completion using GPT2 model - Greedy approach <a id=\"infer\"/>\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 11,
"id": "modified-termination",
"metadata": {},
"outputs": [
@@ -537,12 +545,110 @@
"print(f'Input: {input_text}\\nOutput: {gen_sentence}')"
]
},
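The full cell is collapsed above. A hedged sketch of a single greedy-decoding step against the endpoint; it reuses `ingress_ip` from the curl cell earlier, and the tensor names (`input_ids:0`, `attention_mask:0`) are assumptions that depend on how the TF model was exported:

```python
# One greedy step: send tokens to the Triton v2 REST endpoint, then pick the
# highest-probability next token from the returned logits.
import numpy as np
import requests
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
input_text = "I enjoy working in Seldon"
enc = tokenizer(input_text, return_tensors="np")

payload = {
    "inputs": [
        {
            "name": "input_ids:0",
            "datatype": "INT32",
            "shape": list(enc["input_ids"].shape),
            "data": enc["input_ids"].astype(np.int32).flatten().tolist(),
        },
        {
            "name": "attention_mask:0",
            "datatype": "INT32",
            "shape": list(enc["attention_mask"].shape),
            "data": enc["attention_mask"].astype(np.int32).flatten().tolist(),
        },
    ]
}
resp = requests.post(
    f"http://{ingress_ip}/seldon/default/gpt2gpu/v2/models/gpt2/infer",
    json=payload,
)
out = resp.json()["outputs"][0]                      # logits tensor
logits = np.array(out["data"]).reshape(out["shape"])
next_token = int(np.argmax(logits[0, -1]))           # greedy choice
print(input_text + tokenizer.decode([next_token]))
```

Looping this step, appending each chosen token to the input, yields the sentence completion shown in the cell output.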
{
"source": [
"## Configure Model Monitoring with Azure Monitor <a id=\"azuremonitor\"/> \n",
"The Azure Monitor Containers Insights provides functionality to allow collecting data from any Prometheus endpoints. It removes the need to install and operate Prometheus server and manage the monitoring data as Azure Monitor provides centralized point for collecting, displaying and alerting on monitoring data. To turn on Azure Monitor Container Insights follow steps described [here](https://docs.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-onboard) and you should that you have an “omsagent” pod running."
],
"cell_type": "markdown",
"metadata": {}
},
{
"source": [
"!kubectl get pods -n kube-system | grep omsagent"
],
"cell_type": "code",
"metadata": {},
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"omsagent-27lk7 1/1 Running 3 12d\nomsagent-7q49d 1/1 Running 3 12d\nomsagent-9slf6 1/1 Running 3 12d\nomsagent-kzbkr 1/1 Running 3 12d\nomsagent-q85hk 1/1 Running 3 12d\nomsagent-rs-5976fbdc8b-rgxs4 1/1 Running 0 8d\nomsagent-tpkq2 1/1 Running 3 12d\n"
]
}
]
},
{
"source": [
"### Configure Prometheus Metrics scraping\n",
"Once `omsagent` is running we need to configure it to collect metrics from Prometheus endpoints. Azure Monitor Containers Insights allows configuration to be applied on a cluster or node-wide scope and configure endpoints for monitoring on one of the following ways:\n",
"- Provide an array of URLs \n",
"- Provide an Array of Kubernetes services\n",
"- Enable monitoring of any pods with Prometheus annotations\n",
"For more details on how to configure the scraping endpoints and query collected data refer to [MS Docs on Configure scraping of Prometheus metrics with Container insights](https://docs.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-prometheus-integration)\n",
"\n",
"Our deployed model metrics are availble from couple infrasture layers - [Seldon model orchestrator metrics](https://docs.seldon.io/projects/seldon-core/en/latest/analytics/analytics.html) and [Nvidia Triton Server Metrics](https://github.com/triton-inference-server/server/blob/main/docs/metrics.md). To enable scraping for both endpoints we updated Microsoft provided default `ConfigMap` that configures `omsagent` [azure-metrics-cm.yaml](./azure-metrics-cm.yaml):\n",
"- **Triton Server:** update `monitor_kubernetes_pods = true` to enable scrapting for Pods with `prometheus.io` annotations\n",
" In SeldonDeployment shown above `prometheus.io/path` and `prometheus.io/port` point to default Triton metrics endpoint\n",
"- **Seldon Orchestrator:** add our deployed model seldon service endpoint to list of Kubernetes services to be scraped: \n",
" ```yaml\n",
" kubernetes_services = [\"http://gpt2gpu-default.default:8000/prometheus\"]\n",
" ``` "
],
"cell_type": "markdown",
"metadata": {}
},
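For reference, the relevant fragment of that ConfigMap might look like the following; this is a hedged excerpt, and the full Microsoft template (`container-azm-ms-agentconfig` in `kube-system`) has many more settings:

```yaml
# Excerpt assumed from azure-metrics-cm.yaml
prometheus-data-collection-settings: |-
  [prometheus_data_collection_settings.cluster]
    interval = "1m"
    # scrape pods carrying prometheus.io annotations (Triton metrics)
    monitor_kubernetes_pods = true
    # scrape the Seldon orchestrator service endpoint
    kubernetes_services = ["http://gpt2gpu-default.default:8000/prometheus"]
```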
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!kubectl apply -f azure-metrics-cm.yaml"
]
},
{
"source": [
"## Query and Visualize collected data\n",
"Collected metrics are available in Logs blade of Azure Monitor in a table **InsightsMetrics**, you could see all metrics gathered by running query\n",
"\n",
"```yaml\n",
"InsightsMetrics\n",
"| where Namespace == \"prometheus\" \n",
"```\n",
"\n",
"To get Model Inference Requests per minute from Seldon Metrics run the following query and pin it to Dashboard or add to Azure Monitor Workbook:\n",
"\n",
"```yaml\n",
"InsightsMetrics \n",
"| where Namespace == \"prometheus\"\n",
"| where Name == \"seldon_api_executor_server_requests_seconds_count\"\n",
"| extend Model = parse_json(Tags).deployment_name\n",
"| where parse_json(Tags).service == \"predictions\" \n",
"| order by TimeGenerated asc \n",
"| extend RequestsPerMin = Val - prev(Val,1)\n",
"| project TimeGenerated, RequestsPerMin\n",
"| render areachart \n",
"```\n",
"\n",
"\n",
"To get Inference Duration from Triton Metrics:\n",
"\n",
"```yaml\n",
"InsightsMetrics \n",
"| where Namespace == \"prometheus\"\n",
"| where Name in (\"nv_inference_request_duration_us\")\n",
"| order by TimeGenerated asc\n",
"| extend QueueDurationSec = (Val - prev(Val, 1)) / 1000\n",
"| project TimeGenerated, Name, QueueDurationSec\n",
"| render areachart \n",
"```\n",
"\n",
"Here is example dashboard we created using queries above\n",
"\n",
"![dashboard](./azuredashboard.jpg) \n"
],
"cell_type": "markdown",
"metadata": {}
},
{
"cell_type": "markdown",
"id": "colored-status",
"metadata": {},
"source": [
"### Run Load Test / Performance Test using vegeta"
"### Run Load Test / Performance Test using vegeta <a id=\"vegeta\"/>"
]
},
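The load-test cell is collapsed below. A hedged sketch of a direct vegeta invocation, assuming `payload.json` holds the v2 infer request body from the inference step, `INGRESS_IP` is exported, and GNU `base64` is available:

```bash
# Build a single JSON target (vegeta expects the body base64-encoded) and attack.
BODY=$(base64 -w0 payload.json)
echo '{"method":"POST","url":"http://'"${INGRESS_IP}"'/seldon/default/gpt2gpu/v2/models/gpt2/infer","body":"'"${BODY}"'","header":{"Content-Type":["application/json"]}}' \
  | vegeta attack -format=json -duration=60s -rate=10 \
  | vegeta report
```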
{
@@ -675,7 +781,7 @@
"id": "patient-suite",
"metadata": {},
"source": [
"### Clean-up"
"### Clean-up <a id=\"cleanup\"/>"
]
},
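A hedged sketch of the clean-up, using the resource names from the manifests and sketches above:

```bash
kubectl delete sdep gpt2gpu          # the SeldonDeployment (shortname sdep)
kubectl delete pvc pvc-gpt2
kubectl delete pv pv-gpt2
kubectl delete secret azure-blob-secret
```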
{
@@ -712,4 +818,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}