examples of tensorflow serving on GKE and load testing (GoogleCloudPlatform#570)

* examples of tensorflow serving on GKE and load testing

* fixed build errors

Co-authored-by: Leonid Kuligin <[email protected]>
Co-authored-by: Ryan McDowell <[email protected]>
3 people authored Oct 29, 2020
1 parent 69f9d78 commit f244a36
Showing 17 changed files with 717 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -53,6 +53,7 @@ The examples folder contains example solutions across a variety of Google Cloud
* [QAOA](examples/qaoa) - Examples of parsing a max-SAT problem in a proprietary format.
* [Redis Cluster on GKE Example](examples/redis-cluster-gke) - Deploying Redis cluster on GKE.
* [Spinnaker](examples/spinnaker) - Example pipelines for a Canary / Production deployment process.
* [TensorFlow Serving on GKE and Load Testing](examples/tf-load-testing) - Example of how to serve TensorFlow models on GKE and how to load test such a deployment.
* [TensorFlow Unit Testing](examples/tensorflow-unit-testing) - Examples of how to write unit tests for TensorFlow ML models.
* [Uploading files directly to Google Cloud Storage by using Signed URL](examples/direct-upload-to-gcs) - Example architecture to enable uploading files directly to GCS by using [Signed URL](https://cloud.google.com/storage/docs/access-control/signed-urls).

126 changes: 126 additions & 0 deletions examples/tf-load-testing/README.md
@@ -0,0 +1,126 @@
You can serve your TensorFlow models on Google Kubernetes Engine with
[TensorFlow Serving](https://www.tensorflow.org/tfx/guide/serving). This
example illustrates how to automate deployment of your trained models to GKE.
In a production setup, it's also useful to load test your models to tune the
TensorFlow Serving configuration and your overall setup, and to make sure
your service can handle the required throughput.
# Prerequisites
## Preparing a model
First of all, we need to train a model. You are welcome to experiment with your
own model, or you can train an example model based on this
[tutorial](https://www.tensorflow.org/tutorials/structured_data/feature_columns).

```
cd tensorflow
python create_model.py
```
This creates an example regression model and exports it as a SavedModel into the
`saved_model_regression` directory (used by the Dockerfile below).
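
The actual training script isn't reproduced here; a minimal sketch of such a
script, assuming a simple Keras regressor on toy data (the features, model and
version path are illustrative, not the real example), might look like:
```
# Hypothetical sketch only: train a tiny regressor and export it as a SavedModel.
import numpy as np
import tensorflow as tf

# Toy data standing in for the tutorial's structured dataset.
x = np.random.rand(1000, 8).astype("float32")
y = (x.sum(axis=1) + np.random.normal(0, 0.1, 1000)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=5, batch_size=32)

# TensorFlow Serving expects a numeric version subdirectory under the model base path.
model.save("saved_model_regression/1")
```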
## Creating GKE clusters for load testing and serving
Now we need to deploy our model. We're going to serve it with TensorFlow Serving
launched in a Docker container on a GKE cluster. Our _Dockerfile_ is pretty simple:
```
FROM tensorflow/serving:latest
ADD batching_parameters.txt /benchmark/batching_parameters.txt
ADD models.config /benchmark/models.config
ADD saved_model_regression /models/regression
```
We only add the model binaries and a few configuration files. In `models.config` we define
the model (or models) to be served:
```
model_config_list {
  config {
    name: 'regression'
    base_path: '/models/regression/'
    model_platform: "tensorflow"
  }
}
```
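
Note that TensorFlow Serving looks for numeric version subdirectories under the
model's `base_path`, so the image ends up with a layout roughly like this (the
version number is just an example):
```
/models/regression/
└── 1/
    ├── saved_model.pb
    └── variables/
```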
We also need to create a GKE cluster and deploy a _tensorflow-app_ service there that
exposes ports 8500 and 8501 (for gRPC and HTTP requests, respectively) behind a load balancer.
```
python experiment.py
```
would create a _kubernetes.yaml_ file with default serving parameters.

For load testing we use the [Locust](https://locust.io/) framework. We've implemented a _RegressionUser_
inheriting from _locust.HttpUser_ and configured Locust to work in distributed mode.

Now we need to create two GKE clusters. We're doing this to emulate cross-cluster network latency,
as well as to be able to experiment with different hardware for TensorFlow Serving. All our deployments
are done with Cloud Build, and you can use a bash script to run end-to-end infrastructure creation:
```
export TENSORFLOW_MACHINE_TYPE=e2-highcpu-8
export LOCUST_MACHINE_TYPE=e2-highcpu-32
export CLUSTER_ZONE=<GCP_ZONE>
export GCP_PROJECT=<YOUR_PROJECT>
./create-cluster.sh
```

## Running a load test
After the clusters have been created, you need to forward the Locust web UI port to localhost:
```
gcloud container clusters get-credentials ${LOCUST_CLUSTER_NAME} --zone ${CLUSTER_ZONE} --project=${GCP_PROJECT}
export LOCUST_CONTEXT="gke_${GCP_PROJECT}_${CLUSTER_ZONE}_loadtest-locust-${LOCUST_MACHINE_TYPE}"
kubectl config use-context ${LOCUST_CONTEXT}
kubectl port-forward svc/locust-master 8089:8089
```
Now you can access the Locust UI at _localhost:8089_ and initiate a load test of your model.
We've observed the following results for the example model: 8ms @p50 and 11ms @p99 at 300 queries per
second, and 13ms @p50 and 47ms @p99 at 3900 queries per second.
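
Before starting a full load test, it can also be useful to send a single request to the
serving endpoint as a sanity check. A small Python sketch of such a request (the
`requests` package and the placeholder feature names are assumptions and must match
your model's serving signature; `TF_SVC_IP` is the load balancer IP printed by
_create-cluster.sh_):
```
# Illustrative sanity check against the TensorFlow Serving REST API.
import json
import requests

payload = {
    "signature_name": "serving_default",
    # Placeholder features: replace with the columns your model actually expects.
    "instances": [{"age": [40], "thalach": [150]}],
}
resp = requests.post(
    "http://TF_SVC_IP:8501/v1/models/regression:predict",
    data=json.dumps(payload),
    headers={"content-type": "application/json"},
)
print(resp.json())
```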

## Experimenting with additional serving parameters
Try using different hardware for TensorFlow Serving - e.g., recreate the GKE cluster using
`n2-highcpu-8` machines. We've observed a significant decrease in tail latency and an increase
in the throughput we could handle (with the same number of nodes): 3ms @p50 and 5ms @p99 at
300 queries per second, and 15ms @p50 and 46ms @p90 at 15000 queries per second.

Another thing to experiment with is different [batching](https://www.tensorflow.org/tfx/serving/serving_config#batching_configuration)
parameters (you might look at the batching tuning
[guide](https://github.com/tensorflow/serving/blob/master/tensorflow_serving/batching/README.md#performance-tuning))
as well as other TensorFlow Serving parameters defined
[here](https://github.com/tensorflow/serving/blob/master/tensorflow_serving/model_servers/main.cc#L59).

One possible configuration might be the following:
```
python experiment.py --enable_batching \
--batching_parameters_file=/benchmark/batching_parameters.txt \
--max_batch_size=8000 --batch_timeout_micros=4 --num_batch_threads=4 \
--tensorflow_inter_op_parallelism=4 --tensorflow_intra_op_parallelism=4
```
In this case, your _kubernetes.yaml_ would have the following lines:
```
spec:
  replicas: 3
  selector:
    matchLabels:
      app: tensorflow-app
  template:
    metadata:
      labels:
        app: tensorflow-app
    spec:
      containers:
      - name: tensorflow-app
        image: gcr.io/mogr-test-277422/tensorflow-app:latest
        env:
        - name: MODEL_NAME
          value: regression
        ports:
        - containerPort: 8500
        - containerPort: 8501
        args: ["--model_config_file=/benchmark/models.config", "--tensorflow_intra_op_parallelism=4",
               "--tensorflow_inter_op_parallelism=4",
               "--batching_parameters_file=/benchmark/batching_parameters.txt", "--enable_batching"]
```
And the _batching_parameters.txt_ would look like this:
```
max_batch_size { value: 8000 }
batch_timeout_micros { value: 4 }
max_enqueued_batches { value: 100 }
num_batch_threads { value: 4 }
```
With this configuration, we would achieve much better performance (both higher throughput and lower
latency).
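
We don't reproduce `experiment.py` here, but conceptually it just renders parameter
templates such as _batching_parameters.txt_ from the chosen flag values. A minimal
sketch of that templating idea (the template string mirrors the file above; the
function name and defaults are illustrative assumptions):
```
# Hypothetical sketch: render batching parameters from flag values.
TEMPLATE = (
    "max_batch_size {{ value: {max_batch_size} }}\n"
    "batch_timeout_micros {{ value: {batch_timeout_micros} }}\n"
    "max_enqueued_batches {{ value: {max_enqueued_batches} }}\n"
    "num_batch_threads {{ value: {num_batch_threads} }}\n"
)

def render_batching_parameters(path="batching_parameters.txt", **params):
    with open(path, "w") as f:
        f.write(TEMPLATE.format(**params))

render_batching_parameters(max_batch_size=8000, batch_timeout_micros=4,
                           max_enqueued_batches=100, num_batch_threads=4)
```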
73 changes: 73 additions & 0 deletions examples/tf-load-testing/create-cluster.sh
@@ -0,0 +1,73 @@
#!/bin/bash
set -e

# add timestamps

function create_cluster() {
  local COMPONENT_NAME=$1
  local CLUSTER_NAME=$2
  local CLUSTER_ZONE=$3
  local GCP_PROJECT=$4
  local MACHINE_TYPE=$5
  # get master cidr block of existing cluster
  local MASTER_CIDR
  MASTER_CIDR=$(gcloud container clusters describe "${CLUSTER_NAME}" --project="${GCP_PROJECT}" --zone="${CLUSTER_ZONE}" --format="value(privateClusterConfig.masterIpv4CidrBlock)" 2>> /dev/null)

  # delete existing cluster with the same name if it exists
  gcloud container clusters delete "${CLUSTER_NAME}" --zone="${CLUSTER_ZONE}" --project="${GCP_PROJECT}" --quiet || true

  # calculate cidr block for master network of new cluster
  if [ -z "$MASTER_CIDR" ]; then
    COUNT_CLUSTERS=$(gcloud container clusters list --project="${GCP_PROJECT}" --format="value(name)" | wc -l)
    CIDR_BEGIN=$(( COUNT_CLUSTERS*16 ))
    MASTER_CIDR="172.16.0.${CIDR_BEGIN}/28"
  fi

  echo "Kubernetes master address range: ${MASTER_CIDR}"
  # create cluster
  gcloud container clusters create "${CLUSTER_NAME}" --master-ipv4-cidr="${MASTER_CIDR}" --zone="${CLUSTER_ZONE}" --machine-type="${MACHINE_TYPE}" --enable-private-nodes --enable-ip-alias --no-enable-master-authorized-networks --project="${GCP_PROJECT}"
  # fetch credentials for the new cluster
  gcloud container clusters get-credentials "${CLUSTER_NAME}" --zone="${CLUSTER_ZONE}" --project="${GCP_PROJECT}"
}

function deploy_app() {
  local COMPONENT_NAME=$1
  local CLUSTER_NAME=$2
  local CLUSTER_ZONE=$3
  local GCP_PROJECT=$4
  # deploy the app
  (cd "${COMPONENT_NAME}" && gcloud builds submit --project="${GCP_PROJECT}" --substitutions=_CLOUDSDK_COMPUTE_ZONE="${CLUSTER_ZONE}" --substitutions=_CLOUDSDK_CONTAINER_CLUSTER="${CLUSTER_NAME}" --quiet)
}


# DEPLOY TF
COMPONENT_NAME="tensorflow"

# generate cluster name
TENSORFLOW_CLUSTER_NAME="loadtest-${COMPONENT_NAME}-${TENSORFLOW_MACHINE_TYPE}"

# deploy tensorflow app
create_cluster "$COMPONENT_NAME" "$TENSORFLOW_CLUSTER_NAME" "$CLUSTER_ZONE" "$GCP_PROJECT" "$TENSORFLOW_MACHINE_TYPE"
deploy_app "$COMPONENT_NAME" "$TENSORFLOW_CLUSTER_NAME" "$CLUSTER_ZONE" "$GCP_PROJECT"
# remove later
gcloud container clusters get-credentials "${TENSORFLOW_CLUSTER_NAME}" --zone="${CLUSTER_ZONE}" --project="${GCP_PROJECT}"
TF_SVC_IP=$(kubectl get svc tensorflow-app -o custom-columns='ip:.status.loadBalancer.ingress[0].ip' --no-headers)
echo "${COMPONENT_NAME} service ip address: ${TF_SVC_IP}"

# get cpuPlatform
CLUSTER_NODE_POOL=$(gcloud container node-pools list --project="${GCP_PROJECT}" --cluster="${TENSORFLOW_CLUSTER_NAME}" --zone="${CLUSTER_ZONE}" --format="value(name)")
CLUSTER_INSTANCE_GROUP=$(gcloud container node-pools describe "${CLUSTER_NODE_POOL}" --project="${GCP_PROJECT}" --cluster="${TENSORFLOW_CLUSTER_NAME}" --zone="${CLUSTER_ZONE}" --format="value(instanceGroupUrls)")
CLUSTER_NODE_NAMES=$(gcloud compute instance-groups list-instances "${CLUSTER_INSTANCE_GROUP##*/}" --project="${GCP_PROJECT}" --zone="${CLUSTER_ZONE}" --format="value(NAME)")
while IFS= read -r NODE; do
  CLUSTER_NODE_CPU_PLATFORM=$(gcloud compute instances describe "${NODE}" --project="${GCP_PROJECT}" --format="value(cpuPlatform)" --zone="${CLUSTER_ZONE}")
  echo "NODE: ${NODE} CPU: ${CLUSTER_NODE_CPU_PLATFORM}"
done <<< "${CLUSTER_NODE_NAMES}"


COMPONENT_NAME="locust"
LOCUST_CLUSTER_NAME="loadtest-${COMPONENT_NAME}-${LOCUST_MACHINE_TYPE}"

# deploy locust app
create_cluster "$COMPONENT_NAME" "$LOCUST_CLUSTER_NAME" "$CLUSTER_ZONE" "$GCP_PROJECT" "$LOCUST_MACHINE_TYPE"
deploy_app "$COMPONENT_NAME" "$LOCUST_CLUSTER_NAME" "$CLUSTER_ZONE" "$GCP_PROJECT"
8 changes: 8 additions & 0 deletions examples/tf-load-testing/locust/Dockerfile
@@ -0,0 +1,8 @@
FROM locustio/locust:1.2.3

USER root
RUN pip install pandas

ADD locustfile.py /locustfile.py

ADD testdata /testdata
19 changes: 19 additions & 0 deletions examples/tf-load-testing/locust/cloudbuild.yaml
@@ -0,0 +1,19 @@
steps:
- id: "Build the container image"
  name: "gcr.io/cloud-builders/docker"
  args: ["build", "-t", "gcr.io/${_PROJECT_ID}/${_IMAGE_NAME}:${_IMAGE_TAG}", "."]

- id: "Push the container image"
  name: "gcr.io/cloud-builders/docker"
  args: ["push", "gcr.io/${_PROJECT_ID}/${_IMAGE_NAME}:${_IMAGE_TAG}"]

- id: "Deploy the container image to GKE"
  name: "gcr.io/cloud-builders/gke-deploy"
  args:
  - run
  - --filename=kubernetes.yaml
  - --image=gcr.io/${_PROJECT_ID}/${_IMAGE_NAME}:${_IMAGE_TAG}
  - --location=${_CLOUDSDK_COMPUTE_ZONE}
  - --cluster=${_CLOUDSDK_CONTAINER_CLUSTER}

substitutions:
  _IMAGE_NAME: locust-test
  _IMAGE_TAG: latest
82 changes: 82 additions & 0 deletions examples/tf-load-testing/locust/kubernetes.yaml
@@ -0,0 +1,82 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: locust-master
  labels:
    name: locust-master
spec:
  replicas: 1
  selector:
    matchLabels:
      app: locust-master
  template:
    metadata:
      labels:
        app: locust-master
    spec:
      containers:
      - name: locust-master
        image: gcr.io/${GCP_PROJECT}/locust-test:latest
        args: ["--config=/master.conf"]
        ports:
        - name: loc-master-web
          containerPort: 8089
          protocol: TCP
        - name: loc-master-p1
          containerPort: 5557
          protocol: TCP
        - name: loc-master-p2
          containerPort: 5558
          protocol: TCP
        resources:
          requests:
            memory: "265Mi"
            cpu: "100m"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: locust-worker
  labels:
    name: locust-worker
spec:
  replicas: 5
  selector:
    matchLabels:
      app: locust-worker
  template:
    metadata:
      labels:
        app: locust-worker
    spec:
      containers:
      - name: locust-worker
        image: gcr.io/${GCP_PROJECT}/locust-test:latest
        args: ["--config=/worker.conf"]
        resources:
          requests:
            memory: "128Mi"
            cpu: "500m"
---
kind: Service
apiVersion: v1
metadata:
  name: locust-master
  labels:
    app: locust-master
spec:
  ports:
    - port: 8089
      targetPort: loc-master-web
      protocol: TCP
      name: loc-master-web
    - port: 5557
      targetPort: loc-master-p1
      protocol: TCP
      name: loc-master-p1
    - port: 5558
      targetPort: loc-master-p2
      protocol: TCP
      name: loc-master-p2
  selector:
    app: locust-master
  type: ClusterIP
42 changes: 42 additions & 0 deletions examples/tf-load-testing/locust/locustfile.py
@@ -0,0 +1,42 @@
#!/usr/bin/env python

# Copyright 2020 Google Inc. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
import os

from locust import HttpUser, task, between
import pandas as pd


class RegressionUser(HttpUser):
    wait_time = between(0., 0.2)

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.testdata = pd.read_csv(
            os.environ.get('testdata_path', '/testdata/regression_test.csv'))
        # Environment variables are strings, so cast the batch size to int.
        self.batch_size = int(os.environ.get('batch_size', 1))

    @task(1)
    def test(self):
        rows = self.testdata.sample(self.batch_size, replace=True).iterrows()
        instances = [{k: [v] for k, v in row.to_dict().items()}
                     for _, row in rows]
        data = json.dumps(
            {"signature_name": "serving_default",
             "instances": instances})
        self.client.post('v1/models/regression:predict', data=data,
                         headers={"content-type": "application/json"})
3 changes: 3 additions & 0 deletions examples/tf-load-testing/locust/master.conf
@@ -0,0 +1,3 @@
master
locustfile=/locustfile.py
expect-workers=5
3 changes: 3 additions & 0 deletions examples/tf-load-testing/locust/worker.conf
@@ -0,0 +1,3 @@
worker
locustfile=/locustfile.py
master-host=locust-master
6 changes: 6 additions & 0 deletions examples/tf-load-testing/tensorflow/Dockerfile
@@ -0,0 +1,6 @@
FROM tensorflow/serving:latest

ADD batching_parameters.txt /benchmark/batching_parameters.txt
ADD models.config /benchmark/models.config

ADD saved_model_regression /models/regression
@@ -0,0 +1,4 @@
max_batch_size {{ value: {max_batch_size} }}
batch_timeout_micros {{ value: {batch_timeout_micros} }}
max_enqueued_batches {{ value: {max_enqueued_batches} }}
num_batch_threads {{ value: {num_batch_threads} }}