examples of tensorflow serving on GKE and load testing (GoogleCloudPlatform#570)

* examples of tensorflow serving on GKE and load testing

* fixed build errors

Co-authored-by: Leonid Kuligin <[email protected]>
Co-authored-by: Ryan McDowell <[email protected]>
3 people authored Oct 29, 2020
1 parent 69f9d78 commit f244a36
Showing 17 changed files with 717 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -53,6 +53,7 @@ The examples folder contains example solutions across a variety of Google Cloud
* [QAOA](examples/qaoa) - Examples of parsing a max-SAT problem in a proprietary format.
* [Redis Cluster on GKE Example](examples/redis-cluster-gke) - Deploying Redis cluster on GKE.
* [Spinnaker](examples/spinnaker) - Example pipelines for a Canary / Production deployment process.
* [TensorFlow Serving on GKE and Load Testing](examples/tf-load-testing) - Example of how to serve TensorFlow models on GKE and how to load test such a deployment.
* [TensorFlow Unit Testing](examples/tensorflow-unit-testing) - Examples of how to write unit tests for TensorFlow ML models.
* [Uploading files directly to Google Cloud Storage by using Signed URL](examples/direct-upload-to-gcs) - Example architecture to enable uploading files directly to GCS by using [Signed URL](https://cloud.google.com/storage/docs/access-control/signed-urls).

126 changes: 126 additions & 0 deletions examples/tf-load-testing/README.md
@@ -0,0 +1,126 @@
You can serve your TensorFlow models on Google Kubernetes Engine with
[TensorFlow Serving](https://www.tensorflow.org/tfx/guide/serving). This
example illustrates how to automate deployment of your trained models to GKE.
In a production setup, it's also useful to load test your models to tune the
TensorFlow Serving configuration and your overall setup, and to make sure
your service can handle the required throughput.
# Prerequisites
## Preparing a model
First of all, we need to train a model. You are welcome to experiment with your
own model, or you can train an example model based on this
[tutorial](https://www.tensorflow.org/tutorials/structured_data/feature_columns).

```
cd tensorflow
python create_model.py
```
This creates an example regression model and exports it as a SavedModel into the
`saved_model_regression` directory (used by the Dockerfile below).
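
The actual training script isn't reproduced here; a minimal sketch of such a
script, assuming a simple Keras regressor on toy data (the features, model and
version path are illustrative, not the real example), might look like:
```
# Hypothetical sketch only: train a tiny regressor and export it as a SavedModel.
import numpy as np
import tensorflow as tf

# Toy data standing in for the tutorial's structured dataset.
x = np.random.rand(1000, 8).astype("float32")
y = (x.sum(axis=1) + np.random.normal(0, 0.1, 1000)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=5, batch_size=32)

# TensorFlow Serving expects a numeric version subdirectory under the model base path.
model.save("saved_model_regression/1")
```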
## Creating GKE clusters for load testing and serving
Now we need to deploy our model. We're going to serve it with TensorFlow Serving
launched in a Docker container on a GKE cluster. Our _Dockerfile_ is pretty simple:
```
FROM tensorflow/serving:latest
ADD batching_parameters.txt /benchmark/batching_parameters.txt
ADD models.config /benchmark/models.config
ADD saved_model_regression /models/regression
```
We only add the model binaries and a few configuration files. In `models.config` we define
the model (or models) to be served:
```
model_config_list {
  config {
    name: 'regression'
    base_path: '/models/regression/'
    model_platform: "tensorflow"
  }
}
```
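
Note that TensorFlow Serving looks for numeric version subdirectories under the
model's `base_path`, so the image ends up with a layout roughly like this (the
version number is just an example):
```
/models/regression/
└── 1/
    ├── saved_model.pb
    └── variables/
```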
We also need to create a GKE cluster and deploy a _tensorflow-app_ service there that
exposes ports 8500 and 8501 (for gRPC and HTTP requests, respectively) behind a load balancer.
```
python experiment.py
```
would create a _kubernetes.yaml_ file with default serving parameters.

For load testing we use the [Locust](https://locust.io/) framework. We've implemented a _RegressionUser_
inheriting from _locust.HttpUser_ and configured Locust to work in distributed mode.

Now we need to create two GKE clusters. We're doing this to emulate cross-cluster network latency,
as well as to be able to experiment with different hardware for TensorFlow Serving. All our deployments
are done with Cloud Build, and you can use a bash script to run end-to-end infrastructure creation:
```
export TENSORFLOW_MACHINE_TYPE=e2-highcpu-8
export LOCUST_MACHINE_TYPE=e2-highcpu-32
export CLUSTER_ZONE=<GCP_ZONE>
export GCP_PROJECT=<YOUR_PROJECT>
./create-cluster.sh
```

## Running a load test
After the clusters have been created, you need to forward the Locust web UI port to localhost:
```
gcloud container clusters get-credentials ${LOCUST_CLUSTER_NAME} --zone ${CLUSTER_ZONE} --project=${GCP_PROJECT}
export LOCUST_CONTEXT="gke_${GCP_PROJECT}_${CLUSTER_ZONE}_loadtest-locust-${LOCUST_MACHINE_TYPE}"
kubectl config use-context ${LOCUST_CONTEXT}
kubectl port-forward svc/locust-master 8089:8089
```
Now you can access the Locust UI at _localhost:8089_ and initiate a load test of your model.
We've observed the following results for the example model: 8ms @p50 and 11ms @p99 at 300 queries per
second, and 13ms @p50 and 47ms @p99 at 3900 queries per second.
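
Before starting a full load test, it can also be useful to send a single request to the
serving endpoint as a sanity check. A small Python sketch of such a request (the
`requests` package and the placeholder feature names are assumptions and must match
your model's serving signature; `TF_SVC_IP` is the load balancer IP printed by
_create-cluster.sh_):
```
# Illustrative sanity check against the TensorFlow Serving REST API.
import json
import requests

payload = {
    "signature_name": "serving_default",
    # Placeholder features: replace with the columns your model actually expects.
    "instances": [{"age": [40], "thalach": [150]}],
}
resp = requests.post(
    "http://TF_SVC_IP:8501/v1/models/regression:predict",
    data=json.dumps(payload),
    headers={"content-type": "application/json"},
)
print(resp.json())
```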

## Experimenting with additional serving parameters
Try using different hardware for TensorFlow Serving - e.g., recreate the GKE cluster using
`n2-highcpu-8` machines. We've observed a significant decrease in tail latency and an increase
in the throughput we could handle (with the same number of nodes): 3ms @p50 and 5ms @p99 at
300 queries per second, and 15ms @p50 and 46ms @p90 at 15000 queries per second.

Another thing to experiment with is different [batching](https://www.tensorflow.org/tfx/serving/serving_config#batching_configuration)
parameters (you might look at the batching tuning
[guide](https://github.com/tensorflow/serving/blob/master/tensorflow_serving/batching/README.md#performance-tuning))
as well as other TensorFlow Serving parameters defined
[here](https://github.com/tensorflow/serving/blob/master/tensorflow_serving/model_servers/main.cc#L59).

One possible configuration might be the following:
```
python experiment.py --enable_batching \
--batching_parameters_file=/benchmark/batching_parameters.txt \
--max_batch_size=8000 --batch_timeout_micros=4 --num_batch_threads=4 \
--tensorflow_inter_op_parallelism=4 --tensorflow_intra_op_parallelism=4
```
In this case, your _kubernetes.yaml_ would have the following lines:
```
spec:
  replicas: 3
  selector:
    matchLabels:
      app: tensorflow-app
  template:
    metadata:
      labels:
        app: tensorflow-app
    spec:
      containers:
      - name: tensorflow-app
        image: gcr.io/mogr-test-277422/tensorflow-app:latest
        env:
        - name: MODEL_NAME
          value: regression
        ports:
        - containerPort: 8500
        - containerPort: 8501
        args: ["--model_config_file=/benchmark/models.config", "--tensorflow_intra_op_parallelism=4",
               "--tensorflow_inter_op_parallelism=4",
               "--batching_parameters_file=/benchmark/batching_parameters.txt", "--enable_batching"]
```
And the _batching_parameters.txt_ would look like this:
```
max_batch_size { value: 8000 }
batch_timeout_micros { value: 4 }
max_enqueued_batches { value: 100 }
num_batch_threads { value: 4 }
```
With this configuration, we would achieve much better performance (both higher throughput and lower
latency).
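
We don't reproduce `experiment.py` here, but conceptually it just renders parameter
templates such as _batching_parameters.txt_ from the chosen flag values. A minimal
sketch of that templating idea (the template string mirrors the file above; the
function name and defaults are illustrative assumptions):
```
# Hypothetical sketch: render batching parameters from flag values.
TEMPLATE = (
    "max_batch_size {{ value: {max_batch_size} }}\n"
    "batch_timeout_micros {{ value: {batch_timeout_micros} }}\n"
    "max_enqueued_batches {{ value: {max_enqueued_batches} }}\n"
    "num_batch_threads {{ value: {num_batch_threads} }}\n"
)

def render_batching_parameters(path="batching_parameters.txt", **params):
    with open(path, "w") as f:
        f.write(TEMPLATE.format(**params))

render_batching_parameters(max_batch_size=8000, batch_timeout_micros=4,
                           max_enqueued_batches=100, num_batch_threads=4)
```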
73 changes: 73 additions & 0 deletions examples/tf-load-testing/create-cluster.sh
@@ -0,0 +1,73 @@
#!/bin/bash
set -e

# add timestamps

function create_cluster() {
  local COMPONENT_NAME=$1
  local CLUSTER_NAME=$2
  local CLUSTER_ZONE=$3
  local GCP_PROJECT=$4
  local MACHINE_TYPE=$5
  # get master cidr block of existing cluster
  local MASTER_CIDR
  MASTER_CIDR=$(gcloud container clusters describe "${CLUSTER_NAME}" --project="${GCP_PROJECT}" --zone="${CLUSTER_ZONE}" --format="value(privateClusterConfig.masterIpv4CidrBlock)" 2>> /dev/null)

  # delete existing cluster with the same name if it exists
  gcloud container clusters delete "${CLUSTER_NAME}" --zone="${CLUSTER_ZONE}" --project="${GCP_PROJECT}" --quiet || true

  # calculate cidr block for master network of new cluster
  if [ -z "$MASTER_CIDR" ]; then
    COUNT_CLUSTERS=$(gcloud container clusters list --project="${GCP_PROJECT}" --format="value(name)" | wc -l)
    CIDR_BEGIN=$(( COUNT_CLUSTERS*16 ))
    MASTER_CIDR="172.16.0.${CIDR_BEGIN}/28"
  fi

  echo "Kubernetes master address range: ${MASTER_CIDR}"
  # create cluster
  gcloud container clusters create "${CLUSTER_NAME}" --master-ipv4-cidr="${MASTER_CIDR}" --zone="${CLUSTER_ZONE}" --machine-type="${MACHINE_TYPE}" --enable-private-nodes --enable-ip-alias --no-enable-master-authorized-networks --project="${GCP_PROJECT}"
  # fetch credentials for the new cluster
  gcloud container clusters get-credentials "${CLUSTER_NAME}" --zone="${CLUSTER_ZONE}" --project="${GCP_PROJECT}"
}

function deploy_app() {
  local COMPONENT_NAME=$1
  local CLUSTER_NAME=$2
  local CLUSTER_ZONE=$3
  local GCP_PROJECT=$4
  # deploy the app
  (cd "${COMPONENT_NAME}" && gcloud builds submit --project="${GCP_PROJECT}" --substitutions=_CLOUDSDK_COMPUTE_ZONE="${CLUSTER_ZONE}" --substitutions=_CLOUDSDK_CONTAINER_CLUSTER="${CLUSTER_NAME}" --quiet)
}


# DEPLOY TF
COMPONENT_NAME="tensorflow"

# generate cluster name
TENSORFLOW_CLUSTER_NAME="loadtest-${COMPONENT_NAME}-${TENSORFLOW_MACHINE_TYPE}"

# deploy tensorflow app
create_cluster "$COMPONENT_NAME" "$TENSORFLOW_CLUSTER_NAME" "$CLUSTER_ZONE" "$GCP_PROJECT" "$TENSORFLOW_MACHINE_TYPE"
deploy_app "$COMPONENT_NAME" "$TENSORFLOW_CLUSTER_NAME" "$CLUSTER_ZONE" "$GCP_PROJECT"
# remove later
gcloud container clusters get-credentials "${TENSORFLOW_CLUSTER_NAME}" --zone="${CLUSTER_ZONE}" --project="${GCP_PROJECT}"
TF_SVC_IP=$(kubectl get svc tensorflow-app -o custom-columns='ip:.status.loadBalancer.ingress[0].ip' --no-headers)
echo "${COMPONENT_NAME} service ip address: ${TF_SVC_IP}"

# get cpuPlatform
CLUSTER_NODE_POOL=$(gcloud container node-pools list --project="${GCP_PROJECT}" --cluster="${TENSORFLOW_CLUSTER_NAME}" --zone="${CLUSTER_ZONE}" --format="value(name)")
CLUSTER_INSTANCE_GROUP=$(gcloud container node-pools describe "${CLUSTER_NODE_POOL}" --project="${GCP_PROJECT}" --cluster="${TENSORFLOW_CLUSTER_NAME}" --zone="${CLUSTER_ZONE}" --format="value(instanceGroupUrls)")
CLUSTER_NODE_NAMES=$(gcloud compute instance-groups list-instances "${CLUSTER_INSTANCE_GROUP##*/}" --project="${GCP_PROJECT}" --zone="${CLUSTER_ZONE}" --format="value(NAME)")
while IFS= read -r NODE; do
  CLUSTER_NODE_CPU_PLATFORM=$(gcloud compute instances describe "${NODE}" --project="${GCP_PROJECT}" --format="value(cpuPlatform)" --zone="${CLUSTER_ZONE}")
  echo "NODE: ${NODE} CPU: ${CLUSTER_NODE_CPU_PLATFORM}"
done <<< "${CLUSTER_NODE_NAMES}"


COMPONENT_NAME="locust"
LOCUST_CLUSTER_NAME="loadtest-${COMPONENT_NAME}-${LOCUST_MACHINE_TYPE}"

# deploy locust app
create_cluster "$COMPONENT_NAME" "$LOCUST_CLUSTER_NAME" "$CLUSTER_ZONE" "$GCP_PROJECT" "$LOCUST_MACHINE_TYPE"
deploy_app "$COMPONENT_NAME" "$LOCUST_CLUSTER_NAME" "$CLUSTER_ZONE" "$GCP_PROJECT"
8 changes: 8 additions & 0 deletions examples/tf-load-testing/locust/Dockerfile
@@ -0,0 +1,8 @@
FROM locustio/locust:1.2.3

USER root
RUN pip install pandas

ADD locustfile.py /locustfile.py

ADD testdata /testdata
19 changes: 19 additions & 0 deletions examples/tf-load-testing/locust/cloudbuild.yaml
@@ -0,0 +1,19 @@
steps:
- id: "Build the container image"
  name: "gcr.io/cloud-builders/docker"
  args: ["build", "-t", "gcr.io/${_PROJECT_ID}/${_IMAGE_NAME}:${_IMAGE_TAG}", "."]

- id: "Push the container image"
  name: "gcr.io/cloud-builders/docker"
  args: ["push", "gcr.io/${_PROJECT_ID}/${_IMAGE_NAME}:${_IMAGE_TAG}"]

- id: "Deploy the container image to GKE"
  name: "gcr.io/cloud-builders/gke-deploy"
  args:
  - run
  - --filename=kubernetes.yaml
  - --image=gcr.io/${_PROJECT_ID}/${_IMAGE_NAME}:${_IMAGE_TAG}
  - --location=${_CLOUDSDK_COMPUTE_ZONE}
  - --cluster=${_CLOUDSDK_CONTAINER_CLUSTER}

substitutions:
  _IMAGE_NAME: locust-test
  _IMAGE_TAG: latest
82 changes: 82 additions & 0 deletions examples/tf-load-testing/locust/kubernetes.yaml
@@ -0,0 +1,82 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: locust-master
  labels:
    name: locust-master
spec:
  replicas: 1
  selector:
    matchLabels:
      app: locust-master
  template:
    metadata:
      labels:
        app: locust-master
    spec:
      containers:
      - name: locust-master
        image: gcr.io/${GCP_PROJECT}/locust-test:latest
        args: ["--config=/master.conf"]
        ports:
        - name: loc-master-web
          containerPort: 8089
          protocol: TCP
        - name: loc-master-p1
          containerPort: 5557
          protocol: TCP
        - name: loc-master-p2
          containerPort: 5558
          protocol: TCP
        resources:
          requests:
            memory: "265Mi"
            cpu: "100m"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: locust-worker
  labels:
    name: locust-worker
spec:
  replicas: 5
  selector:
    matchLabels:
      app: locust-worker
  template:
    metadata:
      labels:
        app: locust-worker
    spec:
      containers:
      - name: locust-worker
        image: gcr.io/${GCP_PROJECT}/locust-test:latest
        args: ["--config=/worker.conf"]
        resources:
          requests:
            memory: "128Mi"
            cpu: "500m"
---
kind: Service
apiVersion: v1
metadata:
  name: locust-master
  labels:
    app: locust-master
spec:
  ports:
    - port: 8089
      targetPort: loc-master-web
      protocol: TCP
      name: loc-master-web
    - port: 5557
      targetPort: loc-master-p1
      protocol: TCP
      name: loc-master-p1
    - port: 5558
      targetPort: loc-master-p2
      protocol: TCP
      name: loc-master-p2
  selector:
    app: locust-master
  type: ClusterIP
42 changes: 42 additions & 0 deletions examples/tf-load-testing/locust/locustfile.py
@@ -0,0 +1,42 @@
#!/usr/bin/env python

# Copyright 2020 Google Inc. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
import os

from locust import HttpUser, task, between
import pandas as pd


class RegressionUser(HttpUser):
    wait_time = between(0., 0.2)

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.testdata = pd.read_csv(
            os.environ.get('testdata_path', '/testdata/regression_test.csv'))
        # Environment variables are strings, so cast the batch size to int.
        self.batch_size = int(os.environ.get('batch_size', 1))

    @task(1)
    def test(self):
        rows = self.testdata.sample(self.batch_size, replace=True).iterrows()
        instances = [{k: [v] for k, v in row.to_dict().items()}
                     for _, row in rows]
        data = json.dumps(
            {"signature_name": "serving_default",
             "instances": instances})
        self.client.post('v1/models/regression:predict', data=data,
                         headers={"content-type": "application/json"})
3 changes: 3 additions & 0 deletions examples/tf-load-testing/locust/master.conf
@@ -0,0 +1,3 @@
master
locustfile=/locustfile.py
expect-workers=5
3 changes: 3 additions & 0 deletions examples/tf-load-testing/locust/worker.conf
@@ -0,0 +1,3 @@
worker
locustfile=/locustfile.py
master-host=locust-master
6 changes: 6 additions & 0 deletions examples/tf-load-testing/tensorflow/Dockerfile
@@ -0,0 +1,6 @@
FROM tensorflow/serving:latest

ADD batching_parameters.txt /benchmark/batching_parameters.txt
ADD models.config /benchmark/models.config

ADD saved_model_regression /models/regression
@@ -0,0 +1,4 @@
max_batch_size {{ value: {max_batch_size} }}
batch_timeout_micros {{ value: {batch_timeout_micros} }}
max_enqueued_batches {{ value: {max_enqueued_batches} }}
num_batch_threads {{ value: {num_batch_threads} }}