forked from GoogleCloudPlatform/professional-services

Commit f244a36 (parent 69f9d78)

examples of tensorflow serving on GKE and load testing (GoogleCloudPlatform#570)

* examples of tensorflow serving on GKE and load testing
* fixed build errors

Co-authored-by: Leonid Kuligin <[email protected]>
Co-authored-by: Ryan McDowell <[email protected]>

Showing 17 changed files with 717 additions and 0 deletions.
You can serve your TensorFlow models on Google Kubernetes Engine with
[TensorFlow Serving](https://www.tensorflow.org/tfx/guide/serving). This
example illustrates how to automate the deployment of your trained models to
GKE. In a production setup, it's also useful to load test your models in order
to tune the TensorFlow Serving configuration and your overall setup, and to
make sure your service can handle the required throughput.

# Prerequisites

## Preparing a model

First of all, we need to train a model. You are welcome to experiment with your
own model, or you can train an example based on this
[tutorial](https://www.tensorflow.org/tutorials/structured_data/feature_columns).

```
cd tensorflow
python create_model.py
```

would train the model and export it as a SavedModel (the serving Dockerfile
below expects it under `saved_model_regression`).
## Creating GKE clusters for load testing and serving

Now we need to deploy our model. We're going to serve it with TensorFlow
Serving running in a Docker container on a GKE cluster. Our _Dockerfile_ is
pretty simple:

```
FROM tensorflow/serving:latest
ADD batching_parameters.txt /benchmark/batching_parameters.txt
ADD models.config /benchmark/models.config
ADD saved_model_regression /models/regression
```

We only add the model binaries and a few configuration files. In
`models.config` we define one (or many) models to be launched:

```
model_config_list {
  config {
    name: 'regression'
    base_path: '/models/regression/'
    model_platform: "tensorflow"
  }
}
```

We also need to create a GKE cluster and deploy a _tensorflow-app_ service
there that exposes ports 8500 (gRPC) and 8501 (REST) behind a load balancer.

```
python experiment.py
```

would create a _kubernetes.yaml_ file with default serving parameters.
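Once the service is up, a quick sanity check is a single REST call to the predict endpoint. Here is a minimal sketch; the IP address and feature names are placeholders, not values from this repository:

```python
import json
import urllib.request

# Placeholder: use the EXTERNAL-IP of the tensorflow-app load balancer,
# e.g. as reported by `kubectl get svc tensorflow-app`.
TF_SVC_IP = "203.0.113.10"

# Feature names are illustrative; use the columns your model was trained on.
payload = json.dumps({
    "signature_name": "serving_default",
    "instances": [{"age": [60], "chol": [233]}],
})
url = f"http://{TF_SVC_IP}:8501/v1/models/regression:predict"  # 8501 = REST port
request = urllib.request.Request(
    url, data=payload.encode("utf-8"),
    headers={"content-type": "application/json"})
# urllib.request.urlopen(request) would return the model's predictions as JSON.
```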
For load testing we use the [Locust](https://locust.io/) framework. We've
implemented a _RegressionUser_ inheriting from _locust.HttpUser_ and configured
Locust to work in distributed mode.

Now we need to create two GKE clusters. We do this to emulate cross-cluster
network latency, as well as to be able to experiment with different hardware
for TensorFlow Serving. All our deployments are done with Cloud Build, and you
can use a bash script to run the end-to-end infrastructure creation:
```
export TENSORFLOW_MACHINE_TYPE=e2-highcpu-8
export LOCUST_MACHINE_TYPE=e2-highcpu-32
export CLUSTER_ZONE=<GCP_ZONE>
export GCP_PROJECT=<YOUR_PROJECT>
./create-cluster.sh
```
## Running a load test

After the clusters have been created, you need to forward a port to localhost:

```
gcloud container clusters get-credentials ${LOCUST_CLUSTER_NAME} --zone ${CLUSTER_ZONE} --project=${GCP_PROJECT}
export LOCUST_CONTEXT="gke_${GCP_PROJECT}_${CLUSTER_ZONE}_loadtest-locust-${LOCUST_MACHINE_TYPE}"
kubectl config use-context ${LOCUST_CONTEXT}
kubectl port-forward svc/locust-master 8089:8089
```

Now you can access the Locust UI at _localhost:8089_ and initiate a load test
of your model. We've observed the following results for the example model:
8ms @p50 and 11ms @p99 at 300 queries per second, and 13ms @p50 and 47ms @p99
at 3900 queries per second.
## Experimenting with additional serving parameters

Try different hardware for TensorFlow Serving - e.g., recreate the GKE cluster
using `n2-highcpu-8` machines. We've observed a significant improvement in tail
latency and in the throughput we could handle (with the same number of nodes):
3ms @p50 and 5ms @p99 at 300 queries per second, and 15ms @p50 and 46ms @p90
at 15000 queries per second.

Another thing to experiment with is different [batching](https://www.tensorflow.org/tfx/serving/serving_config#batching_configuration)
parameters (take a look at the batching tuning
[guide](https://github.com/tensorflow/serving/blob/master/tensorflow_serving/batching/README.md#performance-tuning))
as well as other TensorFlow Serving parameters defined
[here](https://github.com/tensorflow/serving/blob/master/tensorflow_serving/model_servers/main.cc#L59).

One possible configuration might be this one:
```
python experiment.py --enable_batching \
    --batching_parameters_file=/benchmark/batching_parameters.txt \
    --max_batch_size=8000 --batch_timeout_micros=4 --num_batch_threads=4 \
    --tensorflow_inter_op_parallelism=4 --tensorflow_intra_op_parallelism=4
```
In this case, your _kubernetes.yaml_ would have the following lines:

```
spec:
  replicas: 3
  selector:
    matchLabels:
      app: tensorflow-app
  template:
    metadata:
      labels:
        app: tensorflow-app
    spec:
      containers:
      - name: tensorflow-app
        image: gcr.io/mogr-test-277422/tensorflow-app:latest
        env:
        - name: MODEL_NAME
          value: regression
        ports:
        - containerPort: 8500
        - containerPort: 8501
        args: ["--model_config_file=/benchmark/models.config",
               "--tensorflow_intra_op_parallelism=4",
               "--tensorflow_inter_op_parallelism=4",
               "--batching_parameters_file=/benchmark/batching_parameters.txt",
               "--enable_batching"]
```
And the _batching_parameters.txt_ would look like this:

```
max_batch_size { value: 8000 }
batch_timeout_micros { value: 4 }
max_enqueued_batches { value: 100 }
num_batch_threads { value: 4 }
```

With this configuration, we would achieve much better performance (both higher
throughput and lower latency).
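As a rough back-of-envelope (our own estimate, not a formula from the TF Serving docs), a batch can only accumulate the requests that arrive within the timeout window, so with `batch_timeout_micros` as low as 4 the server mostly batches requests that are already queued behind busy threads rather than waiting for new arrivals:

```python
# Rough estimate of how many requests arrive within one batching timeout
# window; an assumption for intuition only, not a TF Serving formula.
qps = 3900                   # observed load from the test above
batch_timeout_s = 4e-6       # batch_timeout_micros = 4
arrivals_per_window = qps * batch_timeout_s  # far less than 1 request per window
```

This is why a tiny timeout keeps latency low: the server almost never stalls waiting to fill a batch.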
_create-cluster.sh_:
```
#!/bin/bash
set -e

# add timestamps

function create_cluster() {
  local COMPONENT_NAME=$1
  local CLUSTER_NAME=$2
  local CLUSTER_ZONE=$3
  local GCP_PROJECT=$4
  local MACHINE_TYPE=$5
  # get the master CIDR block of an existing cluster with the same name, if any
  local MASTER_CIDR
  MASTER_CIDR=$(gcloud container clusters describe "${CLUSTER_NAME}" --project="${GCP_PROJECT}" --zone="${CLUSTER_ZONE}" --format="value(privateClusterConfig.masterIpv4CidrBlock)" 2> /dev/null)

  # delete the existing cluster with the same name, if it exists
  gcloud container clusters delete "${CLUSTER_NAME}" --zone="${CLUSTER_ZONE}" --project="${GCP_PROJECT}" --quiet || true

  # calculate the CIDR block for the master network of the new cluster
  if [ -z "$MASTER_CIDR" ]; then
    COUNT_CLUSTERS=$(gcloud container clusters list --project="${GCP_PROJECT}" --format="value(name)" | wc -l)
    CIDR_BEGIN=$(( COUNT_CLUSTERS*16 ))
    MASTER_CIDR="172.16.0.${CIDR_BEGIN}/28"
  fi

  echo "Kubernetes master address range: ${MASTER_CIDR}"
  # create the cluster
  gcloud container clusters create "${CLUSTER_NAME}" --master-ipv4-cidr="${MASTER_CIDR}" --zone="${CLUSTER_ZONE}" --machine-type="${MACHINE_TYPE}" --enable-private-nodes --enable-ip-alias --no-enable-master-authorized-networks --project="${GCP_PROJECT}"
  gcloud container clusters get-credentials "${CLUSTER_NAME}" --zone="${CLUSTER_ZONE}" --project="${GCP_PROJECT}"
}

function deploy_app() {
  local COMPONENT_NAME=$1
  local CLUSTER_NAME=$2
  local CLUSTER_ZONE=$3
  local GCP_PROJECT=$4
  # deploy the app with Cloud Build
  (cd "${COMPONENT_NAME}" && gcloud builds submit --project="${GCP_PROJECT}" --substitutions=_CLOUDSDK_COMPUTE_ZONE="${CLUSTER_ZONE}" --substitutions=_CLOUDSDK_CONTAINER_CLUSTER="${CLUSTER_NAME}" --quiet)
}

# DEPLOY TF
COMPONENT_NAME="tensorflow"

# generate the cluster name
TENSORFLOW_CLUSTER_NAME="loadtest-${COMPONENT_NAME}-${TENSORFLOW_MACHINE_TYPE}"

# deploy the tensorflow app
create_cluster "$COMPONENT_NAME" "$TENSORFLOW_CLUSTER_NAME" "$CLUSTER_ZONE" "$GCP_PROJECT" "$TENSORFLOW_MACHINE_TYPE"
deploy_app "$COMPONENT_NAME" "$TENSORFLOW_CLUSTER_NAME" "$CLUSTER_ZONE" "$GCP_PROJECT"
# get the tensorflow load-balancer ip; remove later
gcloud container clusters get-credentials "${TENSORFLOW_CLUSTER_NAME}" --zone="${CLUSTER_ZONE}" --project="${GCP_PROJECT}"
TF_SVC_IP=$(kubectl get svc tensorflow-app -o custom-columns='ip:.status.loadBalancer.ingress[0].ip' --no-headers)
echo "${COMPONENT_NAME} service ip address: ${TF_SVC_IP}"

# get the cpuPlatform of each node
CLUSTER_NODE_POOL=$(gcloud container node-pools list --project="${GCP_PROJECT}" --cluster="${TENSORFLOW_CLUSTER_NAME}" --zone="${CLUSTER_ZONE}" --format="value(name)")
CLUSTER_INSTANCE_GROUP=$(gcloud container node-pools describe "${CLUSTER_NODE_POOL}" --project="${GCP_PROJECT}" --cluster="${TENSORFLOW_CLUSTER_NAME}" --zone="${CLUSTER_ZONE}" --format="value(instanceGroupUrls)")
CLUSTER_NODE_NAMES=$(gcloud compute instance-groups list-instances "${CLUSTER_INSTANCE_GROUP##*/}" --project="${GCP_PROJECT}" --zone="${CLUSTER_ZONE}" --format="value(NAME)")
while IFS= read -r NODE; do
  CLUSTER_NODE_CPU_PLATFORM=$(gcloud compute instances describe "${NODE}" --project="${GCP_PROJECT}" --format="value(cpuPlatform)" --zone="${CLUSTER_ZONE}")
  echo "NODE: ${NODE} CPU: ${CLUSTER_NODE_CPU_PLATFORM}"
done <<< "${CLUSTER_NODE_NAMES}"

# DEPLOY LOCUST
COMPONENT_NAME="locust"
LOCUST_CLUSTER_NAME="loadtest-${COMPONENT_NAME}-${LOCUST_MACHINE_TYPE}"

# deploy the locust app
create_cluster "$COMPONENT_NAME" "$LOCUST_CLUSTER_NAME" "$CLUSTER_ZONE" "$GCP_PROJECT" "$LOCUST_MACHINE_TYPE"
deploy_app "$COMPONENT_NAME" "$LOCUST_CLUSTER_NAME" "$CLUSTER_ZONE" "$GCP_PROJECT"
```
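The master CIDR arithmetic in the script can be mirrored in Python to make the allocation scheme explicit; this is a sketch of the same logic, not part of the repository:

```python
# Each private cluster master gets a /28 (16 addresses), so the script
# offsets the Nth cluster by N * 16 inside 172.16.0.0/24. Note this only
# works for the first 16 clusters before the last octet would exceed 255.
def master_cidr(existing_cluster_count: int) -> str:
    begin = existing_cluster_count * 16
    return f"172.16.0.{begin}/28"
```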
_locust/Dockerfile_:
```
FROM locustio/locust:1.2.3

USER root
RUN pip install pandas

ADD locustfile.py /locustfile.py

ADD testdata /testdata
```
_locust/cloudbuild.yaml_:
```
steps:
- id: "Build the container image"
  name: "gcr.io/cloud-builders/docker"
  args: ["build", "-t", "gcr.io/${_PROJECT_ID}/${_IMAGE_NAME}:${_IMAGE_TAG}", "."]

- id: "Push the container image"
  name: "gcr.io/cloud-builders/docker"
  args: ["push", "gcr.io/${_PROJECT_ID}/${_IMAGE_NAME}:${_IMAGE_TAG}"]

- id: "Deploy the container image to GKE"
  name: "gcr.io/cloud-builders/gke-deploy"
  args:
  - run
  - --filename=kubernetes.yaml
  - --image=gcr.io/${_PROJECT_ID}/${_IMAGE_NAME}:${_IMAGE_TAG}
  - --location=${_CLOUDSDK_COMPUTE_ZONE}
  - --cluster=${_CLOUDSDK_CONTAINER_CLUSTER}

substitutions:
  _IMAGE_NAME: locust-test
  _IMAGE_TAG: latest
```
_locust/kubernetes.yaml_:
```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: locust-master
  labels:
    name: locust-master
spec:
  replicas: 1
  selector:
    matchLabels:
      app: locust-master
  template:
    metadata:
      labels:
        app: locust-master
    spec:
      containers:
      - name: locust-master
        image: gcr.io/${GCP_PROJECT}/locust-test:latest
        args: ["--config=/master.conf"]
        ports:
        - name: loc-master-web
          containerPort: 8089
          protocol: TCP
        - name: loc-master-p1
          containerPort: 5557
          protocol: TCP
        - name: loc-master-p2
          containerPort: 5558
          protocol: TCP
        resources:
          requests:
            memory: "265Mi"
            cpu: "100m"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: locust-worker
  labels:
    name: locust-worker
spec:
  replicas: 5
  selector:
    matchLabels:
      app: locust-worker
  template:
    metadata:
      labels:
        app: locust-worker
    spec:
      containers:
      - name: locust-worker
        # image line garbled in the diff; reconstructed to match the master deployment
        image: gcr.io/${GCP_PROJECT}/locust-test:latest
        args: ["--config=/worker.conf"]
        resources:
          requests:
            memory: "128Mi"
            cpu: "500m"
---
kind: Service
apiVersion: v1
metadata:
  name: locust-master
  labels:
    app: locust-master
spec:
  ports:
  - port: 8089
    targetPort: loc-master-web
    protocol: TCP
    name: loc-master-web
  - port: 5557
    targetPort: loc-master-p1
    protocol: TCP
    name: loc-master-p1
  - port: 5558
    targetPort: loc-master-p2
    protocol: TCP
    name: loc-master-p2
  selector:
    app: locust-master
  type: ClusterIP
```
_locust/locustfile.py_:
```
#!/usr/bin/env python

# Copyright 2020 Google Inc. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
import os

from locust import HttpUser, task, between
import pandas as pd


class RegressionUser(HttpUser):
    wait_time = between(0., 0.2)

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.testdata = pd.read_csv(
            os.environ.get('testdata_path', '/testdata/regression_test.csv'))
        # environment variables are strings, so cast the batch size to int
        self.batch_size = int(os.environ.get('batch_size', 1))

    @task(1)
    def test(self):
        rows = self.testdata.sample(self.batch_size, replace=True).iterrows()
        instances = [{k: [v] for k, v in row.to_dict().items()}
                     for _, row in rows]
        data = json.dumps(
            {"signature_name": "serving_default",
             "instances": instances})
        self.client.post('/v1/models/regression:predict', data=data,
                         headers={"content-type": "application/json"})
```
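The payload construction in `test()` can be exercised without Locust or pandas; this stdlib-only sketch mimics the same sampling and value wrapping (the rows are made-up example data, not the repo's test CSV):

```python
import json
import random

# Made-up stand-in for the rows of /testdata/regression_test.csv.
testdata = [{"age": 60, "chol": 233}, {"age": 45, "chol": 180}]
batch_size = 3

# Sample with replacement, like DataFrame.sample(..., replace=True).
rows = random.choices(testdata, k=batch_size)
# TF Serving's REST API expects each feature value wrapped in a list.
instances = [{k: [v] for k, v in row.items()} for row in rows]
data = json.dumps({"signature_name": "serving_default",
                   "instances": instances})
```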
_locust/master.conf_:
```
master
locustfile=/locustfile.py
expect-workers=5
```
_locust/worker.conf_:
```
worker
locustfile=/locustfile.py
master-host=locust-master
```
_tensorflow/Dockerfile_:
```
FROM tensorflow/serving:latest

ADD batching_parameters.txt /benchmark/batching_parameters.txt
ADD models.config /benchmark/models.config

ADD saved_model_regression /models/regression
```
_examples/tf-load-testing/tensorflow/batching_parameters_default.txt_ (4 additions):
```
max_batch_size {{ value: {max_batch_size} }}
batch_timeout_micros {{ value: {batch_timeout_micros} }}
max_enqueued_batches {{ value: {max_enqueued_batches} }}
num_batch_threads {{ value: {num_batch_threads} }}
```
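The doubled braces make this file a Python `str.format` template: `{{` and `}}` render as literal braces, while placeholders like `{max_batch_size}` are substituted. A sketch of how such a template can be rendered (the values are the ones used earlier; the actual experiment.py may do this differently):

```python
# Inline copy of the template above; {{ }} escape to literal braces.
template = ("max_batch_size {{ value: {max_batch_size} }}\n"
            "batch_timeout_micros {{ value: {batch_timeout_micros} }}\n"
            "max_enqueued_batches {{ value: {max_enqueued_batches} }}\n"
            "num_batch_threads {{ value: {num_batch_threads} }}\n")
rendered = template.format(max_batch_size=8000, batch_timeout_micros=4,
                           max_enqueued_batches=100, num_batch_threads=4)
```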