diff --git a/docs/readthedocs/source/doc/Serving/Example/cluster-serving-http-example.ipynb b/docs/readthedocs/source/doc/Serving/Example/cluster-serving-http-example.ipynb new file mode 100644 index 00000000000..231cff66bae --- /dev/null +++ b/docs/readthedocs/source/doc/Serving/Example/cluster-serving-http-example.ipynb @@ -0,0 +1,858 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example, we will use tensorflow.keras package to create a keras image classification application using model MobileNetV2, and transfer the application to Cluster Serving step by step." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Original Keras application\n", + "We will first show an original Keras application, which download the data and preprocess it, then create the MobileNetV2 model to predict." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "import os\n", + "import PIL" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'2.4.1'" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tf.__version__" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Found 1000 images belonging to 2 classes.\n" + ] + } + ], + "source": [ + "# Obtain data from url:\"https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip\"\n", + "zip_file = tf.keras.utils.get_file(origin=\"https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip\",\n", + " fname=\"cats_and_dogs_filtered.zip\", extract=True)\n", + "\n", + "# Find the directory of validation set\n", + "base_dir, _ = os.path.splitext(zip_file)\n", + "test_dir = os.path.join(base_dir, 'validation')\n", + "# Set images size to 160x160x3\n", + "image_size = 160\n", + "\n", + "# Rescale all images by 1./255 and apply image augmentation\n", + "test_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)\n", + "\n", + "# Flow images using generator to the test_generator\n", + "test_generator = test_datagen.flow_from_directory(\n", + " test_dir,\n", + " target_size=(image_size, image_size),\n", + " batch_size=1,\n", + " class_mode='binary')" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "# Create the base model from the pre-trained model MobileNet V2\n", + "IMG_SHAPE=(160,160,3)\n", + "model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,\n", + " include_top=False,\n", + " weights='imagenet')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In keras, input could be ndarray, or generator. We could just use `model.predict(test_generator)`. But to simplify, here we just input the first record to model." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[[[0. 0. 0. ... 0. 0.\n", + " 0. ]\n", + " [0. 0. 0. ... 0. 0.\n", + " 0. ]\n", + " [0. 0. 0. ... 0. 0.\n", + " 0. ]\n", + " [0. 0. 0. ... 0. 0.\n", + " 0. ]\n", + " [0. 0. 0. ... 0. 0.\n", + " 0. ]]\n", + "\n", + " [[0. 0. 0. ... 0. 0.\n", + " 0. ]\n", + " [0.997349 0. 0. ... 0. 0.96874905\n", + " 0. ]\n", + " [1.8385804 0.3380084 2.4926844 ... 0. 0.14267397\n", + " 0. 
]\n", + " [0. 0. 3.576158 ... 0. 0.\n", + " 0. ]\n", + " [0. 0. 0. ... 0. 0.\n", + " 0. ]]\n", + "\n", + " [[0. 0. 0. ... 0. 0.\n", + " 0. ]\n", + " [0. 0.0062952 0. ... 0. 0.15311003\n", + " 0. ]\n", + " [0. 1.7324333 1.1691046 ... 0. 0.9847245\n", + " 0. ]\n", + " [0. 0.84404707 3.2351522 ... 0. 0.\n", + " 0. ]\n", + " [0. 0. 0. ... 0. 0.\n", + " 0. ]]\n", + "\n", + " [[0. 0. 0. ... 0. 0.\n", + " 0.3681116 ]\n", + " [0. 3.3440204 0.5372138 ... 0. 0.\n", + " 0.79515934]\n", + " [0. 3.0932055 3.5937624 ... 0. 0.\n", + " 0.66862965]\n", + " [0. 1.4007983 0. ... 0. 0.\n", + " 2.8901892 ]\n", + " [0. 0. 0. ... 0. 0.\n", + " 0. ]]\n", + "\n", + " [[0. 0. 0. ... 0. 0.\n", + " 0.73307323]\n", + " [0. 0. 0. ... 0. 0.\n", + " 2.9129057 ]\n", + " [0. 0. 0.6134901 ... 0. 0.\n", + " 2.7102432 ]\n", + " [0. 0. 0. ... 0. 0.\n", + " 1.8489733 ]\n", + " [0. 0. 0. ... 0. 0.\n", + " 0.22623205]]]]\n" + ] + } + ], + "source": [ + "prediction=model.predict(test_generator.next()[0])\n", + "print(prediction)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Great! Now the Keras application is completed. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Export TensorFlow Saved Model\n", + "Next, we transfer the application to Cluster Serving. The first step is to save the model to SavedModel format." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/user/anaconda3/envs/rec/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:1817: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "If using Keras pass *_constraint arguments to layers.\n", + "INFO:tensorflow:Assets written to: /tmp/transfer_learning_mobilenetv2/assets\n", + "assets\tsaved_model.pb\tvariables\n" + ] + } + ], + "source": [ + "# Save trained model to ./transfer_learning_mobilenetv2\n", + "model.save('/tmp/transfer_learning_mobilenetv2')\n", + "! 
ls /tmp/transfer_learning_mobilenetv2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Deploy Cluster Serving\n", + "After model prepared, we start to deploy it on Cluster Serving.\n", + "\n", + "First install Cluster Serving" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Looking in indexes: http://10.239.45.10:8081/repository/pypi-group/simple, https://pypi.tuna.tsinghua.edu.cn/simple\n", + "Requirement already satisfied: bigdl-serving in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (0.9.0)\n", + "Requirement already satisfied: opencv-python in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from bigdl-serving) (4.5.1.48)\n", + "Requirement already satisfied: httpx in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from bigdl-serving) (0.16.1)\n", + "Requirement already satisfied: pyarrow in /home/user/.local/lib/python3.6/site-packages (from bigdl-serving) (1.0.1)\n", + "Requirement already satisfied: redis in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from bigdl-serving) (3.5.3)\n", + "Requirement already satisfied: pyyaml in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from bigdl-serving) (5.4.1)\n", + "Requirement already satisfied: rfc3986[idna2008]<2,>=1.3 in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from httpx->bigdl-serving) (1.4.0)\n", + "Requirement already satisfied: certifi in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from httpx->bigdl-serving) (2020.12.5)\n", + "Requirement already satisfied: httpcore==0.12.* in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from httpx->bigdl-serving) (0.12.3)\n", + "Requirement already satisfied: sniffio in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from httpx->bigdl-serving) (1.2.0)\n", + "Requirement already satisfied: h11==0.* in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from httpcore==0.12.*->httpx->bigdl-serving) (0.12.0)\n", + "Requirement already satisfied: contextvars>=2.1 in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from sniffio->httpx->bigdl-serving) (2.4)\n", + "Requirement already satisfied: immutables>=0.9 in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from contextvars>=2.1->sniffio->httpx->bigdl-serving) (0.14)\n", + "Requirement already satisfied: idna in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from rfc3986[idna2008]<2,>=1.3->httpx->bigdl-serving) (2.10)\n", + "Requirement already satisfied: numpy>=1.13.3 in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from opencv-python->bigdl-serving) (1.19.2)\n", + "\u001b[33mWARNING: You are using pip version 20.3.3; however, version 21.0.1 is available.\n", + "You should consider upgrading via the '/home/user/anaconda3/envs/rec/bin/python -m pip install --upgrade pip' command.\u001b[0m\n" + ] + } + ], + "source": [ + "! 
pip install bigdl-serving" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Cluster Serving has been properly set up.\n", + "You did not specify ANALYTICS_ZOO_VERSION, will download 0.9.0\n", + "ANALYTICS_ZOO_VERSION is 0.9.0\n", + "BIGDL_VERSION is 0.12.1\n", + "SPARK_VERSION is 2.4.3\n", + "2.4\n", + "--2021-02-07 10:01:46-- https://repo1.maven.org/maven2/com/intel/analytics/bigdl/bigdl-bigdl_0.12.1-spark_2.4.3/0.9.0/bigdl-bigdl_0.12.1-spark_2.4.3-0.9.0-serving.jar\n", + "Resolving child-prc.intel.com (child-prc.intel.com)... You are installing Cluster Serving by pip, downloading...\n", + "\n", + "SIGHUP received.\n", + "Redirecting output to ‘wget-log.2’.\n" + ] + } + ], + "source": [ + "# we go to a new directory and initialize the environment\n", + "! mkdir cluster-serving\n", + "os.chdir('cluster-serving')\n", + "! cluster-serving-init" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 2150K .......... .......... .......... .......... .......... 0% 27.0K 5h37m\r\n", + " 2200K .......... .......... .......... .......... .......... 0% 33.6K 5h36m\r\n", + " 2250K .......... .......... .......... .......... .......... 0% 27.3K 5h37m\r\n", + " 2300K .......... .......... .......... .......... .......... 0% 30.3K 5h36m\r\n", + " 2350K .......... .......... .......... .......... .......... 0% 29.7K 5h36m\r\n", + " 2400K .......... .......... .......... .......... .......... 0% 23.7K 5h38m\r\n", + " 2450K .......... .......... .......... .......... .......... 0% 23.4K 5h39m\r\n", + " 2500K .......... .......... .......... .......... .......... 0% 23.4K 5h41m\r\n", + " 2550K .......... .......... .......... .......... .......... 0% 22.3K 5h43m\r\n", + " 2600K .......... .......... .......... ....." + ] + } + ], + "source": [ + "! tail wget-log.2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# if you encounter slow download issue like above, you can just use following command to download\n", + "# ! wget https://repo1.maven.org/maven2/com/intel/analytics/bigdl/bigdl-bigdl_0.12.1-spark_2.4.3/0.9.0/bigdl-bigdl_0.12.1-spark_2.4.3-0.9.0-serving.jar\n", + "\n", + "# if you are using wget to download, call mv *serving.jar bigdl.jar again after downloaded." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "config.yaml bigdl.jar\r\n" + ] + } + ], + "source": [ + "# After initialization finished, check the directory\n", + "! 
ls" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We config the model path in `config.yaml` to following (the detail of config is at [Cluster Serving Configuration](https://github.com/intel-analytics/bigdl/blob/master/docs/docs/ClusterServingGuide/ProgrammingGuide.md#2-configuration))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "## BigDL Cluster Serving\n", + "\n", + "model:\n", + " # model path must be provided\n", + " path: /tmp/transfer_learning_mobilenetv2" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "## BigDL Cluster Serving\r\n", + "\r\n", + "model:\r\n", + " # model path must be provided\r\n", + " path: /tmp/transfer_learning_mobilenetv2\r\n", + " # name, default is serving_stream, you need to specify if running multiple servings\r\n", + " name:\r\n", + "data:\r\n", + " # default, localhost:6379\r\n", + " src:\r\n" + ] + } + ], + "source": [ + "! head config.yaml" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Start Cluster Serving\n", + "\n", + "Cluster Serving requires Flink and Redis installed, and corresponded environment variables set, check [Cluster Serving Installation Guide](https://github.com/intel-analytics/bigdl/blob/master/docs/docs/ClusterServingGuide/ProgrammingGuide.md#1-installation) for detail.\n", + "\n", + "Flink cluster should start before Cluster Serving starts, if Flink cluster is not started, call following to start a local Flink cluster." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Starting cluster.\n", + "Starting standalonesession daemon on host my-PC.\n", + "Starting taskexecutor daemon on host my-PC.\n" + ] + } + ], + "source": [ + "! 
$FLINK_HOME/bin/start-cluster.sh" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After configuration, start Cluster Serving by `cluster-serving-start` (the detail is at [Cluster Serving Programming Guide](https://github.com/intel-analytics/bigdl/blob/master/docs/docs/ClusterServingGuide/ProgrammingGuide.md#3-launching-service))" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "model_path=\"/tmp/transfer_learning_mobilenetv2\"\n", + "redis_timeout=\"5000\"\n", + "Redis maxmemory is not set, using default value 8G\n", + "redis server started, please check log in redis.log\n", + "OK\n", + "OK\n", + "OK\n", + "redis config maxmemory set to 8G\n", + "OK\n", + "OK\n", + "SLF4J: Class path contains multiple SLF4J bindings.\n", + "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/bigdl-bigdl_0.12.0-spark_2.4.3-0.9.0-SNAPSHOT-serving.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", + "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", + "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", + "SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\n", + "SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]\n", + "log4j:WARN No appenders could be found for logger (org.apache.flink.client.cli.CliFrontend).\n", + "log4j:WARN Please initialize the log4j system properly.\n", + "log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.\n", + "Starting new Cluster Serving job.\n", + "Cluster Serving job submitted, check log in log-cluster_serving-serving_stream.txt\n", + "To list Cluster Serving job status, use cluster-serving-cli list\n", + "SLF4J: Class path contains multiple SLF4J bindings.\n", + "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/bigdl-bigdl_0.12.0-spark_2.4.3-0.9.0-SNAPSHOT-serving.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", + "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", + "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", + "SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\n", + "SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]\n", + "log4j:WARN No appenders could be found for logger (org.apache.flink.client.cli.CliFrontend).\n", + "log4j:WARN Please initialize the log4j system properly.\n", + "log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.\n", + "[Full GC (Metadata GC Threshold) 32304K->20432K(1030144K), 0.0213821 secs]\n" + ] + } + ], + "source": [ + "! cluster-serving-start" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Prediction using Cluster Serving\n", + "Next we start Cluster Serving code at python client." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "redis group exist, will not create new one\n", + "redis group exist, will not create new one\n" + ] + } + ], + "source": [ + "from bigdl.serving.client import InputQueue, OutputQueue\n", + "input_queue = InputQueue()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In Cluster Serving, only NdArray is supported as input. Thus, we first transform the generator to ndarray (If you do not know how to transform your input to NdArray, you may get help at [data transform guide](https://github.com/intel-analytics/bigdl/tree/master/docs/docs/ClusterServingGuide/OtherFrameworkUsers#data))" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[[[0.41176474, 0.50980395, 0.5882353 ],\n", + " [0.42352945, 0.47450984, 0.50980395],\n", + " [0.4901961 , 0.5058824 , 0.5019608 ],\n", + " ...,\n", + " [0.5764706 , 0.6392157 , 0.7019608 ],\n", + " [0.454902 , 0.5176471 , 0.5803922 ],\n", + " [0.3647059 , 0.427451 , 0.4784314 ]],\n", + "\n", + " [[0.31764707, 0.38431376, 0.4156863 ],\n", + " [0.35686275, 0.38431376, 0.40784317],\n", + " [0.34509805, 0.34509805, 0.3529412 ],\n", + " ...,\n", + " [0.5803922 , 0.64705884, 0.6862745 ],\n", + " [0.48627454, 0.5529412 , 0.5921569 ],\n", + " [0.48235297, 0.54509807, 0.59607846]],\n", + "\n", + " [[0.4039216 , 0.4431373 , 0.44705886],\n", + " [0.35686275, 0.36078432, 0.37647063],\n", + " [0.46274513, 0.4431373 , 0.47058827],\n", + " ...,\n", + " [0.53333336, 0.6 , 0.6313726 ],\n", + " [0.47450984, 0.5411765 , 0.5686275 ],\n", + " [0.5137255 , 0.5764706 , 0.627451 ]],\n", + "\n", + " ...,\n", + "\n", + " [[0.44705886, 0.5019608 , 0.54509807],\n", + " [0.42352945, 0.48627454, 0.5372549 ],\n", + " [0.37647063, 0.43921572, 0.49803925],\n", + " ...,\n", + " [0.69411767, 0.69411767, 0.69411767],\n", + " [0.6745098 , 0.6745098 , 0.68235296],\n", + " [0.6392157 , 0.63529414, 0.6666667 ]],\n", + "\n", + " [[0.3647059 , 0.41960788, 0.454902 ],\n", + " [0.35686275, 0.427451 , 0.47450984],\n", + " [0.3254902 , 0.3921569 , 0.454902 ],\n", + " ...,\n", + " [0.5647059 , 0.5647059 , 0.5647059 ],\n", + " [0.627451 , 0.627451 , 0.63529414],\n", + " [0.7176471 , 0.70980394, 0.76470596]],\n", + "\n", + " [[0.34117648, 0.40784317, 0.43529415],\n", + " [0.29803923, 0.37254903, 0.427451 ],\n", + " [0.31764707, 0.3921569 , 0.45882356],\n", + " ...,\n", + " [0.454902 , 0.454902 , 0.46274513],\n", + " [0.5803922 , 0.57254905, 0.6156863 ],\n", + " [0.5137255 , 0.5019608 , 0.58431375]]]], dtype=float32)" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "arr = test_generator.next()[0]\n", + "arr" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Write to Redis successful\n", + "redis group exist, will not create new one\n", + "Write to Redis successful\n" + ] + } + ], + "source": [ + "# Use async api to put and get, you have pass a name arg and use the name to get\n", + "input_queue.enqueue('my-input', t=arr)\n", + "output_queue = OutputQueue()\n", + "prediction = output_queue.query('my-input')\n", + "# Use sync api to predict, this will block until the result is get or timeout\n", + "prediction = input_queue.predict(arr)" + ] + }, + { + "cell_type": "code", + 
"execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[[0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 1.3543907 ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 4.1898136 ,\n", + " 0. , 0. ]],\n", + "\n", + " [[0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 3.286649 , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 4.0817494 , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 3.3224926 , 0. , ..., 1.4220613 ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 4.9100547 ,\n", + " 0. , 0. ]],\n", + "\n", + " [[0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 1.5577714 , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 1.767426 , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 2.3534465 , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0.21401057,\n", + " 0. , 0. ]],\n", + "\n", + " [[0. , 0. , 0. , ..., 0. ,\n", + " 0.34797698, 0. ],\n", + " [0. , 1.4496232 , 0. , ..., 0. ,\n", + " 1.6221215 , 0. ],\n", + " [0. , 0.6171873 , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ]],\n", + "\n", + " [[0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 1.192298 , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ]]], dtype=float32)" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "prediction" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If everything works well, the result `prediction` would be the exactly the same NdArray object with the output of original Keras model." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next is the way to use http service through python." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# start the http server via jar\n", + "# ! java -jar bigdl-bigdl_0.10.0-spark_2.4.3-0.9.0-SNAPSHOT-http.jar" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you do not know how to find the jar or other http service, you may get help at [Cluster Serving http guide](https://github.com/intel-analytics/bigdl/blob/master/docs/docs/ClusterServingGuide/ProgrammingGuide.md#3-launching-service)" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "welcome to BigDL web serving frontend" + ] + } + ], + "source": [ + "! curl http://localhost:10020" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Cluster Serving provides an Python util `http_response_to_ndarray` which let user parse http response directly to ndarray, as following." + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "import requests\n", + "import numpy as np\n", + "from bigdl.serving.client import http_response_to_ndarray" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[[0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. 
, 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ]],\n", + "\n", + " [[0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ]],\n", + "\n", + " [[0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ]],\n", + "\n", + " [[0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0.7070324 , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 1.9520156 , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ]],\n", + "\n", + " [[0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0.45007578],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ]]])" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "url = 'http://localhost:10020/predict'\n", + "d = json.dumps({\"instances\":[{\"floatTensor\": arr.tolist()}]})\n", + "r = requests.post(url, data=d)\n", + "\n", + "http_prediction = http_response_to_ndarray(r)\n", + "http_prediction" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [], + "source": [ + "# don't forget to delete the model you save for this tutorial\n", + "! rm -rf /tmp/transfer_learning_mobilenetv2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This is the end of this tutorial. If you have any question, you could raise an issue at [BigDL Github](https://github.com/intel-analytics/bigdl/issues)." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.12" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/readthedocs/source/doc/Serving/Example/example.md b/docs/readthedocs/source/doc/Serving/Example/example.md new file mode 100644 index 00000000000..94ec156b570 --- /dev/null +++ b/docs/readthedocs/source/doc/Serving/Example/example.md @@ -0,0 +1,119 @@ +# BigDL Cluster Serving Example + +There are some examples provided for new user or existing Tensorflow user. 
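+
+Whatever the source framework, the end goal is the same: convert the input to a Numpy ndarray and send it to Cluster Serving through the Python client. A minimal sketch of that final step (assuming Cluster Serving is already deployed with your model and the default Redis config; the input shape is only an illustration):
+
+```
+import numpy as np
+from bigdl.serving.client import InputQueue
+
+# stand-in for your own preprocessed input
+arr = np.random.rand(1, 160, 160, 3).astype(np.float32)
+
+input_queue = InputQueue()
+# sync API: blocks until the result is returned or a timeout occurs
+prediction = input_queue.predict(arr)
+```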
+## End-to-end Example
+### TFDataSet:
+[l08c08_forecasting_with_lstm.py](https://github.com/intel-analytics/bigdl/tree/master/docs/docs/ClusterServingGuide/OtherFrameworkUsers/l08c08_forecasting_with_lstm.py)
+### Tokenizer:
+[l10c03_nlp_constructing_text_generation_model.py](https://github.com/intel-analytics/bigdl/tree/master/docs/docs/ClusterServingGuide/OtherFrameworkUsers/l10c03_nlp_constructing_text_generation_model.py)
+### ImageDataGenerator:
+[transfer_learning.py](https://github.com/intel-analytics/bigdl/tree/master/docs/docs/ClusterServingGuide/OtherFrameworkUsers/transfer_learning.py)
+
+## Model/Data Convert Guide
+This guide is for users who:
+
+* have written local code in Tensorflow (Pytorch to be added)
+* have used a data type specific to one framework, e.g. TFDataSet
+* want to deploy the local code on Cluster Serving but do not know how to write the client code (Cluster Serving takes Numpy ndarrays as input; other types need to be transformed in advance).
+
+**If you have the above needs but fail to find the solution below, please [create an issue here](https://github.com/intel-analytics/bigdl/issues).**
+
+## Tensorflow
+
+Model - covers savedModel and Frozen Graph (savedModel is recommended).
+
+Data - covers [TFDataSet](#tfdataset), [Tokenizer](#tokenizer), [ImageDataGenerator](#imagedatagenerator)
+
+Notes - tips worth noting, including [savedModel tips](#notes---use-savedmodel)
+
+### Model - ckpt to savedModel
+#### tensorflow all versions
+This method works in all versions of TensorFlow.
+
+You need to create the graph, get the output layer, create a placeholder for the input, load the ckpt, and then save the model:
+```
+# --- code you need to write
+input_layer = tf.placeholder(...)
+model = YourModel(...)
+output_layer = model.your_output_layer()
+# --- code you need to write
+with tf.Session() as sess:
+    saver = tf.train.Saver()
+    saver.restore(sess, tf.train.latest_checkpoint(FLAGS.ckpt_path))
+    tf.saved_model.simple_save(sess,
+                               FLAGS.export_path,
+                               inputs={
+                                   'input_layer': input_layer
+                               },
+                               outputs={"output_layer": output_layer})
+```
+
+#### tensorflow >= 1.15
+This method works if you are familiar with savedModel signatures and use tensorflow >= 1.15.
+
+The model graph can be loaded via `.meta`, after which the ckpt is restored and the model saved; a signature_def_map must be provided:
+```
+# provide signature first
+inputs = tf.placeholder(...)
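+# NOTE: tf.placeholder(...) above stands in for your real input tensor. With a
+# graph restored from .meta (as below), you would usually fetch the existing
+# tensors instead, e.g. loaded_graph.get_tensor_by_name('input:0') -- the
+# tensor name here is hypothetical and depends on your own graph.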
+outputs = tf.add(inputs, inputs)
+tensor_info_input = tf.saved_model.utils.build_tensor_info(inputs)
+tensor_info_output = tf.saved_model.utils.build_tensor_info(outputs)
+
+prediction_signature = (
+    tf.saved_model.signature_def_utils.build_signature_def(
+        inputs={'x_input': tensor_info_input},
+        outputs={'y_output': tensor_info_output},
+        method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME))
+
+
+# Your ckpt files are prefix.meta, prefix.index, etc.
+ckpt_prefix = 'model/model.ckpt-xxxx'
+export_dir = 'saved_model'
+
+loaded_graph = tf.Graph()
+with tf.Session(graph=loaded_graph) as sess:
+    # load
+    loader = tf.train.import_meta_graph(ckpt_prefix + '.meta')
+    loader.restore(sess, ckpt_prefix)
+
+    # export
+    builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
+    builder.add_meta_graph_and_variables(
+        sess,
+        [tf.saved_model.tag_constants.TRAINING, tf.saved_model.tag_constants.SERVING],
+        signature_def_map={
+            tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
+                prediction_signature
+        })
+    builder.save()
+```
+### Model - Keras to savedModel
+#### tensorflow > 2.0
+```
+model = tf.keras.models.load_model("./model.h5")
+tf.saved_model.save(model, "saved_model")
+```
+### Model - ckpt to Frozen Graph
+[freeze checkpoint example](https://github.com/intel-analytics/bigdl/tree/master/pyzoo/bigdl/examples/tensorflow/freeze_checkpoint)
+### Notes - Use SavedModel
+If the model has a single tensor input, there is nothing to note.
+
+**If the model has multiple inputs, please note the following.**
+
+On export, the savedModel stores its inputs in alphabetical order. Use `saved_model_cli show --dir . --all` to see the order, e.g.
+```
+signature_def['serving_default']:
+  The given SavedModel SignatureDef contains the following input(s):
+    inputs['id1'] tensor_info:
+        dtype: DT_INT32
+        shape: (-1, 512)
+        name: id1:0
+    inputs['id2'] tensor_info:
+        dtype: DT_INT32
+        shape: (-1, 512)
+        name: id2:0
+
+```
+
+When you enqueue inputs to Cluster Serving, follow this order.
+### Data
+To transform the following data types to Numpy ndarrays, the following examples are provided.
diff --git a/docs/readthedocs/source/doc/Serving/Example/keras-to-cluster-serving-example.ipynb b/docs/readthedocs/source/doc/Serving/Example/keras-to-cluster-serving-example.ipynb
new file mode 100644
index 00000000000..dab5e65c978
--- /dev/null
+++ b/docs/readthedocs/source/doc/Serving/Example/keras-to-cluster-serving-example.ipynb
@@ -0,0 +1,720 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In this example, we will use the tensorflow.keras package to create a Keras image classification application using the MobileNetV2 model, and transfer the application to Cluster Serving step by step."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Original Keras application\n",
+    "We will first show an original Keras application, which downloads and preprocesses the data, then creates the MobileNetV2 model to predict."
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "import os\n", + "import PIL" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'2.2.0'" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tf.__version__" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Found 1000 images belonging to 2 classes.\n" + ] + } + ], + "source": [ + "# Obtain data from url:\"https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip\"\n", + "zip_file = tf.keras.utils.get_file(origin=\"https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip\",\n", + " fname=\"cats_and_dogs_filtered.zip\", extract=True)\n", + "\n", + "# Find the directory of validation set\n", + "base_dir, _ = os.path.splitext(zip_file)\n", + "test_dir = os.path.join(base_dir, 'validation')\n", + "# Set images size to 160x160x3\n", + "image_size = 160\n", + "\n", + "# Rescale all images by 1./255 and apply image augmentation\n", + "test_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)\n", + "\n", + "# Flow images using generator to the test_generator\n", + "test_generator = test_datagen.flow_from_directory(\n", + " test_dir,\n", + " target_size=(image_size, image_size),\n", + " batch_size=1,\n", + " class_mode='binary')" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "# Create the base model from the pre-trained model MobileNet V2\n", + "IMG_SHAPE=(160,160,3)\n", + "model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,\n", + " include_top=False,\n", + " weights='imagenet')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In keras, input could be ndarray, or generator. We could just use `model.predict(test_generator)`. But to simplify, here we just input the first record to model." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[[[0. 0. 0. ... 0. 0.\n", + " 0. ]\n", + " [0. 0. 0. ... 0. 0.\n", + " 0. ]\n", + " [0. 0. 0. ... 0. 0.\n", + " 0. ]\n", + " [0. 0. 0.8406992 ... 0. 0.\n", + " 0. ]\n", + " [0. 0. 0. ... 0. 0.\n", + " 0. ]]\n", + "\n", + " [[0. 0. 0. ... 0. 0.\n", + " 0. ]\n", + " [0. 0. 0. ... 0. 0.81465054\n", + " 0. ]\n", + " [0. 0. 0. ... 0. 0.6572695\n", + " 0.23970175]\n", + " [0. 0. 0. ... 0. 1.2423501\n", + " 0.8024192 ]\n", + " [0. 0. 0. ... 0. 0.\n", + " 0. ]]\n", + "\n", + " [[0. 0. 0. ... 0. 0.\n", + " 0. ]\n", + " [0. 0. 0. ... 0. 5.185735\n", + " 0.21723604]\n", + " [0. 0. 0. ... 0. 4.6399093\n", + " 0.40124178]\n", + " [0.3284886 0. 0. ... 0. 5.295811\n", + " 3.4133787 ]\n", + " [0. 0. 0. ... 0. 0.\n", + " 0. ]]\n", + "\n", + " [[0. 0. 0. ... 0. 0.\n", + " 0. ]\n", + " [0. 0. 0. ... 0. 0.52712107\n", + " 0.20341969]\n", + " [0. 0. 0. ... 0. 0.8279238\n", + " 0.42696333]\n", + " [0. 0. 0. ... 0. 1.0344229\n", + " 1.5225778 ]\n", + " [0. 0. 0. ... 0. 0.\n", + " 0. ]]\n", + "\n", + " [[0. 0. 0. ... 0. 0.\n", + " 0. ]\n", + " [0. 0. 0. ... 0. 0.\n", + " 1.3237557 ]\n", + " [0. 0. 0. ... 0. 0.\n", + " 1.3395147 ]\n", + " [0. 0. 0. ... 0. 0.\n", + " 0. ]\n", + " [0. 0. 0. ... 0. 0.\n", + " 0. 
]]]]\n" + ] + } + ], + "source": [ + "prediction=model.predict(test_generator.next()[0])\n", + "print(prediction)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Great! Now the Keras application is completed. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Export TensorFlow SavedModel\n", + "Next, we transfer the application to Cluster Serving. The first step is to save the model to SavedModel format." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/user/anaconda3/envs/rec/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:1817: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "If using Keras pass *_constraint arguments to layers.\n", + "INFO:tensorflow:Assets written to: /tmp/transfer_learning_mobilenetv2/assets\n", + "assets\tsaved_model.pb\tvariables\n" + ] + } + ], + "source": [ + "# Save trained model to ./transfer_learning_mobilenetv2\n", + "model.save('/tmp/transfer_learning_mobilenetv2')\n", + "! ls /tmp/transfer_learning_mobilenetv2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Deploy Cluster Serving\n", + "After model prepared, we start to deploy it on Cluster Serving.\n", + "\n", + "First install Cluster Serving" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Looking in indexes: http://10.239.45.10:8081/repository/pypi-group/simple, https://pypi.tuna.tsinghua.edu.cn/simple\n", + "Requirement already satisfied: bigdl-serving in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (0.9.0)\n", + "Requirement already satisfied: opencv-python in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from bigdl-serving) (4.5.1.48)\n", + "Requirement already satisfied: httpx in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from bigdl-serving) (0.16.1)\n", + "Requirement already satisfied: pyarrow in /home/user/.local/lib/python3.6/site-packages (from bigdl-serving) (1.0.1)\n", + "Requirement already satisfied: redis in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from bigdl-serving) (3.5.3)\n", + "Requirement already satisfied: pyyaml in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from bigdl-serving) (5.4.1)\n", + "Requirement already satisfied: rfc3986[idna2008]<2,>=1.3 in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from httpx->bigdl-serving) (1.4.0)\n", + "Requirement already satisfied: certifi in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from httpx->bigdl-serving) (2020.12.5)\n", + "Requirement already satisfied: httpcore==0.12.* in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from httpx->bigdl-serving) (0.12.3)\n", + "Requirement already satisfied: sniffio in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from httpx->bigdl-serving) (1.2.0)\n", + "Requirement already satisfied: h11==0.* in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from httpcore==0.12.*->httpx->bigdl-serving) (0.12.0)\n", + "Requirement already satisfied: contextvars>=2.1 in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from sniffio->httpx->bigdl-serving) (2.4)\n", + 
"Requirement already satisfied: immutables>=0.9 in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from contextvars>=2.1->sniffio->httpx->bigdl-serving) (0.14)\n", + "Requirement already satisfied: idna in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from rfc3986[idna2008]<2,>=1.3->httpx->bigdl-serving) (2.10)\n", + "Requirement already satisfied: numpy>=1.13.3 in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from opencv-python->bigdl-serving) (1.19.2)\n", + "\u001b[33mWARNING: You are using pip version 20.3.3; however, version 21.0.1 is available.\n", + "You should consider upgrading via the '/home/user/anaconda3/envs/rec/bin/python -m pip install --upgrade pip' command.\u001b[0m\n" + ] + } + ], + "source": [ + "! pip install bigdl-serving" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Cluster Serving has been properly set up.\n", + "You did not specify ANALYTICS_ZOO_VERSION, will download 0.9.0\n", + "ANALYTICS_ZOO_VERSION is 0.9.0\n", + "BIGDL_VERSION is 0.12.1\n", + "SPARK_VERSION is 2.4.3\n", + "2.4\n", + "--2021-02-07 10:01:46-- https://repo1.maven.org/maven2/com/intel/analytics/bigdl/bigdl-bigdl_0.12.1-spark_2.4.3/0.9.0/bigdl-bigdl_0.12.1-spark_2.4.3-0.9.0-serving.jar\n", + "Resolving child-prc.intel.com (child-prc.intel.com)... You are installing Cluster Serving by pip, downloading...\n", + "\n", + "SIGHUP received.\n", + "Redirecting output to ‘wget-log.2’.\n" + ] + } + ], + "source": [ + "# we go to a new directory and initialize the environment\n", + "! mkdir cluster-serving\n", + "os.chdir('cluster-serving')\n", + "! cluster-serving-init" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 2150K .......... .......... .......... .......... .......... 0% 27.0K 5h37m\r\n", + " 2200K .......... .......... .......... .......... .......... 0% 33.6K 5h36m\r\n", + " 2250K .......... .......... .......... .......... .......... 0% 27.3K 5h37m\r\n", + " 2300K .......... .......... .......... .......... .......... 0% 30.3K 5h36m\r\n", + " 2350K .......... .......... .......... .......... .......... 0% 29.7K 5h36m\r\n", + " 2400K .......... .......... .......... .......... .......... 0% 23.7K 5h38m\r\n", + " 2450K .......... .......... .......... .......... .......... 0% 23.4K 5h39m\r\n", + " 2500K .......... .......... .......... .......... .......... 0% 23.4K 5h41m\r\n", + " 2550K .......... .......... .......... .......... .......... 0% 22.3K 5h43m\r\n", + " 2600K .......... .......... .......... ....." + ] + } + ], + "source": [ + "! tail wget-log.2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# if you encounter slow download issue like above, you can just use following command to download\n", + "# ! wget https://repo1.maven.org/maven2/com/intel/analytics/bigdl/bigdl-bigdl_0.12.1-spark_2.4.3/0.9.0/bigdl-bigdl_0.12.1-spark_2.4.3-0.9.0-serving.jar\n", + "\n", + "# if you are using wget to download, or get \"bigdl-xxx-serving.jar\" after \"ls\", please call mv *serving.jar bigdl.jar after downloaded." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "config.yaml bigdl.jar\r\n" + ] + } + ], + "source": [ + "# After initialization finished, check the directory\n", + "! ls" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We config the model path in `config.yaml` to following (the detail of config is at [Cluster Serving Configuration](https://github.com/intel-analytics/bigdl/blob/master/docs/docs/ClusterServingGuide/ProgrammingGuide.md#2-configuration))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "## BigDL Cluster Serving\n", + "\n", + "model:\n", + " # model path must be provided\n", + " path: /tmp/transfer_learning_mobilenetv2" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "## BigDL Cluster Serving\r\n", + "\r\n", + "model:\r\n", + " # model path must be provided\r\n", + " path: /tmp/transfer_learning_mobilenetv2\r\n", + " # name, default is serving_stream, you need to specify if running multiple servings\r\n", + " name:\r\n", + "data:\r\n", + " # default, localhost:6379\r\n", + " src:\r\n" + ] + } + ], + "source": [ + "! head config.yaml" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Start Cluster Serving\n", + "\n", + "Cluster Serving requires Flink and Redis installed, and corresponded environment variables set, check [Cluster Serving Installation Guide](https://github.com/intel-analytics/bigdl/blob/master/docs/docs/ClusterServingGuide/ProgrammingGuide.md#1-installation) for detail.\n", + "\n", + "Flink cluster should start before Cluster Serving starts, if Flink cluster is not started, call following to start a local Flink cluster." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Starting cluster.\n", + "Starting standalonesession daemon on host my-PC.\n", + "Starting taskexecutor daemon on host my-PC.\n" + ] + } + ], + "source": [ + "! 
$FLINK_HOME/bin/start-cluster.sh" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After configuration, start Cluster Serving by `cluster-serving-start` (the detail is at [Cluster Serving Programming Guide](https://github.com/intel-analytics/bigdl/blob/master/docs/docs/ClusterServingGuide/ProgrammingGuide.md#3-launching-service))" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "model_path=\"/tmp/transfer_learning_mobilenetv2\"\n", + "redis_timeout=\"5000\"\n", + "Redis maxmemory is not set, using default value 8G\n", + "redis server started, please check log in redis.log\n", + "OK\n", + "OK\n", + "OK\n", + "redis config maxmemory set to 8G\n", + "OK\n", + "OK\n", + "SLF4J: Class path contains multiple SLF4J bindings.\n", + "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/bigdl-bigdl_0.12.0-spark_2.4.3-0.9.0-SNAPSHOT-serving.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", + "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", + "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", + "SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\n", + "SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]\n", + "log4j:WARN No appenders could be found for logger (org.apache.flink.client.cli.CliFrontend).\n", + "log4j:WARN Please initialize the log4j system properly.\n", + "log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.\n", + "Starting new Cluster Serving job.\n", + "Cluster Serving job submitted, check log in log-cluster_serving-serving_stream.txt\n", + "To list Cluster Serving job status, use cluster-serving-cli list\n", + "SLF4J: Class path contains multiple SLF4J bindings.\n", + "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/bigdl-bigdl_0.12.0-spark_2.4.3-0.9.0-SNAPSHOT-serving.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", + "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", + "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", + "SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\n", + "SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]\n", + "log4j:WARN No appenders could be found for logger (org.apache.flink.client.cli.CliFrontend).\n", + "log4j:WARN Please initialize the log4j system properly.\n", + "log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.\n", + "[Full GC (Metadata GC Threshold) 32304K->20432K(1030144K), 0.0213821 secs]\n" + ] + } + ], + "source": [ + "! cluster-serving-start" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Prediction using Cluster Serving\n", + "Next we start Cluster Serving code at python client." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "redis group exist, will not create new one\n", + "redis group exist, will not create new one\n" + ] + } + ], + "source": [ + "from bigdl.serving.client import InputQueue, OutputQueue\n", + "input_queue = InputQueue()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In Cluster Serving, only NdArray is supported as input. Thus, we first transform the generator to ndarray (If you do not know how to transform your input to NdArray, you may get help at [data transform guide](https://github.com/intel-analytics/bigdl/tree/master/docs/docs/ClusterServingGuide/OtherFrameworkUsers#data))" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[[[0.12156864, 0.11764707, 0.10980393],\n", + " [0.12156864, 0.11764707, 0.10980393],\n", + " [0.11764707, 0.1137255 , 0.10588236],\n", + " ...,\n", + " [0.28627452, 0.29803923, 0.22352943],\n", + " [0.24705884, 0.25882354, 0.18431373],\n", + " [0.24705884, 0.24705884, 0.20000002]],\n", + "\n", + " [[0.15686275, 0.15294118, 0.14509805],\n", + " [0.13725491, 0.13333334, 0.1254902 ],\n", + " [0.09803922, 0.09411766, 0.08627451],\n", + " ...,\n", + " [0.31764707, 0.3254902 , 0.27450982],\n", + " [0.31764707, 0.3254902 , 0.27058825],\n", + " [0.2784314 , 0.2784314 , 0.2392157 ]],\n", + "\n", + " [[0.21960786, 0.21568629, 0.20784315],\n", + " [0.23137257, 0.227451 , 0.21960786],\n", + " [0.24705884, 0.24313727, 0.23529413],\n", + " ...,\n", + " [0.29411766, 0.29803923, 0.27450982],\n", + " [0.26666668, 0.27058825, 0.2392157 ],\n", + " [0.30588236, 0.30588236, 0.26666668]],\n", + "\n", + " ...,\n", + "\n", + " [[0.35686275, 0.3019608 , 0.15686275],\n", + " [0.38431376, 0.29803923, 0.14509805],\n", + " [0.36862746, 0.25490198, 0.12156864],\n", + " ...,\n", + " [0.1764706 , 0.08627451, 0.01568628],\n", + " [0.16862746, 0.08627451, 0.00392157],\n", + " [0.1764706 , 0.08627451, 0.03137255]],\n", + "\n", + " [[0.30980393, 0.2784314 , 0.13333334],\n", + " [0.3529412 , 0.29411766, 0.14117648],\n", + " [0.3529412 , 0.26666668, 0.12156864],\n", + " ...,\n", + " [0.1764706 , 0.08627451, 0.01568628],\n", + " [0.17254902, 0.08235294, 0.01176471],\n", + " [0.18039216, 0.09019608, 0.03529412]],\n", + "\n", + " [[0.30588236, 0.27450982, 0.13333334],\n", + " [0.33333334, 0.28627452, 0.12941177],\n", + " [0.3372549 , 0.26666668, 0.11764707],\n", + " ...,\n", + " [0.19607845, 0.09411766, 0.03529412],\n", + " [0.18039216, 0.07843138, 0.02745098],\n", + " [0.1764706 , 0.08627451, 0.03137255]]]], dtype=float32)" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "arr = test_generator.next()[0]\n", + "arr" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Write to Redis successful\n", + "redis group exist, will not create new one\n", + "Write to Redis successful\n" + ] + } + ], + "source": [ + "# Use async api to put and get, you have pass a name arg and use the name to get\n", + "input_queue.enqueue('my-input', t=arr)\n", + "output_queue = OutputQueue()\n", + "prediction = output_queue.query('my-input')\n", + "# Use sync api to predict, this will block until the result is get or timeout\n", + "prediction = input_queue.predict(arr)" + ] + }, + { + 
"cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[[0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 1.3543907 ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 4.1898136 ,\n", + " 0. , 0. ]],\n", + "\n", + " [[0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 3.286649 , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 4.0817494 , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 3.3224926 , 0. , ..., 1.4220613 ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 4.9100547 ,\n", + " 0. , 0. ]],\n", + "\n", + " [[0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 1.5577714 , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 1.767426 , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 2.3534465 , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0.21401057,\n", + " 0. , 0. ]],\n", + "\n", + " [[0. , 0. , 0. , ..., 0. ,\n", + " 0.34797698, 0. ],\n", + " [0. , 1.4496232 , 0. , ..., 0. ,\n", + " 1.6221215 , 0. ],\n", + " [0. , 0.6171873 , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ]],\n", + "\n", + " [[0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 1.192298 , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ],\n", + " [0. , 0. , 0. , ..., 0. ,\n", + " 0. , 0. ]]], dtype=float32)" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "prediction" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If everything works well, the result `prediction` would be the exactly the same NdArray object with the output of original Keras model." + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [], + "source": [ + "# don't forget to delete the model you save for this tutorial\n", + "! rm -rf /tmp/transfer_learning_mobilenetv2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This is the end of this tutorial. If you have any question, you could raise an issue at [BigDL Github](https://github.com/intel-analytics/bigdl/issues)." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.10" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/readthedocs/source/doc/Serving/Example/l08c08_forecasting_with_lstm.py b/docs/readthedocs/source/doc/Serving/Example/l08c08_forecasting_with_lstm.py new file mode 100644 index 00000000000..612017b8252 --- /dev/null +++ b/docs/readthedocs/source/doc/Serving/Example/l08c08_forecasting_with_lstm.py @@ -0,0 +1,75 @@ +# Related url: https://github.com/tensorflow/examples/blob/master/courses/udacity_intro_to_tensorflow_for_deep_learning/l08c08_forecasting_with_lstm.ipynb +# Forecasting with LSTM +import numpy as np +import tensorflow as tf +import tensorflow.keras as keras + +# Get the trend with time and slope +def trend(time, slope=0): + return slope * time + + +# Get a specific pattern, which can be customerized +def seasonal_pattern(season_time): + return np.where(season_time < 0.4, + np.cos(season_time * 2 * np.pi), + 1 / np.exp(3 * season_time)) + +# Repeats the same pattern at each period +def seasonality(time, period, amplitude=1, phase=0): + season_time = ((time + phase) % period) / period + return amplitude * seasonal_pattern(season_time) + +# Obtain a random white noise +def white_noise(time, noise_level=1, seed=None): + rnd = np.random.RandomState(seed) + return rnd.randn(len(time)) * noise_level + +# Convert the series to dataset form +def ndarray_to_dataset(ndarray): + return tf.data.Dataset.from_tensor_slices(ndarray) + +# Convert the series to dataset with some modifications +def sequential_window_dataset(series, window_size): + series = tf.expand_dims(series, axis=-1) + ds = ndarray_to_dataset(series) + ds = ds.window(window_size + 1, shift=window_size, drop_remainder=True) + ds = ds.flat_map(lambda window: window.batch(window_size + 1)) + ds = ds.map(lambda window: (window[:-1], window[1:])) + return ds.batch(1).prefetch(1) + +# Convert dataset form to ndarray +def dataset_to_ndarray(dataset): + array=list(dataset.as_numpy_iterator()) + return np.ndarray(array) + +# Generate some raw test data +time_range=4 * 365 + 1 +time = np.arange(time_range) + +slope = 0.05 +baseline = 10 +amplitude = 40 +series = baseline + trend(time, slope) + seasonality(time, period=365, amplitude=amplitude) + +noise_level = 5 +noise = white_noise(time, noise_level, seed=42) + +series += noise + +# Modify the raw test data with DataSet form +tf.random.set_seed(42) +np.random.seed(42) + +window_size = 30 +test_set = sequential_window_dataset(series, window_size) + +# Convert the DataSet form data to ndarry +#pre_in=series[np.newaxis, :, np.newaxis] +test_array=dataset_to_ndarray(test_set) + +# Load the saved LSTM model +model=tf.keras.models.load_model("path/to/model") + +# Predict with LSTM model +rnn_forecast_nd = model.predict(test_array) diff --git a/docs/readthedocs/source/doc/Serving/Example/l10c03_nlp_constructing_text_generation_model.py b/docs/readthedocs/source/doc/Serving/Example/l10c03_nlp_constructing_text_generation_model.py new file mode 100644 index 00000000000..3d27b9a09c4 --- /dev/null +++ b/docs/readthedocs/source/doc/Serving/Example/l10c03_nlp_constructing_text_generation_model.py @@ -0,0 +1,75 @@ +# Related url: 
diff --git a/docs/readthedocs/source/doc/Serving/Example/l10c03_nlp_constructing_text_generation_model.py b/docs/readthedocs/source/doc/Serving/Example/l10c03_nlp_constructing_text_generation_model.py
new file mode 100644
index 00000000000..3d27b9a09c4
--- /dev/null
+++ b/docs/readthedocs/source/doc/Serving/Example/l10c03_nlp_constructing_text_generation_model.py
@@ -0,0 +1,75 @@
+# Related url: https://github.com/tensorflow/examples/blob/master/courses/udacity_intro_to_tensorflow_for_deep_learning/l10c03_nlp_constructing_text_generation_model.ipynb
+# Generate some new lyrics from the trained model
+
+import tensorflow as tf
+from tensorflow.keras.preprocessing.text import Tokenizer
+from tensorflow.keras.preprocessing.sequence import pad_sequences
+
+# Other imports for processing data
+import string
+import numpy as np
+import pandas as pd
+
+# DATA PREPROCESSING
+# First, download the Song Lyrics dataset from Kaggle:
+# !wget --no-check-certificate \
+#     https://drive.google.com/uc?id=1LiJFZd41ofrWoBtW-pMYsfz1w8Ny0Bj8 \
+#     -O /tmp/songdata.csv
+
+# Then build a tokenizer from songdata.csv
+def tokenize_corpus(corpus, num_words=-1):
+    # Fit a Tokenizer on the corpus
+    if num_words > -1:
+        tokenizer = Tokenizer(num_words=num_words)
+    else:
+        tokenizer = Tokenizer()
+    tokenizer.fit_on_texts(corpus)
+    return tokenizer
+
+def create_lyrics_corpus(dataset, field):
+    # Remove all other punctuation
+    dataset[field] = dataset[field].str.replace('[{}]'.format(string.punctuation), '')
+    # Make it lowercase
+    dataset[field] = dataset[field].str.lower()
+    # Make it one long string to split by line
+    lyrics = dataset[field].str.cat()
+    corpus = lyrics.split('\n')
+    # Remove any trailing whitespace
+    for l in range(len(corpus)):
+        corpus[l] = corpus[l].rstrip()
+    # Remove any empty lines
+    corpus = [l for l in corpus if l != '']
+
+    return corpus
+
+# Read the dataset from csv
+dataset = pd.read_csv('/tmp/songdata.csv', dtype=str)
+# Create the corpus using the 'text' column containing lyrics
+corpus = create_lyrics_corpus(dataset, 'text')
+# Tokenize the corpus
+tokenizer = tokenize_corpus(corpus)
+
+# Get the uniform input length (max_sequence_len) of the model
+max_sequence_len = 0
+for line in corpus:
+    token_list = tokenizer.texts_to_sequences([line])[0]
+    max_sequence_len = max(max_sequence_len, len(token_list))
+
+# Load the saved model which was trained on the Song Lyrics dataset
+model = tf.keras.models.load_model("path/to/model")
+
+# Generate new lyrics from some "seed text"
+seed_text = "im feeling chills"  # the seed text can be customized
+next_words = 100  # this defines the length (in words) of the new lyrics
+
+for _ in range(next_words):
+    token_list = tokenizer.texts_to_sequences([seed_text])[0]  # convert the seed text to a token list
+    token_list = pad_sequences([token_list], maxlen=max_sequence_len - 1, padding='pre')  # pad the input to equal length
+    predicted = np.argmax(model.predict(token_list), axis=-1)  # get the predicted word index
+    output_word = ""
+    for word, index in tokenizer.word_index.items():
+        if index == predicted:
+            output_word = word
+            break
+    seed_text += " " + output_word  # append the predicted word to the seed text
+print(seed_text)
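+
+# Optional variation (illustrative only, not part of the original lesson):
+# sampling from the predicted distribution instead of taking the argmax
+# usually produces less repetitive lyrics.
+# probs = model.predict(token_list)[0]
+# predicted = np.random.choice(len(probs), p=probs / probs.sum())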
diff --git a/docs/readthedocs/source/doc/Serving/Example/tf1-to-cluster-serving-example.ipynb b/docs/readthedocs/source/doc/Serving/Example/tf1-to-cluster-serving-example.ipynb
new file mode 100644
index 00000000000..53e0b188ca5
--- /dev/null
+++ b/docs/readthedocs/source/doc/Serving/Example/tf1-to-cluster-serving-example.ipynb
@@ -0,0 +1,573 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "reported-geometry",
+   "metadata": {},
+   "source": [
+    "In this example, we will use TensorFlow v1 (version 1.15) to create a simple MLP model, and transfer the application to Cluster Serving step by step.\n",
+    "\n",
+    "This tutorial is recommended for TensorFlow v1 users only. If you are not a TensorFlow v1 user, the Keras tutorial [here](#keras-to-cluster-serving-example.ipynb) is recommended instead."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "athletic-trance",
+   "metadata": {},
+   "source": [
+    "### Original TensorFlow v1 Application"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "olive-dutch",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'1.15.0'"
+      ]
+     },
+     "execution_count": 1,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "import tensorflow as tf\n",
+    "tf.__version__"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "vertical-recall",
+   "metadata": {},
+   "source": [
+    "We first define the TensorFlow graph and create some data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "tropical-clinton",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "WARNING:tensorflow:From :24: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n",
+      "Instructions for updating:\n",
+      "Use tf.where in 2.0, which has the same broadcast rule as np.where\n"
+     ]
+    }
+   ],
+   "source": [
+    "g = tf.Graph()\n",
+    "with g.as_default():\n",
+    "    \n",
+    "    # Graph Inputs\n",
+    "    features = tf.placeholder(dtype=tf.float32, \n",
+    "                              shape=[None, 2], name='features')\n",
+    "    targets = tf.placeholder(dtype=tf.float32, \n",
+    "                             shape=[None, 1], name='targets')\n",
+    "\n",
+    "    # Model Parameters\n",
+    "    weights = tf.Variable(tf.zeros(shape=[2, 1], \n",
+    "                          dtype=tf.float32), name='weights')\n",
+    "    bias = tf.Variable([[0.]], dtype=tf.float32, name='bias')\n",
+    "\n",
+    "    # Forward Pass\n",
+    "    linear = tf.add(tf.matmul(features, weights), bias, name='linear')\n",
+    "    ones = tf.ones(shape=tf.shape(linear)) \n",
+    "    zeros = tf.zeros(shape=tf.shape(linear))\n",
+    "    prediction = tf.where(condition=tf.less(linear, 0.),\n",
+    "                          x=zeros, \n",
+    "                          y=ones, \n",
+    "                          name='prediction')\n",
+    "    \n",
+    "    # Backward Pass\n",
+    "    errors = targets - prediction\n",
+    "    weight_update = tf.assign_add(weights, \n",
+    "                                  tf.reshape(errors * features, (2, 1)),\n",
+    "                                  name='weight_update')\n",
+    "    bias_update = tf.assign_add(bias, errors,\n",
+    "                                name='bias_update')\n",
+    "    \n",
+    "    train = tf.group(weight_update, bias_update, name='train')\n",
+    "    \n",
+    "    saver = tf.train.Saver(name='saver')\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "legislative-boutique",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "((3, 2), (3,))"
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "import numpy as np\n",
+    "x_train, y_train = np.array([[1,2],[3,4],[1,3]]), np.array([1,2,1])\n",
+    "x_train.shape, y_train.shape"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "coated-grill",
+   "metadata": {},
+   "source": [
+    "### Export TensorFlow SavedModel\n",
+    "Then we train the graph and, inside the `with tf.Session` block, save it to SavedModel format. The detailed code is below; as its output shows, the prediction result is `[1.]` for the input `[1,2]`."
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "detailed-message", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Model parameters:\n", + "\n", + "Weights:\n", + " [[15.]\n", + " [20.]]\n", + "Bias: [[5.]]\n", + "[[1.]\n", + " [1.]\n", + " [1.]]\n", + "WARNING:tensorflow:From :26: simple_save (from tensorflow.python.saved_model.simple_save) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.simple_save.\n", + "WARNING:tensorflow:From /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages/tensorflow_core/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.\n", + "INFO:tensorflow:Assets added to graph.\n", + "INFO:tensorflow:No assets to write.\n", + "INFO:tensorflow:SavedModel written to: /tmp/mlp_tf1/saved_model.pb\n" + ] + } + ], + "source": [ + "with tf.Session(graph=g) as sess:\n", + " \n", + " sess.run(tf.global_variables_initializer())\n", + " \n", + " for epoch in range(5):\n", + " for example, target in zip(x_train, y_train):\n", + " feed_dict = {'features:0': example.reshape(-1, 2),\n", + " 'targets:0': target.reshape(-1, 1)}\n", + " _ = sess.run(['train'], feed_dict=feed_dict)\n", + "\n", + "\n", + " w, b = sess.run(['weights:0', 'bias:0']) \n", + " print('Model parameters:\\n')\n", + " print('Weights:\\n', w)\n", + " print('Bias:', b)\n", + "\n", + " saver.save(sess, save_path='perceptron')\n", + " \n", + " pred = sess.run('prediction:0', feed_dict={features: x_train})\n", + " print(pred)\n", + " \n", + " # in this session, save the model to savedModel format\n", + " inputs = dict([(features.name, features)])\n", + " outputs = dict([(prediction.name, prediction)])\n", + " inputs, outputs\n", + " tf.saved_model.simple_save(sess, \"/tmp/mlp_tf1\", inputs, outputs)" + ] + }, + { + "cell_type": "markdown", + "id": "consolidated-newport", + "metadata": {}, + "source": [ + "### Deploy Cluster Serving\n", + "After model prepared, we start to deploy it on Cluster Serving.\n", + "\n", + "First install Cluster Serving" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "inner-texas", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Looking in indexes: http://10.239.45.10:8081/repository/pypi-group/simple, https://pypi.tuna.tsinghua.edu.cn/simple\n", + "Collecting bigdl-serving\n", + " Downloading http://10.239.45.10:8081/repository/pypi-group/packages/bigdl-serving/0.9.0/analytics_zoo_serving-0.9.0-20201216-py2.py3-none-any.whl (17 kB)\n", + "Requirement already satisfied: httpx in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from bigdl-serving) (0.17.1)\n", + "Requirement already satisfied: pyarrow in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from bigdl-serving) (3.0.0)\n", + "Requirement already satisfied: pyyaml in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from bigdl-serving) (5.4.1)\n", + "Requirement already satisfied: redis in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from bigdl-serving) (3.5.3)\n", + 
"Requirement already satisfied: opencv-python in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from bigdl-serving) (4.5.1.48)\n", + "Requirement already satisfied: certifi in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from httpx->bigdl-serving) (2020.12.5)\n", + "Requirement already satisfied: httpcore<0.13,>=0.12.1 in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from httpx->bigdl-serving) (0.12.3)\n", + "Requirement already satisfied: rfc3986[idna2008]<2,>=1.3 in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from httpx->bigdl-serving) (1.4.0)\n", + "Requirement already satisfied: sniffio in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from httpx->bigdl-serving) (1.2.0)\n", + "Requirement already satisfied: h11==0.* in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from httpcore<0.13,>=0.12.1->httpx->bigdl-serving) (0.12.0)\n", + "Requirement already satisfied: idna in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from rfc3986[idna2008]<2,>=1.3->httpx->bigdl-serving) (3.1)\n", + "Requirement already satisfied: numpy>=1.14.5 in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from opencv-python->bigdl-serving) (1.20.1)\n", + "Installing collected packages: bigdl-serving\n", + "Successfully installed bigdl-serving-0.9.0\n" + ] + } + ], + "source": [ + "! pip install bigdl-serving" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "working-terrorism", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Trying to find config file in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages/bigdl/conf/config.yaml\r\n", + "Config file found in pip package, copying...\r\n", + "Config file ready.\r\n", + "Cluster Serving has been properly set up.\r\n", + "You did not specify ANALYTICS_ZOO_VERSION, will download 0.9.0\r\n", + "ANALYTICS_ZOO_VERSION is 0.9.0\r\n", + "BIGDL_VERSION is 0.12.1\r\n", + "SPARK_VERSION is 2.4.3\r\n", + "2.4\r\n", + "You are installing Cluster Serving by pip, downloading...\r\n" + ] + } + ], + "source": [ + "import os\n", + "! mkdir cluster-serving\n", + "os.chdir('cluster-serving')\n", + "! cluster-serving-init" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "excited-exception", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 2800K .......... .......... .......... .......... .......... 0% 11.8M 19m20s\r\n", + " 2850K .......... .......... .......... .......... .......... 0% 11.3M 19m1s\r\n", + " 2900K .......... .......... .......... .......... .......... 0% 8.60M 18m43s\r\n", + " 2950K .......... .......... .......... .......... .......... 0% 11.9M 18m25s\r\n", + " 3000K .......... .......... .......... .......... .......... 0% 11.8M 18m7s\r\n", + " 3050K .......... .......... .......... .......... .......... 0% 674K 18m4s\r\n", + " 3100K .......... .......... .......... .......... .......... 0% 418K 18m9s\r\n", + " 3150K .......... .......... .......... .......... .......... 0% 1.05M 18m0s\r\n", + " 3200K .......... .......... .......... .......... .......... 0% 750K 17m56s\r\n", + " 3250K .......... .......... .......... ...." + ] + } + ], + "source": [ + "! tail wget-log" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "casual-premium", + "metadata": {}, + "outputs": [], + "source": [ + "# if you encounter slow download issue like above, you can just use following command to download\n", + "# ! 
wget https://repo1.maven.org/maven2/com/intel/analytics/bigdl/bigdl-bigdl_0.12.1-spark_2.4.3/0.9.0/bigdl-bigdl_0.12.1-spark_2.4.3-0.9.0-serving.jar\n",
+    "\n",
+    "# if you downloaded with wget, or you see \"bigdl-xxx-serving.jar\" after running \"ls\", please run mv *serving.jar bigdl.jar once the download finishes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "ruled-bermuda",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "bigdl-bigdl_0.12.1-spark_2.4.3-0.9.0-serving.jar  config.yaml  wget-log\r\n"
+     ]
+    }
+   ],
+   "source": [
+    "# After initialization finished, check the directory\n",
+    "! ls"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "computational-rehabilitation",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Call mv *serving.jar bigdl.jar as mentioned above\n",
+    "! mv *serving.jar bigdl.jar"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "personal-central",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "config.yaml  wget-log  bigdl.jar\r\n"
+     ]
+    }
+   ],
+   "source": [
+    "! ls"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "combined-stability",
+   "metadata": {},
+   "source": [
+    "We set the model path in `config.yaml` as follows (configuration details are at [Cluster Serving Configuration](https://github.com/intel-analytics/bigdl/blob/master/docs/docs/ClusterServingGuide/ProgrammingGuide.md#2-configuration))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "received-hayes",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "## BigDL Cluster Serving\n",
+    "\n",
+    "model:\n",
+    "  # model path must be provided\n",
+    "  path: /tmp/mlp_tf1"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "satellite-honey",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "## BigDL Cluster Serving\r\n",
+      "\r\n",
+      "model:\r\n",
+      "  # model path must be provided\r\n",
+      "  path: /tmp/mlp_tf1\r\n",
+      "  # name, default is serving_stream, you need to specify if running multiple servings\r\n",
+      "  name:\r\n",
+      "data:\r\n",
+      "  # default, localhost:6379\r\n",
+      "  src:\r\n"
+     ]
+    }
+   ],
+   "source": [
+    "! head config.yaml"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "planned-hometown",
+   "metadata": {},
+   "source": [
+    "### Start Cluster Serving\n",
+    "\n",
+    "Cluster Serving requires Flink and Redis to be installed, with the corresponding environment variables set; check the [Cluster Serving Installation Guide](https://github.com/intel-analytics/bigdl/blob/master/docs/docs/ClusterServingGuide/ProgrammingGuide.md#1-installation) for details.\n",
+    "\n",
+    "The Flink cluster should be started before Cluster Serving. If it is not running yet, call the following to start a local Flink cluster."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "id": "antique-melbourne",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Starting cluster.\n",
+      "Starting standalonesession daemon on host user-PC.\n",
+      "Starting taskexecutor daemon on host user-PC.\n"
+     ]
+    }
+   ],
+   "source": [
+    "! $FLINK_HOME/bin/start-cluster.sh"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "interested-bench",
+   "metadata": {},
+   "source": [
+    "After configuration, start Cluster Serving with `cluster-serving-start` (details are at the [Cluster Serving Programming Guide](https://github.com/intel-analytics/bigdl/blob/master/docs/docs/ClusterServingGuide/ProgrammingGuide.md#3-launching-service))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "id": "modern-monster",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "model_path=\"/tmp/mlp_tf1\"\n",
+      "redis_timeout=\"5000\"\n",
+      "Redis maxmemory is not set, using default value 8G\n",
+      "redis server started, please check log in redis.log\n",
+      "OK\n",
+      "OK\n",
+      "OK\n",
+      "redis config maxmemory set to 8G\n",
+      "OK\n",
+      "OK\n",
+      "Starting new Cluster Serving job.\n",
+      "Cluster Serving job submitted, check log in log-cluster_serving-serving_stream.txt\n",
+      "To list Cluster Serving job status, use cluster-serving-cli list\n",
+      "{maxmem=null, timeout=5000}timeout getted: 5000\n"
+     ]
+    }
+   ],
+   "source": [
+    "! cluster-serving-start"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "improved-rough",
+   "metadata": {},
+   "source": [
+    "### Prediction using Cluster Serving\n",
+    "Next, we run the Cluster Serving client code in Python."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 22,
+   "id": "immune-madness",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "redis group exist, will not create new one\n",
+      "redis group exist, will not create new one\n",
+      "Write to Redis successful\n",
+      "redis group exist, will not create new one\n",
+      "Write to Redis successful\n"
+     ]
+    }
+   ],
+   "source": [
+    "from bigdl.serving.client import InputQueue, OutputQueue\n",
+    "input_queue = InputQueue()\n",
+    "# Use the async API to put and get; you have to pass a name argument, and use the same name to get the result\n",
+    "arr = np.array([1,2])\n",
+    "input_queue.enqueue('my-input', t=arr)\n",
+    "output_queue = OutputQueue()\n",
+    "prediction = output_queue.query('my-input')\n",
+    "# Use the sync API to predict; this blocks until the result is ready or the request times out\n",
+    "prediction = input_queue.predict(arr)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
+   "id": "signal-attention",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "array([1.], dtype=float32)"
+      ]
+     },
+     "execution_count": 23,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "prediction"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "suitable-selection",
+   "metadata": {},
+   "source": [
+    "The `prediction` result should be the same as the one computed by TensorFlow directly.\n",
+    "\n",
+    "This is the end of this tutorial. If you have any questions, you can raise an issue at [BigDL GitHub](https://github.com/intel-analytics/bigdl/issues)."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/docs/readthedocs/source/doc/Serving/Example/transfer_learning.py b/docs/readthedocs/source/doc/Serving/Example/transfer_learning.py
new file mode 100644
index 00000000000..9777ea70f65
--- /dev/null
+++ b/docs/readthedocs/source/doc/Serving/Example/transfer_learning.py
@@ -0,0 +1,40 @@
+# Related url: https://github.com/tensorflow/docs/blob/master/site/en/r1/tutorials/images/transfer_learning.ipynb
+# Categorize images as cat or dog
+import os
+import tensorflow.compat.v1 as tf
+from tensorflow import keras
+
+# Obtain data from url:"https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip"
+zip_file = tf.keras.utils.get_file(origin="https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip",
+                                   fname="cats_and_dogs_filtered.zip", extract=True)
+
+# Find the directory of the validation set
+base_dir, _ = os.path.splitext(zip_file)
+test_dir = os.path.join(base_dir, 'validation')
+
+# Set image size to 160x160x3
+image_size = 160
+
+# Rescale all images by 1./255 and apply image augmentation
+test_datagen = keras.preprocessing.image.ImageDataGenerator(rescale=1./255)
+
+# Flow images from the directory into the test_generator
+test_generator = test_datagen.flow_from_directory(
+                test_dir,
+                target_size=(image_size, image_size),
+                batch_size=1,
+                class_mode='binary')
+
+# Convert the next batch from the ImageDataGenerator to an ndarray
+def convert_to_ndarray(image_generator):
+    return image_generator.next()[0]
+
+# Load the model from its path
+model = tf.keras.models.load_model("path/to/model")
+
+# Convert each image in test_generator to an ndarray and predict with the model
+max_length = len(test_generator)
+for i in range(max_length):  # the number of images to predict can be altered
+    test_input = convert_to_ndarray(test_generator)
+    prediction = model.predict(test_input)
+
diff --git a/docs/readthedocs/source/doc/Serving/FAQ/faq.md b/docs/readthedocs/source/doc/Serving/FAQ/faq.md
new file mode 100644
index 00000000000..812d07e361f
--- /dev/null
+++ b/docs/readthedocs/source/doc/Serving/FAQ/faq.md
@@ -0,0 +1,53 @@
+# BigDL Cluster Serving FAQ
+
+## General Debug Guide
+You can use the following guide to debug if serving is not working properly.
+
+### Check if the Cluster Serving environment is ready
+Run the following commands in a terminal
+```
+echo $FLINK_HOME
+echo $REDIS_HOME
+```
+The output directories
+```
+/path/to/flink-version
+/path/to/redis-version
+```
+should be displayed; otherwise, go to the **Installation** section of the [Programming Guide](ProgrammingGuide.md).
+
+### Check if the Flink cluster is working
+Run the following command in a terminal
+```
+netstat -tnlp
+```
+Output like the following should be displayed; `6123` and `8081` are Flink's default ports.
+```
+tcp6 0 0 :::6123 :::* LISTEN xxxxx/java
+tcp6 0 0 :::8081 :::* LISTEN xxxxx/java
+```
+If not, run `$FLINK_HOME/bin/start-cluster.sh` to start the Flink cluster.
+
+After that, check the Flink logs in `$FLINK_HOME/log/`: inspect `flink-xxx-standalone-xxx.log` and `flink-xxx-taskexecutor-xxx.log` to make sure there is no error.
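+
+For example, a quick way to scan those logs for problems (a sketch; adjust the file names to your Flink version):
+```
+grep -iE "error|exception" $FLINK_HOME/log/flink-*.log
+```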
+
+If the ports cannot be bound in this step, kill the program that is using them, then run `$FLINK_HOME/bin/stop-cluster.sh && $FLINK_HOME/bin/start-cluster.sh` to restart the Flink cluster.
+### Check if Cluster Serving is running
+```
+$FLINK_HOME/bin/flink list
+```
+The Cluster Serving job information should be displayed; if not, go to the **Launching Service** section of the [Programming Guide](ProgrammingGuide.md) to make sure you called `cluster-serving-start` correctly.
+
+
+
+### Troubleshooting
+
+1. `Duplicate registration of device factory for type XLA_CPU with the same priority 50`
+
+This error is caused by the Flink ClassLoader. Put the Cluster Serving related jars into `${FLINK_HOME}/lib`.
+
+2. `servable Manager config dir not exist`
+
+Check whether `servables.yaml` exists in the current directory. If not, download it from [github](https://github.com/intel-analytics/bigdl/blob/master/ppml/trusted-realtime-ml/scala/docker-graphene/servables.yaml).
+### Still, I get no result
+If you still get an empty result, raise an issue [here](https://github.com/intel-analytics/bigdl/issues) and post the output/log of your serving job.
diff --git a/docs/readthedocs/source/doc/Serving/Overview/cluster_serving_overview.jpg b/docs/readthedocs/source/doc/Serving/Overview/cluster_serving_overview.jpg
new file mode 100644
index 00000000000..6edbc9c90ed
Binary files /dev/null and b/docs/readthedocs/source/doc/Serving/Overview/cluster_serving_overview.jpg differ
diff --git a/docs/readthedocs/source/doc/Serving/Overview/cluster_serving_steps.jpg b/docs/readthedocs/source/doc/Serving/Overview/cluster_serving_steps.jpg
new file mode 100644
index 00000000000..74fb2752af0
Binary files /dev/null and b/docs/readthedocs/source/doc/Serving/Overview/cluster_serving_steps.jpg differ
diff --git a/docs/readthedocs/source/doc/Serving/Overview/serving-overview.md b/docs/readthedocs/source/doc/Serving/Overview/serving-overview.md
new file mode 100644
index 00000000000..f2d8757b1b3
--- /dev/null
+++ b/docs/readthedocs/source/doc/Serving/Overview/serving-overview.md
@@ -0,0 +1,28 @@
+# BigDL Cluster Serving Overview
+BigDL Cluster Serving is a lightweight, distributed, real-time serving solution that supports a wide range of deep learning models (such as TensorFlow, PyTorch, Caffe, BigDL and OpenVINO models). It provides a simple pub/sub API, so that users can easily send their inference requests to the input queue (using a simple Python API); Cluster Serving will then automatically manage the scale-out and real-time model inference across a large cluster (using distributed streaming frameworks such as Apache Spark Streaming, Apache Flink, etc.)
+
+The overall architecture of the BigDL Cluster Serving solution is illustrated below:
+
+![overview](cluster_serving_overview.jpg)
+
+## Workflow Overview
+The figure below illustrates the simple 3-step "Prepare-Launch-Inference" workflow for Cluster Serving.
+
+![steps](cluster_serving_steps.jpg)
+
+#### 1. Install and prepare the Cluster Serving environment on a local node:
+
+- Copy a previously trained model to the local node; currently TensorFlow, PyTorch, Caffe, BigDL and OpenVINO models are supported.
+- Install BigDL Cluster Serving on the local node (e.g., using a single pip install command)
+- Configure Cluster Serving on the local node, including the file path to the trained model and the address of the cluster (such as an Apache Hadoop YARN cluster, a K8s cluster, etc.); a short sketch of these steps is shown below.
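+
+A minimal sketch of this preparation (illustrative; the exact options are covered in the Programming Guide):
+```
+pip install bigdl-serving    # install Cluster Serving on the local node
+cluster-serving-init         # download dependencies and a config.yaml template
+# then point config.yaml at your trained model, e.g.
+# modelPath: /path/to/model
+```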
+
+Please note that you only need to deploy the Cluster Serving solution on a single local node, and NO modifications are needed for the (YARN or K8s) cluster.
+
+#### 2. Launch the Cluster Serving service
+
+You can launch the Cluster Serving service by running the startup script on the local node. Under the hood, Cluster Serving will automatically deploy the trained model and serve the model inference requests across the cluster in a distributed fashion. You may monitor its runtime status (such as inference throughput) using TensorBoard.
+
+#### 3. Distributed, real-time (streaming) inference
+
+Cluster Serving provides a simple pub/sub API to the users, so that you can easily send the inference requests to an input queue (currently Redis Streams is used) using a simple Python API.
+
+Cluster Serving will then read the requests from the Redis stream, run the distributed real-time inference across the cluster (using Flink), and return the results back through Redis. As a result, you may get the inference results again using a simple Python API.
diff --git a/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-inference.md b/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-inference.md
new file mode 100644
index 00000000000..097d1e0036f
--- /dev/null
+++ b/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-inference.md
@@ -0,0 +1,185 @@
+# BigDL Cluster Serving Programming Guide
+
+## Model Inference
+Once you finish the installation and service launch, you can run inference using the Cluster Serving client API.
+
+We support a Python API and an HTTP RESTful API for conducting inference with the data pipeline in Cluster Serving.
+
+### Python API
+The Python API requires the packages `opencv-python` (for raw images only), `pyyaml` and `redis`. You can use `InputQueue` and `OutputQueue` to connect to the data pipeline by providing the pipeline URL, e.g. `my_input_queue = InputQueue(host, port)` and `my_output_queue = OutputQueue(host, port)`. If the parameters are not provided, the default URL `localhost:6379` is used.
+
+We show some basic usage here; for more details, please see the [API Guide](APIGuide.md).
+
+To put data into the queue, you need an `InputQueue` instance and its `enqueue` method; for each input, give a key corresponding to your model input, or an arbitrary key if your model does not care about it.
+
+To enqueue an image
+```
+from bigdl.serving.client import InputQueue
+input_api = InputQueue()
+input_api.enqueue('my-image1', user_define_key={"path": 'path/to/image1'})
+```
+To enqueue an instance containing 1 image and 2 ndarrays
+```
+from bigdl.serving.client import InputQueue
+import numpy as np
+input_api = InputQueue()
+t1 = np.array([1,2])
+t2 = np.array([[1,2], [3,4]])
+input_api.enqueue('my-instance', img={"path": 'path/to/image'}, tensor1=t1, tensor2=t2)
+```
+There are 4 input types in total: string, image, tensor and sparse tensor, which can represent the inputs of nearly all models. For more usage details, go to the [API Guide](APIGuide.md)
+
+To get results from the queue, you need an `OutputQueue` instance and its `query` or `dequeue` method. The `query` method takes the input key (e.g. an image URI) as its parameter and returns the corresponding result. The `dequeue` method takes no parameter, returns all available results and removes them from the data queue. See the following example.
+```
+from bigdl.serving.client import OutputQueue
+output_api = OutputQueue()
+img1_result = output_api.query('img1')
+all_result = output_api.dequeue() # the output queue is empty after this code
+```
+##### Sync API
+The Python API above is asynchronous, following a pub/sub schema: the calling thread does not block once you call the `enqueue` method. If you want the thread to block until the result is ready, use the sync API described in this section.
+
+To use the sync API, create an `InputQueue` instance with the `sync=True` and `frontend_url=frontend_server_url` arguments.
+```
+from bigdl.serving.client import InputQueue
+input_api = InputQueue(sync=True, frontend_url=frontend_server_url)
+response = input_api.predict(request_json_string)
+print(response.text)
+```
+An example of `request_json_string` is
+```
+'{
+  "instances" : [ {
+    "ids" : [ 100.0, 88.0 ]
+  }]
+}'
+```
+This API is the Python counterpart of the [RESTful API](#restful-api) section, so refer to it for more details on the input format.
+### RESTful API
+The RESTful API uses the serving HTTP server.
+This part describes the API endpoints and end-to-end usage examples.
+The requests and responses are in JSON format. Their composition depends on the request type or verb. See the APIs for details.
+In case of error, all APIs will return a JSON object in the response body with `error` as the key and the error message as the value:
+```
+{
+  "error": <error message string>
+}
+```
+#### Predict API
+URL
+```
+POST http://host:port/predict
+```
+Request Example for images as inputs:
+```
+curl -d \
+'{
+  "instances": [
+    {
+      "image": "/9j/4AAQSkZJRgABAQEASABIAAD/7RcEUGhvdG9za..."
+    },
+    {
+      "image": "/9j/4AAQSkZJRgABAQEASABIAAD/7RcEUGhvdG9za..."
+    }
+  ]
+}' \
+-X POST http://host:port/predict
+```
+Response Example
+```
+{
+  "predictions": [
+    "{value=[[903,0.1306194]]}",
+    "{value=[[903,0.1306194]]}"
+  ]
+}
+```
+Request Example for tensors as inputs:
+```
+curl -d \
+'{
+  "instances" : [ {
+    "ids" : [ 100.0, 88.0 ]
+  }, {
+    "ids" : [ 100.0, 88.0 ]
+  } ]
+}' \
+-X POST http://host:port/predict
+```
+Response Example
+```
+{
+  "predictions": [
+    "{value=[[1,0.6427843]]}",
+    "{value=[[1,0.6427842]]}"
+  ]
+}
+```
+Another request example, with a composition of scalars and tensors.
+```
+curl -d \
+ '{
+  "instances" : [ {
+    "intScalar" : 12345,
+    "floatScalar" : 3.14159,
+    "stringScalar" : "hello, world. hello, arrow.",
+    "intTensor" : [ 7756, 9549, 1094, 9808, 4959, 3831, 3926, 6578, 1870, 1741 ],
+    "floatTensor" : [ 0.6804766, 0.30136853, 0.17394465, 0.44770062, 0.20275897, 0.32762378, 0.45966738, 0.30405098, 0.62053126, 0.7037923 ],
+    "stringTensor" : [ "come", "on", "united" ],
+    "intTensor2" : [ [ 1, 2 ], [ 3, 4 ], [ 5, 6 ] ],
+    "floatTensor2" : [ [ [ 0.2, 0.3 ], [ 0.5, 0.6 ] ], [ [ 0.2, 0.3 ], [ 0.5, 0.6 ] ] ],
+    "stringTensor2" : [ [ [ [ "come", "on", "united" ], [ "come", "on", "united" ], [ "come", "on", "united" ], [ "come", "on", "united" ] ], [ [ "come", "on", "united" ], [ "come", "on", "united" ], [ "come", "on", "united" ], [ "come", "on", "united" ] ] ], [ [ [ "come", "on", "united" ], [ "come", "on", "united" ], [ "come", "on", "united" ], [ "come", "on", "united" ] ], [ [ "come", "on", "united" ], [ "come", "on", "united" ], [ "come", "on", "united" ], [ "come", "on", "united" ] ] ] ]
+  }]
+}' \
+-X POST http://host:port/predict
+```
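+
+The predict requests above can also be issued from Python. A minimal sketch using the `requests` package (the host, port and payload here are placeholders):
+```
+import requests
+
+payload = {"instances": [{"ids": [100.0, 88.0]}]}
+resp = requests.post("http://host:port/predict", json=payload)
+print(resp.json())
+```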
+Another request example, with a composition of sparse and dense tensors.
+```
+curl -d \
+'{
+  "instances" : [ {
+    "sparseTensor" : {
+      "shape" : [ 100, 10000, 10 ],
+      "data" : [ 0.2, 0.5, 3.45, 6.78 ],
+      "indices" : [ [ 1, 1, 1 ], [ 2, 2, 2 ], [ 3, 3, 3 ], [ 4, 4, 4 ] ]
+    },
+    "intTensor2" : [ [ 1, 2 ], [ 3, 4 ], [ 5, 6 ] ]
+  }]
+}' \
+-X POST http://host:port/predict
+```
+
+
+#### Metrics API
+URL
+```
+GET http://host:port/metrics
+```
+Response example:
+```
+[
+  {
+    name: "bigdl.serving.redis.get",
+    count: 810,
+    meanRate: 12.627772820651845,
+    min: 0,
+    max: 25,
+    mean: 0.9687099303718213,
+    median: 0.928579,
+    stdDev: 0.8150031623593447,
+    _75thPercentile: 1.000047,
+    _95thPercentile: 1.141443,
+    _98thPercentile: 1.268665,
+    _99thPercentile: 1.608387,
+    _999thPercentile: 25.874584
+  }
+]
+```
+## Logs and Visualization
+To see outputs/logs, go to Flink UI -> job -> taskmanager (`localhost:8081` by default), or look in `${FLINK_HOME}/log`.
+
+To visualize statistics, e.g. performance, go to Flink UI -> job -> metrics, and select the statistic to monitor.
diff --git a/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-installation.md b/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-installation.md
new file mode 100644
index 00000000000..70029f7a8df
--- /dev/null
+++ b/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-installation.md
@@ -0,0 +1,154 @@
+# BigDL Cluster Serving Programming Guide
+
+## Installation
+It is recommended to install Cluster Serving by pulling the pre-built Docker image to your local node, which packages all the required dependencies. Alternatively, you may manually install Cluster Serving (through either pip or direct download), together with Redis and Flink, on the local node.
+#### Docker
+```
+docker pull intelanalytics/bigdl-cluster-serving
+```
+Then run the following (or run `docker run` directly; it will pull the image if it does not exist)
+```
+docker run --name cluster-serving -itd --net=host intelanalytics/bigdl-cluster-serving:0.9.0
+```
+Log into the container
+```
+docker exec -it cluster-serving bash
+```
+`cd ./cluster-serving`, and you will see that the whole environment has been prepared.
+
+#### Manual installation
+
+##### Requirements
+Non-Docker users need to install [Flink 1.10.0+](https://archive.apache.org/dist/flink/flink-1.10.0/) (1.10.0 by default) and [Redis 5.0.0+](https://redis.io/topics/quickstart) (5.0.5 by default).
+
+For users who do not have the above dependencies, we provide the following commands for a quick setup.
+
+Redis
+```
+$ export REDIS_VERSION=5.0.5
+$ wget http://download.redis.io/releases/redis-${REDIS_VERSION}.tar.gz && \
+    tar xzf redis-${REDIS_VERSION}.tar.gz && \
+    rm redis-${REDIS_VERSION}.tar.gz && \
+    cd redis-${REDIS_VERSION} && \
+    make
+```
+
+Flink
+```
+$ export FLINK_VERSION=1.11.2
+$ wget https://archive.apache.org/dist/flink/flink-${FLINK_VERSION}/flink-${FLINK_VERSION}-bin-scala_2.11.tgz && \
+    tar xzf flink-${FLINK_VERSION}-bin-scala_2.11.tgz && \
+    rm flink-${FLINK_VERSION}-bin-scala_2.11.tgz
+```
+
+After preparing the dependencies above, make sure the environment variables `$FLINK_HOME` (/path/to/flink-FLINK_VERSION-bin) and `$REDIS_HOME` (/path/to/redis-REDIS_VERSION) are set before the following steps, for example as shown below.
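+
+A sketch of the exports (the paths are placeholders; adjust the versions and locations to your own setup):
+```
+export FLINK_HOME=/path/to/flink-1.11.2
+export REDIS_HOME=/path/to/redis-5.0.5
+```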
+
+#### Install release version
+```
+pip install bigdl-serving
+```
+#### Install nightly version
+Download the package from [here](https://sourceforge.net/projects/bigdl/files/cluster-serving-py/), then run the following command to install Cluster Serving
+```
+pip install analytics_zoo_serving-*.whl
+```
+For users who need to deploy and start Cluster Serving, run `cluster-serving-init` to download and prepare the dependencies.
+
+For users who only need to run inference (i.e., predict with data), the environment is now ready.
+
+## Configuration
+### Set up cluster
+Cluster Serving uses a Flink cluster; make sure you have one according to [Installation](#1-installation).
+
+For Docker users, the cluster should already be started. You can use `netstat -tnlp | grep 8081` to check whether the Flink REST port is listening; if not, call `$FLINK_HOME/bin/start-cluster.sh` to start the Flink cluster.
+
+If you need to run Flink on YARN, refer to [Flink on Yarn](https://ci.apache.org/projects/flink/flink-docs-stable/deployment/resource-providers/yarn.html); for K8s, refer to [Flink on K8s](https://ci.apache.org/projects/flink/flink-docs-stable/deployment/resource-providers/standalone/kubernetes.html) in the official Flink documentation.
+
+If you use Flink standalone, call `$FLINK_HOME/bin/start-cluster.sh` to start the Flink cluster.
+
+
+
+### Configuration file
+After [Installation](#1-installation), you will see a config file `config.yaml` in your current working directory. This file contains all the configurations that you can customize for your Cluster Serving. See an example of `config.yaml` below.
+```
+## BigDL Cluster Serving Config Example
+# model path must be provided
+modelPath: /path/to/model
+```
+
+### Preparing Model
+Currently BigDL Cluster Serving supports TensorFlow, OpenVINO, PyTorch, BigDL and Caffe models. The supported types are listed below.
+
+You need to put your model files into a directory whose layout follows the conventions below for your model type; note that only one model is allowed in the directory. Then set `modelPath: /path/to/dir` in the `config.yaml` file.
+
+**TensorFlow**
+***TensorFlow SavedModel***
+```
+|-- model
+   |-- saved_model.pb
+   |-- variables
+       |-- variables.data-00000-of-00001
+       |-- variables.index
+```
+***TensorFlow Frozen Graph***
+```
+|-- model
+   |-- frozen_inference_graph.pb
+   |-- graph_meta.json
+```
+**Note:** the `.pb` weight file must be named `frozen_inference_graph.pb`, and the `.json` input/output definition file must be named `graph_meta.json`, with contents like `{"input_names":["input:0"],"output_names":["output:0"]}`
+
+***TensorFlow Checkpoint***
+Please refer to the [freeze checkpoint example](https://github.com/intel-analytics/bigdl/tree/master/pyzoo/bigdl/examples/tensorflow/freeze_checkpoint)
+
+**PyTorch**
+
+```
+|-- model
+   |-- xx.pt
+```
+Running a PyTorch model needs extra dependencies and configuration. Refer to [here](https://github.com/intel-analytics/bigdl/blob/master/pyzoo/bigdl/examples/pytorch/train/README.md) to install the dependencies, and set the environment variable `$PYTHONHOME` to your Python home, such that Python can be run with `$PYTHONHOME/bin/python` and its library is at `$PYTHONHOME/lib/`.
+
+**OpenVINO**
+
+```
+|-- model
+   |-- xx.xml
+   |-- xx.bin
+```
+**BigDL**
+
+```
+|-- model
+   |-- xx.model
+```
+**Caffe**
+
+```
+|-- model
+   |-- xx.prototxt
+   |-- xx.caffemodel
+```
+
+
+### Other Configuration
+The field `params` contains your inference parameter configuration.
+
+* core_number: the **batch size** used for model inference; the core count of your machine is usually a good choice, so you can simply provide that number here. We recommend a value no smaller than 4 and no larger than 512. In general, a larger batch size means higher throughput, but also increases the latency between batches accordingly.
+
+### Recommended High Performance Configuration
+#### TensorFlow, PyTorch
+Choose `thread_per_model` such that 1 <= thread_per_model <= 8, then in the config
+```
+# default: number of models used in serving
+# modelParallelism: core_number of your machine / thread_per_model
+```
+and set the environment variable
+```
+export OMP_NUM_THREADS=thread_per_model
+```
+For example, on a 16-core machine with `thread_per_model=4`, set `modelParallelism: 4` and `export OMP_NUM_THREADS=4`.
+#### OpenVINO
+Set the environment variable
+```
+export OMP_NUM_THREADS=core_number of your machine
+```
diff --git a/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-start.md b/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-start.md
new file mode 100644
index 00000000000..195355c1464
--- /dev/null
+++ b/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-start.md
@@ -0,0 +1,87 @@
+# BigDL Cluster Serving Programming Guide
+
+## Launching Service of Serving
+
+Before doing inference (predict), you have to start the serving service. This section shows how to start/stop the service.
+
+### Start
+You can use the following command to start Cluster Serving.
+```
+cluster-serving-start
+```
+
+Normally, when calling `cluster-serving-start`, your `config.yaml` should be in the current directory. You can also use `cluster-serving-start -c config_path` to pass the config path `config_path` to Cluster Serving manually.
+
+### Stop
+You can use the Flink UI (`localhost:8081` by default) to cancel your Cluster Serving job.
+
+Or you can use `${FLINK_HOME}/bin/flink list` to get the serving job ID and call `${FLINK_HOME}/bin/flink cancel $ID`.
+
+### Shut Down
+You can use the following command to shut down Cluster Serving. This operation stops all Cluster Serving jobs and the Redis server. Note that your data in Redis will be removed when you shut down.
+```
+cluster-serving-shutdown
+```
+If you are using Docker, you can also run `docker rm` to shut down Cluster Serving.
+### Start Multiple Serving
+To run multiple Cluster Serving jobs, e.g. a second job named `serving2`, use the following configuration
+```
+# model path must be provided
+modelPath: /path/to/model
+
+# name, default is serving_stream, you need to specify if running multiple servings
+jobName: serving2
+```
+Then calling `cluster-serving-start` in this directory starts another Cluster Serving job with the new configuration.
+
+Then, in the Python API, pass the `name="serving2"` argument when creating the objects, e.g.
+```
+input_queue = InputQueue(name="serving2")
+output_queue = OutputQueue(name="serving2")
+```
+The Python API will then interact with the `serving2` job.
+
+### HTTP Server
+If you want to use the sync API for inference, you should start the provided HTTP server first. Users can submit HTTP requests to it through RESTful APIs. The HTTP server parses the input requests and publishes them to the Redis input queue, then retrieves the output results and renders them as JSON in the HTTP responses.
+
+#### Prepare
+Users can download a bigdl-${VERSION}-http.jar from the Nexus Repository with GAVP:
+```
+com.intel.analytics.bigdl
+bigdl-bigdl_${BIGDL_VERSION}-spark_${SPARK_VERSION}
+${ZOO_VERSION}
+```
+Users can also build it from the source code:
+```
+mvn clean package -P spark_2.4+ -Dmaven.test.skip=true
+```
+#### Start the HTTP Server
+Users can start the HTTP server with the following command.
+```
+java -jar bigdl-bigdl_${BIGDL_VERSION}-spark_${SPARK_VERSION}-${ZOO_VERSION}-http.jar
+```
+And check the status of the HTTP server with:
+```
+curl http://${BINDED_HOST_IP}:${BINDED_HOST_PORT}/
+```
+If you get a response like "welcome to BigDL web serving frontend", the HTTP server has started successfully.
+#### Start options
+Users can pass options to the HTTP server when starting it:
+```
+java -jar bigdl-bigdl_${BIGDL_VERSION}-spark_${SPARK_VERSION}-${ZOO_VERSION}-http.jar --redisHost="172.16.0.109"
+```
+All the supported parameters are listed here:
+* **interface**: the bound server interface, default is "0.0.0.0"
+* **port**: the bound server port, default is 10020
+* **redisHost**: the host IP of the redis server, default is "localhost"
+* **redisPort**: the host port of the redis server, default is 6379
+* **redisInputQueue**: the input queue of the redis server, default is "serving_stream"
+* **redisOutputQueue**: the output queue of the redis server, default is "result:"
+* **parallelism**: the parallelism of request processing, default is 1000
+* **timeWindow**: the time window to wait before publishing inputs to redis, default is 0
+* **countWindow**: the count window to wait before publishing inputs to redis, default is 56
+* **tokenBucketEnabled**: the switch to enable/disable the RateLimiter, default is false
+* **tokensPerSecond**: the rate of permits per second, default is 100
+* **tokenAcquireTimeout**: acquires a permit from the RateLimiter if it can be obtained within the specified timeout (ms), default is 100
+
+**Users can adjust these options to tune the performance of the HTTP server.**
diff --git a/docs/readthedocs/source/doc/Serving/QuickStart/serving-quickstart.md b/docs/readthedocs/source/doc/Serving/QuickStart/serving-quickstart.md
new file mode 100644
index 00000000000..a25e67bd947
--- /dev/null
+++ b/docs/readthedocs/source/doc/Serving/QuickStart/serving-quickstart.md
@@ -0,0 +1,49 @@
+# BigDL Cluster Serving Quick Start
+
+This section provides a quick start example for you to run BigDL Cluster Serving. To simplify the example, we use Docker to run Cluster Serving. If you do not have Docker installed, [install docker](https://docs.docker.com/install/) first. The quick start example contains all the necessary components, so first-time users can get it up and running within minutes:
+
+* A Docker image for BigDL Cluster Serving (with all dependencies installed)
+* A sample configuration file
+* A sample trained TensorFlow model, and sample data for inference
+* A sample Python client program
+
+Use one command to run the Cluster Serving container. (The quick start model is bundled only in older versions of the Docker image; it has been removed from the newest version to reduce the image size, so for the newest version please refer to the following sections.)
+```
+docker run --name cluster-serving -itd --net=host intelanalytics/bigdl-cluster-serving:0.9.1
+```
+Log into the container using `docker exec -it cluster-serving bash`, and run
+```
+cd cluster-serving
+cluster-serving-init
+```
+`bigdl.jar` and `config.yaml` are in your directory now.
+
+You can also see the prepared TensorFlow frozen ResNet50 model in the `resources/model` directory, with the following structure.
+
+```
+|-- cluster-serving
+    |-- model
+        |-- frozen_graph.pb
+        |-- graph_meta.json
+```
+Modify `config.yaml` and add the following to the `model` config
+```
+model:
+  path: resources/model
+```
+
+Start Cluster Serving using `cluster-serving-start`.
+
+Run the Python program `python3 image_classification_and_object_detection_quick_start.py -i resources/test_image` to push data into the queue and get the inference results.
+
+Then you can see the inference output in the console.
+```
+cat prediction layer shape: (1000,)
+the class index of prediction of cat image result: 292
+cat prediction layer shape: (1000,)
+```
+Wow! You made it!
+
+Note that the Cluster Serving quick start example will run on your local node only. Check the [Deploy Your Own Cluster Serving](#deploy-your-own-cluster-serving) section for how to configure and run Cluster Serving in a distributed fashion.
+
+For more details, refer to the following sections.
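+
+If you would rather write your own client than use the bundled script, a minimal sketch looks like this (the key and image path below are placeholders; see the Python API section of the Programming Guide for details):
+```
+from bigdl.serving.client import InputQueue, OutputQueue
+
+input_api = InputQueue()
+# enqueue one of the bundled test images by path (file name is illustrative)
+input_api.enqueue('my-cat', img={"path": "resources/test_image/cat.jpeg"})
+
+output_api = OutputQueue()
+prediction = output_api.query('my-cat')
+print(prediction)
+```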