
TabPy Client

TabPy client is the Python package for managing published Python functions on a TabPy server.

Installation

The client comes in the form of a pip package that can be installed directly from the project folder. Installing the package using setup.sh or setup.bat in the TabPy folder, or doing a pip install of tabpy-server, will automatically install TabPy Client.

If you prefer a manual install, Tableau recommends that you install it within a Python environment. You can create such an environment with conda or virtualenv. To activate the environment, run the following command:

On Linux/MacOS:

/Anaconda/bin/source activate Tableau-Python-Server

On Windows:

\Anaconda\scripts\activate Tableau-Python-Server

In order to install the client package, run this command:

pip install tabpy-client

Connecting to TabPy

The client library uses the notion of connecting to a service so that you do not have to specify the service location for each subsequent operation:

import tabpy_client

client = tabpy_client.Client('http://localhost:9004/')

The URL and port are those of the host where the Tableau-Python-Server process has been started; more information can be found in the server section of the documentation.

Deploying a Function

A persisted endpoint is backed by a Python method. For example:

def add(x, y):
    import numpy as np
    return np.add(x, y).tolist()

client.deploy('add', add, 'Adds two numbers x and y')

The next example is more complex, using scikit-learn's clustering API:

def clustering(x, y):
    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.preprocessing import StandardScaler
    X = np.column_stack([x, y])
    X = StandardScaler().fit_transform(X)
    db = DBSCAN(eps=1, min_samples=3).fit(X)
    return db.labels_.tolist()


client.deploy('clustering',
              clustering,
              'Returns cluster Ids for each data point specified by the pairs in x and y')

In this example, the function clustering expects a set of two-dimensional data points, represented by the list of all x-coordinates and the list of all y-coordinates. It returns a set of numerical labels corresponding to the cluster each data point is assigned to. We deploy this function as an endpoint named clustering. It is now reachable as a REST API, as well as through the TabPy client; for details, see the next section.

You can re-deploy a function (for example, after you have modified its code) by setting the override parameter to True:

client.deploy('add', add, 'Adds two numbers x and y', override=True)

Each re-deployment of an endpoint will increment its version number, which is also returned as part of the query result.
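
For example, after re-deploying add, the new version number shows up in the query result (a minimal sketch; the full set of response fields is shown in the Querying an Endpoint section below):

result = client.query('add', [1, 2], [3, 4])
result['response']  # [4, 6]
result['version']   # increments with each re-deployment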

When deploying endpoints that rely on supervised learning models, you may want to load a saved model instead of training on-the-fly for performance reasons.

Below is an excerpt from the training stage of a hypothetical model that predicts whether or not a loan will default:

from sklearn.ensemble import GradientBoostingClassifier

# train, test, target and RowID are assumed to be defined earlier in the training script;
# modelfit is a helper used during training to fit the model and report its performance.
predictors = [x for x in train.columns if x not in [target, RowID]]
gbm = GradientBoostingClassifier(learning_rate=0.01, n_estimators=600, max_depth=9,
                                 min_samples_split=1200, min_samples_leaf=60,
                                 subsample=0.85, random_state=10)
modelfit(gbm, train, test, predictors)

When the trained model (named gbm in this case) is used in a function being deployed (as in gbm.predict(...) below), Tableau will automatically save its definition using cloudpickle along with the definition of the function. The model is also kept in memory on the server to achieve fast response times. If, however, you persist your model to disk manually and read it as part of your scoring function code, response times become noticeably longer, because the code (including the model loading) is executed every time a client hits the endpoint. To get the best performance, we recommend following the methodology outlined in this example.

def LoanDefaultClassifier(Loan_Amount, Loan_Tenure, Monthly_Income, Age):
    import pandas as pd
    data = pd.concat([Loan_Amount, Loan_Tenure, Monthly_Income, Age], axis=1)
    return gbm.predict(data)

client.deploy('WillItDefault',
              LoanDefaultClassifier,
              'Returns whether a loan application is likely to default.')
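
For contrast, the slower pattern described above would look roughly like the sketch below, in which the model is read from a hypothetical gbm.pkl file inside the scoring function, so the pickle is loaded again on every query:

def LoanDefaultClassifierSlow(Loan_Amount, Loan_Tenure, Monthly_Income, Age):
    import pandas as pd
    import pickle
    # Hypothetical model file; loading it here means the load runs on every request,
    # which is why keeping the model in memory (as above) is recommended instead.
    with open('gbm.pkl', 'rb') as f:
        gbm = pickle.load(f)
    data = pd.concat([Loan_Amount, Loan_Tenure, Monthly_Income, Age], axis=1)
    return gbm.predict(data)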

You can find a detailed working example on our blog, with a downloadable sample Tableau workbook and an accompanying Jupyter notebook that walks through the model fitting, evaluation, and publishing steps.

Endpoints that are no longer needed can be removed as follows:

client.remove('WillItDefault')

Providing Schema Metadata

As soon as you share your deployed functions, you also need to share metadata about them. The consumer of an endpoint needs to know the details of how to use it, such as:

  • The general purpose of the endpoint
  • Input parameter names, data types, and their meaning
  • Return data type and description

This goes beyond the single description string we used above when deploying the function add. You can use an optional schema parameter of deploy to provide such a structured description, which can then be retrieved by other users connected to the same server. The schema is interpreted as a JSON Schema object, which you can either create manually or generate using a utility method provided in this client package:

from tabpy_client.schema import generate_schema

schema = generate_schema(
  input={'x': 3, 'y': 2},
  output=5,
  input_description={'x': 'first value',
                     'y': 'second value'},
  output_description='the sum of x and y')

client.deploy('add', add, 'Adds two numbers x and y', schema=schema)

To describe more complex input, like arrays, you would use the following syntax:

from tabpy_client.schema import generate_schema

schema = generate_schema(
  input={'x': [6.35, 6.40, 6.65, 8.60],
         'y': [1.95, 1.95, 2.05, 3.05]},
  output=[0, 0, 0, 1],
  input_description={'x': 'list of x values',
                     'y': 'list of y values'},
  output_description='cluster Ids for each point x, y')

client.deploy('clustering',
              clustering,
              'Returns cluster Ids for each data point specified by the pairs in x and y',
              schema=schema)

A schema provided this way can be retrieved through the REST Endpoints API or through the get_endpoints client API as follows:

client.get_endpoints()['add']['schema']

Querying an Endpoint

Once a Python function has been deployed to the server process, you can use the client's query method to query it (assumes you’re already connected to the service):

x = [6.35, 6.40, 6.65, 8.60, 8.90, 9.00, 9.10]
y = [1.95, 1.95, 2.05, 3.05, 3.05, 3.10, 3.15]

client.query('clustering', x, y)

Response:


{u'model': u'clustering',
 u'response': [0, 0, 0, 1, 1, 1, 1],
 u'uuid': u'1ca01e46-733c-4a77-b3da-3ded84dff4cd',
 u'version': 2}

Evaluating Arbitrary Python Scripts

The other core functionality, besides deploying and querying methods as endpoints, is the ad-hoc execution of Python code, called evaluate. Evaluate does not have a Python API in tabpy-client; it is exposed only as a raw REST interface that other client bindings can easily implement. Tableau connects to TabPy using REST Evaluate.

Evaluate also allows calling a deployed endpoint from within the Python code block. The convention for this is to use the provided function tabpy.query in the code, which behaves like the query method in tabpy-client. See the REST API documentation for an example.
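
As a rough sketch of what an evaluate call could look like from Python with the requests library (the exact request format is defined in the REST API documentation; the script and data payload below follows the convention described there and is meant only as an illustration):

import requests

resp = requests.post(
    'http://localhost:9004/evaluate',
    json={
        'data': {'_arg1': [6.35, 6.40, 6.65, 8.60],
                 '_arg2': [1.95, 1.95, 2.05, 3.05]},
        # The script can call a deployed endpoint through tabpy.query.
        'script': "return tabpy.query('clustering', _arg1, _arg2)['response']"
    })
print(resp.json())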