Labs DS starter

Big picture

This template has starter code to deploy an API for your machine learning model and data visualizations.

You can see the template deployed on AWS as-is.

There are two different ways to use Python web frameworks, and both are good. The first is what you learned in DS Unit 3, with Flask. The second, serving an API that a separate front end consumes, is more common in Build Weeks & Labs.

Instead of Flask, we'll use FastAPI. It's similar, but faster, with automatic interactive docs. For more comparison, see FastAPI for Flask Users.

You'll build and deploy a Data Science API. You'll work cross-functionally with your Web teammates to connect your API to a full-stack web app!

Tech stack

  • AWS Elastic Beanstalk: Platform as a service, hosts your API.
  • Docker: Containers, for reproducible environments.
  • FastAPI: Web framework. Like Flask, but faster, with automatic interactive docs.
  • Flake8: Linter, enforces PEP8 style guide.
  • Plotly: Visualization library, for Python & JavaScript.
  • Pytest: Testing framework, runs your unit tests.

Getting started

Create a new repository from this template.

Clone the repo

git clone https://github.com/YOUR-GITHUB-USERNAME/YOUR-REPO-NAME.git

cd YOUR-REPO-NAME

Build the Docker image

docker-compose build

Run the Docker image

docker-compose up

Go to localhost:8000 in your browser.

You'll see your API documentation:

  • Your app's title
  • Your description
  • An endpoint for POST requests, /predict
  • An endpoint for GET requests, /viz

Click the /predict endpoint's green button.

You'll see the endpoint's documentation, including:

  • Your function's docstring, """Make random baseline predictions for classification problem."""
  • Request body example, as JSON (like a Python dictionary)
  • A button, "Try it out"

Click the "Try it out" button.

The request body becomes editable.

Click the "Execute" button. Then scroll down.

You'll see the server response, including:

  • Code 200, which means the request was successful.
  • The response body, as JSON, with random baseline predictions for a classification problem.

Your job is to replace these random predictions with real predictions from your model. Use this starter code and documentation to deploy your model as an API!

File structure

project
├── requirements.txt
└── app
    ├── __init__.py
    ├── main.py
    ├── api
    │   ├── __init__.py
    │   ├── predict.py
    │   └── viz.py    
    └── tests
        ├── __init__.py
        ├── test_main.py
        ├── test_predict.py
        └── test_viz.py

requirements.txt is where you add Python packages that your app requires. Then run docker-compose build to re-build your Docker image.
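
For example, if your model needs scikit-learn, you might add a pinned line like this (the version shown is a placeholder, not part of the starter):

scikit-learn==0.23.2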

app/main.py is where you edit your app's title and description, which are displayed at the top of your automatically generated documentation. This file also configures "Cross-Origin Resource Sharing" (CORS), which you shouldn't need to edit.

app/api/predict.py defines the Machine Learning endpoint. /predict accepts POST requests and responds with random predictions. In a notebook, train your model and pickle it. Then in this source code file, unpickle your model and edit the predict function to return real predictions.

When your API receives a POST request, FastAPI automatically parses and validates the request body JSON, using the Item class attributes and functions. Edit this class so it's consistent with the column names and types from your training dataframe.
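
For illustration, a minimal data model might look like the sketch below (the field names and types here are hypothetical placeholders, not necessarily the starter's actual fields). If a request body fails validation, FastAPI responds with a 422 error that names the offending field.

from pydantic import BaseModel

class Item(BaseModel):
    """Parse & validate the request body JSON."""
    x1: float  # placeholder fields; use your dataframe's column names & types
    x2: int
    x3: str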

app/api/viz.py defines the Visualization endpoint. Create your own Plotly visualizations in notebooks. Then add your code to this source code file. Your web teammates can use react-plotly.js to show the visualizations.

app/tests/test_*.py is where you edit your pytest unit tests.
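
For example, a minimal test can use FastAPI's TestClient to exercise an endpoint. This is a sketch: the request body fields are hypothetical placeholders, so substitute the fields from your own data model class.

from fastapi.testclient import TestClient
from app.main import app

client = TestClient(app)

def test_predict_returns_200():
    """POST a valid request body and expect a successful response."""
    body = {'x1': 3.14, 'x2': -42, 'x3': 'banjo'}  # placeholder fields; match your data model
    response = client.post('/predict', json=body)
    assert response.status_code == 200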

Deploy to AWS

Get your AWS access keys.

Install AWS Command Line Interface.

Configure AWS CLI:

aws configure

Install AWS Elastic Beanstalk CLI:

pip install pipx
pipx install awsebcli

Follow the AWS Elastic Beanstalk docs.

Use Docker to build the image locally, test it locally, then push it to Docker Hub.

docker build -f project/Dockerfile -t YOUR-DOCKER-HUB-ID/YOUR-IMAGE-NAME ./project

docker login

docker push YOUR-DOCKER-HUB-ID/YOUR-IMAGE-NAME

Edit the image name in the Dockerrun.aws.json file. Replace the placeholder YOUR-DOCKER-HUB-ID/YOUR-IMAGE-NAME with your real values.
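
For reference, the image name lives in a fragment like this sketch of the version-1 file format (the container port assumes the app listens on 8000, as it does locally):

{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "YOUR-DOCKER-HUB-ID/YOUR-IMAGE-NAME",
    "Update": "true"
  },
  "Ports": [
    {
      "ContainerPort": "8000"
    }
  ]
}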

Then use the EB CLI:

git add --all

git commit -m "Your commit message"

eb init -p docker YOUR-APP-NAME --region us-east-1

eb create YOUR-APP-NAME

eb open

To redeploy:

  • git commit ...
  • docker build ...
  • docker push ...
  • eb deploy
  • eb open

Example: Data visualization

Labs projects will use Plotly, a popular visualization library for both Python & JavaScript.

Follow the getting started instructions.

Edit app/main.py to add your API title and description.

app = FastAPI(
    title='World Metrics DS API',
    description='Visualize world metrics from Gapminder data',
    version='0.1',
    docs_url='/',
)

Prototype your visualization in a notebook.

import plotly.express as px

dataframe = px.data.gapminder().rename(columns={
    'year': 'Year', 
    'lifeExp': 'Life Expectancy', 
    'pop': 'Population', 
    'gdpPercap': 'GDP Per Capita'
})

country = 'United States'
metric = 'Population'
subset = dataframe[dataframe.country == country]
fig = px.line(subset, x='Year', y=metric, title=f'{metric} in {country}')
fig.show()

Define a function for your visualization. End with return fig.to_json()

Then edit app/api/viz.py to add your code.

import plotly.express as px
from fastapi import APIRouter

router = APIRouter()

dataframe = px.data.gapminder().rename(columns={
    'year': 'Year',
    'lifeExp': 'Life Expectancy',
    'pop': 'Population',
    'gdpPercap': 'GDP Per Capita'
})

@router.get('/worldviz')
async def worldviz(metric, country):
    """
    Visualize world metrics from Gapminder data

    ### Query Parameters
    - `metric`: 'Life Expectancy', 'Population', or 'GDP Per Capita'
    - `country`: [country name](https://www.gapminder.org/data/geo/), case sensitive

    ### Response
    JSON string to render with react-plotly.js
    """
    subset = dataframe[dataframe.country == country]
    fig = px.line(subset, x='Year', y=metric, title=f'{metric} in {country}')
    return fig.to_json()
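
Once the app is running locally, you can sanity-check the endpoint in the browser with a URL like localhost:8000/worldviz?metric=Population&country=United%20States (this assumes the router is included without a path prefix).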

Test locally, then deploy to AWS.

Your web teammates will re-use the data viz code & docs in our labs-spa-starter repo. The web app will call the DS API to get the data, then use react-plotly.js to render the visualization.

Plotly Python docs

Plotly JavaScript docs

Example: Machine learning

Follow the getting started instructions.

Edit app/main.py to add your API title and description.

app = FastAPI(
    title='House Price DS API',
    description='Predict house prices in California',
    version='0.1',
    docs_url='/',
)

Edit app/api/predict.py to add a docstring for your predict function and return a naive baseline.

@router.post('/predict')
async def predict(item: Item):
    """Predict house prices in California."""
    y_pred = 200000
    return {'predicted_price': y_pred}

In a notebook, explore your data. Make an educated guess of what features you'll use.

import pandas as pd
from sklearn.datasets import fetch_california_housing

# Load data
california = fetch_california_housing()
print(california.DESCR)
X = pd.DataFrame(california.data, columns=california.feature_names)
y = california.target

# Rename columns (house_age is needed later by the House data model)
X.columns = X.columns.str.lower()
X = X.rename(columns={
    'avebedrms': 'bedrooms',
    'averooms': 'total_rooms',
    'houseage': 'house_age',
})

# Explore descriptive stats
X.describe()
# Use these 3 features
features = ['bedrooms', 'total_rooms', 'house_age']

Edit the class in app/api/predict.py to use your features.

class House(BaseModel):
    """Use this data model to parse the request body JSON."""
    bedrooms: int
    total_rooms: float
    house_age: float

    def to_df(self):
        """Convert pydantic object to pandas dataframe with 1 row."""
        return pd.DataFrame([dict(self)])

@router.post('/predict')
async def predict(house: House):
    """Predict house prices in California."""
    X_new = house.to_df()
    y_pred = 200000
    return {'predicted_price': y_pred}

Test locally, then deploy to AWS with your work-in-progress. Now your web teammates can make POST requests to your API endpoint.
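
As a sketch of what such a request looks like from Python (the URL is a placeholder for your deployed host):

import requests

url = 'http://YOUR-APP-NAME.us-east-1.elasticbeanstalk.com/predict'
body = {'bedrooms': 2, 'total_rooms': 5.5, 'house_age': 25.0}
response = requests.post(url, json=body)
print(response.status_code)  # 200
print(response.json())       # {'predicted_price': 200000}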

In a notebook, train your pipeline and pickle it. (See the scikit-learn docs on model persistence.)
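
A minimal sketch, assuming a scikit-learn pipeline trained on the three features chosen above:

import pickle
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

pipeline = make_pipeline(LinearRegression())
pipeline.fit(X[features], y)

# Save the fitted pipeline to a file
with open('model.pkl', 'wb') as f:
    pickle.dump(pipeline, f)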

Get version numbers for every package you used in your pipeline. Add these packages to your requirements.txt file with their exact version numbers. Then run docker-compose build to re-build your Docker image.
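
One way to get the exact versions, assuming pandas and scikit-learn are the packages your pipeline uses:

import pandas
import sklearn

print(f'pandas=={pandas.__version__}')
print(f'scikit-learn=={sklearn.__version__}')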

Edit app/api/predict.py to unpickle your model and use it in your predict function.
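
A minimal sketch, assuming you saved the pickle as app/api/model.pkl (that filename and location are assumptions, not part of the starter):

from pathlib import Path
import pickle

# Load the fitted pipeline once, at import time
MODEL_PATH = Path(__file__).parent / 'model.pkl'
with open(MODEL_PATH, 'rb') as f:
    model = pickle.load(f)

@router.post('/predict')
async def predict(house: House):
    """Predict house prices in California."""
    X_new = house.to_df()
    y_pred = float(model.predict(X_new)[0])  # cast to plain float for JSON serialization
    return {'predicted_price': y_pred}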

Now you are ready to re-deploy! 🚀