Version: 1.1 Date: 17/9/2018 Author: David Glance
- Build out a binary classification model using Amazon Machine Learning
- Explore parameters that affect the model’s training and evaluation process
Technologies: Ubuntu, AWS, Amazon Machine Learning, boto3, Python
The aim of this lab is to write a program that will help you understand:
- The principles of the binary classifier, following the AWS Machine Learning tutorial
- How the classifier uses banking data to decide who is likely to open a deposit account
- How to interpret the predictive performance of the model and set score thresholds
Note that this is essentially the programmatic version of the demonstration shown online that made use of the AWS UI. It requires you to understand what was happening at each step.
Historical data for products such as bank term deposits:
https://s3.amazonaws.com/aml-sample-data/banking.csv
Data for testing whether people will take up a term deposit:
https://s3.amazonaws.com/aml-sample-data/banking-batch.csv
Download both files and put them in your S3 bucket.
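The upload step can be scripted rather than done through the console. The following is a minimal sketch, assuming the two CSV files have already been downloaded to the working directory and that a bucket already exists. The helper name `upload_lab_files` and the bucket name `my-lab-bucket` are hypothetical; only `upload_file` is a real boto3 S3 client call.

```python
import os


def upload_lab_files(s3_client, bucket_name, local_paths):
    """Upload each local file to the bucket, using the file name as the S3 key."""
    uploaded_keys = []
    for path in local_paths:
        key = os.path.basename(path)
        # upload_file(Filename, Bucket, Key) is the boto3 S3 client upload call.
        s3_client.upload_file(path, bucket_name, key)
        uploaded_keys.append(key)
    return uploaded_keys


# Example usage (needs AWS credentials configured on the VM; the bucket
# name below is a placeholder, not one supplied by the lab):
#   import boto3
#   s3 = boto3.client("s3")
#   upload_lab_files(s3, "my-lab-bucket", ["banking.csv", "banking-batch.csv"])
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before touching a real bucket.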
The explanation of the attributes in this data and how they are predictive is here:
Use boto3's machine learning call 'create_data_source_from_s3'.
You will need to create two data sources: one that is used for training and one that is used for testing.
Use defaults where you can. ComputeStatistics needs to be set to True.
Use your student number to identify the data source.
Use the schema listed below.
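One way to prepare the two data source requests is sketched below. It assumes both data sources are carved out of banking.csv with a 70/30 split expressed through `DataRearrangement`; the lab text does not prescribe a split, so treat the percentages as an assumption. The helper name, the ID format and the student number `12345678` are hypothetical; `DataSourceId`, `DataSourceName`, `DataSpec` and `ComputeStatistics` are the parameter names of `create_data_source_from_s3`.

```python
import json


def build_data_source_request(student_number, role, data_s3_url, schema_text,
                              percent_begin, percent_end):
    """Build keyword arguments for the create_data_source_from_s3 call."""
    # DataRearrangement is passed as a JSON string describing which slice
    # of the input file this data source should use.
    rearrangement = {"splitting": {"percentBegin": percent_begin,
                                   "percentEnd": percent_end}}
    return {
        "DataSourceId": f"{student_number}-{role}-ds",
        "DataSourceName": f"{student_number} banking {role} data",
        "DataSpec": {
            "DataLocationS3": data_s3_url,
            "DataSchema": schema_text,  # the schema JSON as a string
            "DataRearrangement": json.dumps(rearrangement),
        },
        "ComputeStatistics": True,
    }


# Example usage (placeholder bucket, file name and student number):
#   import boto3
#   ml = boto3.client("machinelearning")
#   schema_text = open("banking.csv.schema").read()
#   url = "s3://my-lab-bucket/banking.csv"
#   train = build_data_source_request("12345678", "training", url, schema_text, 0, 70)
#   test = build_data_source_request("12345678", "testing", url, schema_text, 70, 100)
#   ml.create_data_source_from_s3(**train)
#   ml.create_data_source_from_s3(**test)
```

Building the arguments separately from the API call makes it possible to print and inspect the request before it is sent.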
Use create_ml_model to create a machine learning model from the data source that you created for training.
Use BINARY as the category of supervised model.
Use your student number to identify the ML model.
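The model creation step might look like the sketch below. It leaves `Parameters` and `Recipe` out so that Amazon ML falls back to its defaults, in line with the "use defaults where you can" instruction. The helper name and the ID format are hypothetical; `MLModelId`, `MLModelName`, `MLModelType` and `TrainingDataSourceId` are the parameter names of `create_ml_model`.

```python
def create_binary_model(ml_client, student_number, training_data_source_id):
    """Request a BINARY model trained on the given data source; return its ID."""
    model_id = f"{student_number}-ml-model"
    ml_client.create_ml_model(
        MLModelId=model_id,
        MLModelName=f"{student_number} banking model",
        MLModelType="BINARY",
        TrainingDataSourceId=training_data_source_id,
    )
    return model_id


# Example usage (placeholder student number and data source ID):
#   import boto3
#   ml = boto3.client("machinelearning")
#   model_id = create_binary_model(ml, "12345678", "12345678-training-ds")
```

The call returns as soon as the request is accepted; training itself runs asynchronously on the AWS side.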
Use create_evaluation to evaluate the model using the data source you created for evaluation.
Use get_evaluation to get data about the evaluation, including the Performance Metrics.
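Because the evaluation runs asynchronously, get_evaluation usually has to be called more than once before the metrics appear. The following is a rough sketch of one way to do this with simple polling. The helper name, the ID format, the polling interval and the retry limit are all assumptions; the `Status` and `PerformanceMetrics` fields are part of the get_evaluation response.

```python
import time


def evaluate_model(ml_client, student_number, model_id, evaluation_data_source_id,
                   poll_seconds=30, max_polls=60):
    """Start an evaluation and poll until it finishes; return the metric properties."""
    evaluation_id = f"{student_number}-evaluation"
    ml_client.create_evaluation(
        EvaluationId=evaluation_id,
        EvaluationName=f"{student_number} banking evaluation",
        MLModelId=model_id,
        EvaluationDataSourceId=evaluation_data_source_id,
    )
    for _ in range(max_polls):
        response = ml_client.get_evaluation(EvaluationId=evaluation_id)
        status = response["Status"]
        if status == "COMPLETED":
            # For a BINARY model the properties include the AUC value.
            return response["PerformanceMetrics"]["Properties"]
        if status in ("FAILED", "DELETED"):
            raise RuntimeError(f"Evaluation ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError("Evaluation did not complete within the polling limit")


# Example usage (placeholder IDs):
#   import boto3
#   ml = boto3.client("machinelearning")
#   metrics = evaluate_model(ml, "12345678", "12345678-ml-model", "12345678-testing-ds")
#   print(metrics)
```

The evaluation cannot complete until the model has finished training, so the first few polls will normally report a pending or in-progress status.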
NOTE: Do this on your VirtualBox VM.
Schema file:
{
"excludedAttributeNames": [],
"version": "1.0",
"dataFormat": "CSV",
"rowId": null,
"dataFileContainsHeader": true,
"attributes": [
{
"attributeName": "age",
"attributeType": "NUMERIC"
},
{
"attributeName": "job",
"attributeType": "CATEGORICAL"
},
{
"attributeName": "marital",
"attributeType": "CATEGORICAL"
},
{
"attributeName": "education",
"attributeType": "CATEGORICAL"
},
{
"attributeName": "default",
"attributeType": "CATEGORICAL"
},
{
"attributeName": "housing",
"attributeType": "CATEGORICAL"
},
{
"attributeName": "loan",
"attributeType": "CATEGORICAL"
},
{
"attributeName": "contact",
"attributeType": "CATEGORICAL"
},
{
"attributeName": "month",
"attributeType": "CATEGORICAL"
},
{
"attributeName": "day_of_week",
"attributeType": "CATEGORICAL"
},
{
"attributeName": "duration",
"attributeType": "NUMERIC"
},
{
"attributeName": "campaign",
"attributeType": "NUMERIC"
},
{
"attributeName": "pdays",
"attributeType": "NUMERIC"
},
{
"attributeName": "previous",
"attributeType": "NUMERIC"
},
{
"attributeName": "poutcome",
"attributeType": "CATEGORICAL"
},
{
"attributeName": "emp_var_rate",
"attributeType": "NUMERIC"
},
{
"attributeName": "cons_price_idx",
"attributeType": "NUMERIC"
},
{
"attributeName": "cons_conf_idx",
"attributeType": "NUMERIC"
},
{
"attributeName": "euribor3m",
"attributeType": "NUMERIC"
},
{
"attributeName": "nr_employed",
"attributeType": "NUMERIC"
},
{
"attributeName": "y",
"attributeType": "BINARY"
}
],
"targetAttributeName": "y"
}
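Before handing the schema to Amazon ML, it can be worth checking it locally, since a malformed schema only shows up later as a failed data source. The sketch below is one possible sanity check, not part of the AWS API: the function name is hypothetical, and it only confirms that the schema parses as JSON and that the target attribute exists and is typed BINARY, as a binary classifier needs.

```python
import json


def check_binary_schema(schema_text):
    """Parse the schema and confirm the target attribute is declared as BINARY."""
    schema = json.loads(schema_text)
    # Map each attribute name to its declared type.
    types = {attr["attributeName"]: attr["attributeType"]
             for attr in schema["attributes"]}
    target = schema["targetAttributeName"]
    if target not in types:
        raise ValueError(f"Target attribute {target!r} is not listed in attributes")
    if types[target] != "BINARY":
        raise ValueError(f"Target attribute {target!r} must be BINARY, "
                         f"found {types[target]}")
    return types


# Example usage (assumes the schema above was saved under this placeholder name):
#   with open("banking.csv.schema") as handle:
#       attribute_types = check_binary_schema(handle.read())
#   print(len(attribute_types), "attributes found")
```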
Lab Assessment: This semester all labs will be assessed as "Lab notes". You should follow all steps in each lab and include your own comments. In addition, include screenshots showing the output of every command-line instruction that you execute in the terminal, and any other relevant screenshots that demonstrate you followed the steps from the corresponding lab. Please also include any Linux or Python script that you create and the output you get when it is executed. Please submit a single PDF file. The formatting is up to you, but a well-organised structure for your notes is appreciated.