Training and Testing Models


SimonLee edited this page Aug 10, 2019 · 2 revisions


Intro

How well is my model doing?
How do we improve it based on these metrics?

Outline

Problem
Tools (models)
Measurement Tools (evaluation)

Stats Refresher

Mean (average / expected value)
Median (middle value)
Variance
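
As a quick refresher, the three statistics above can be computed with NumPy. This is a minimal sketch; the sample values are made up for illustration, and `np.var` is used with its default, which is the population variance.

```python
import numpy as np

# Hypothetical sample, chosen only for illustration
values = np.array([2.0, 4.0, 4.0, 5.0, 10.0])

mean = np.mean(values)        # sum of the values divided by their count
median = np.median(values)    # middle value once the sample is sorted
variance = np.var(values)     # average squared deviation from the mean (population variance)

print(mean, median, variance)
```

Note that the single large value (10.0) pulls the mean above the median, which is why the two are often reported together.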

Statistics Refresher

Loading data into Pandas

import pandas as pd

# Read a CSV file into a DataFrame ("file_name.csv" is a placeholder path)
data = pd.read_csv("file_name.csv")

Pandas Refresher

NumPy Arrays
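
One common pattern is to split a pandas DataFrame into a feature array and a label array before passing them to sklearn. The sketch below assumes a small in-memory DataFrame with hypothetical column names (`x1`, `x2`, `y`) standing in for data loaded from a CSV file.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the contents of a CSV file
data = pd.DataFrame({
    "x1": [0.5, 1.5, 3.0, 4.5],
    "x2": [1.0, 2.0, 1.5, 3.5],
    "y": [0, 0, 1, 1],
})

# Features: the input columns as a 2-D array (rows = samples, columns = features)
X = np.array(data[["x1", "x2"]])
# Labels: the target column as a 1-D array
y = np.array(data["y"])

print(X.shape, y.shape)
```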

NumPy Refresher

Training models in sklearn

Logistic Regression
Neural Networks
Decision Trees
Support Vector Machines
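
All four model families share the same `fit` / `score` interface in sklearn, so training them looks nearly identical. The following is a sketch under assumptions: the dataset is synthetic (generated with `make_classification`), and the particular classes and settings (`MLPClassifier` with `max_iter=1000`, default `SVC`, and so on) are illustrative choices, not the ones prescribed by the course.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class dataset, used only so the example is self-contained
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

classifiers = {
    "Logistic Regression": LogisticRegression(),
    "Neural Network": MLPClassifier(max_iter=1000, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Support Vector Machine": SVC(),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X, y)                   # train on the full dataset
    scores[name] = clf.score(X, y)  # accuracy on the same data it was trained on
    print(f"{name}: training accuracy = {scores[name]:.2f}")
```

The accuracies printed here are measured on the training data itself, so they say little about how each model would behave on new data.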

sklearn Refresher

Tuning Parameters Manually

As the number of parameters grows, tuning them by hand becomes increasingly difficult.
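
To make the difficulty concrete, here is a rough sketch of manual tuning for an RBF-kernel `SVC`. The candidate values for `C` and `gamma`, the synthetic dataset, and the held-out comparison split are all assumptions made for illustration.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data; part of it is held out to compare parameter settings
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hand-picked candidate values (arbitrary choices for illustration)
C_values = [0.1, 1, 10]
gamma_values = [0.01, 0.1, 1]

results = {}
for C, gamma in product(C_values, gamma_values):
    model = SVC(kernel="rbf", C=C, gamma=gamma)
    model.fit(X_train, y_train)
    results[(C, gamma)] = model.score(X_val, y_val)  # accuracy on held-out data

best = max(results, key=results.get)
print(len(results), "combinations tried; best (C, gamma):", best)
```

With only two parameters and three values each there are already nine models to train; each additional parameter multiplies that count.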

Tuning Parameters Automatically

Train
Test
Evaluate
Validate
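
The page does not name a specific tool for automatic tuning. One option in sklearn is `GridSearchCV`, sketched below as an assumption rather than as the course's method: it trains a model for every combination in `param_grid`, validates each with cross-validation on the training data, and leaves the test set for a final evaluation. The dataset and parameter values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic data, split so the test set is never seen during tuning
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Candidate values to search over (arbitrary choices for illustration)
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

# Train and validate: 5-fold cross-validation on the training data only
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)

# Evaluate: score the best model once on the held-out test set
test_accuracy = search.score(X_test, y_test)

print("Best parameters:", search.best_params_)
print("Cross-validation accuracy:", search.best_score_)
print("Test accuracy:", test_accuracy)
```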

Testing your models

Regression
Classification
Evaluating on a held-out testing set shows how well the model generalizes to unseen data
Never use your testing data for training
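
For the regression case, the same idea can be sketched as follows: fit on the training portion only, then measure error on the held-out portion. The synthetic dataset, the choice of `LinearRegression`, and mean squared error as the metric are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data with some noise
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)  # the testing data is never passed to fit

# Error on data the model has not seen during training
test_mse = mean_squared_error(y_test, model.predict(X_test))
print("Test MSE:", test_mse)
```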

Quiz: Testing in sklearn

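from sklearn.model_selection import train_test_split

# Hold out 25% of the samples for testing; X (features) and y (labels) are assumed to be loaded already
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)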
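
A fuller version of the quiz might look like the sketch below. It assumes a synthetic dataset and a `DecisionTreeClassifier`; `random_state=0` is added only so the split is reproducible.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class dataset standing in for the quiz data
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# 75% of the samples for training, 25% held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)  # train on the training set only

train_accuracy = model.score(X_train, y_train)
test_accuracy = model.score(X_test, y_test)  # accuracy on unseen data

print("Training samples:", len(X_train), "Testing samples:", len(X_test))
print("Training accuracy:", train_accuracy)
print("Testing accuracy:", test_accuracy)
```

A large gap between training and testing accuracy is a common sign that the model has memorized the training data rather than learned a pattern that generalizes.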