
Training and Testing Models

SimonLee edited this page Aug 10, 2019 · 2 revisions


Intro

  • How well is my model doing?

  • How do we improve it based on these metrics?

Outline

  • Problem

  • Tools (the models)

  • Measurement Tools (evaluation)

Stats Refresher

  • Mean (average, or expected value)

  • Median

  • Variance

Statistics Refresher
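The three statistics above can be computed with NumPy. A small sketch on a hypothetical sample; note that `np.var` defaults to the population variance (`ddof=0`), while passing `ddof=1` gives the sample variance.

```python
import numpy as np

# A small hypothetical sample, used only for illustration
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = values.mean()             # sum of the values divided by their count
median = np.median(values)       # middle of the sorted data (average of the two middle values here)
variance = values.var()          # population variance: mean squared deviation from the mean
sample_variance = values.var(ddof=1)  # divides by n - 1 instead of n

print(mean, median, variance)  # 5.0 4.5 4.0
```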

Loading data into Pandas

import pandas as pd
# Read a CSV file into a DataFrame ("file_name.csv" is a placeholder path)
data = pd.read_csv("file_name.csv")

Pandas Refresher
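The loading code reads a file from disk. As a minimal, self-contained sketch, the version below passes `pd.read_csv` an in-memory string through `io.StringIO` instead, then separates the feature columns from the label. The column names `x1`, `x2`, `y` and the values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for "file_name.csv"
csv_text = """x1,x2,y
0.78,-0.63,0
2.11,1.45,1
-0.35,0.40,0
1.62,0.91,1
"""

# read_csv accepts any file-like object, not only a path
data = pd.read_csv(io.StringIO(csv_text))

# Feature columns (inputs) and the label column (target)
X = data[["x1", "x2"]]
y = data["y"]

print(data.shape)  # (4, 3)
```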

NumPy Arrays

NumPy Refresher
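A common step before training is turning DataFrame columns into NumPy arrays. A minimal sketch, assuming a small hypothetical DataFrame with feature columns `x1`, `x2` and label column `y`:

```python
import numpy as np
import pandas as pd

# Hypothetical data with two feature columns and one label column
data = pd.DataFrame({
    "x1": [0.78, 2.11, -0.35, 1.62],
    "x2": [-0.63, 1.45, 0.40, 0.91],
    "y": [0, 1, 0, 1],
})

# Convert the selected columns into plain NumPy arrays
X = np.array(data[["x1", "x2"]])  # 2-D array, one row per sample
y = np.array(data["y"])           # 1-D array of labels

print(X.shape, y.shape)  # (4, 2) (4,)
print(X[:, 0])           # first feature column for every row
```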

Training models in sklearn

  • Logistic Regression

  • Neural Networks

  • Decision Trees

  • Support Vector Machines

sklearn Refresher
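All four model families above share sklearn's `fit`/`score` interface. A sketch, assuming hypothetical 2-D toy data where the label is 1 when `x1 + x2 > 0`; the settings (`max_iter=2000`, `random_state=0`) are illustrative choices, not tuned values.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: label is 1 when x1 + x2 > 0
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

models = {
    "Logistic Regression": LogisticRegression(),
    "Neural Network": MLPClassifier(max_iter=2000, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Support Vector Machine": SVC(),
}

# Every estimator is trained and scored the same way
for name, model in models.items():
    model.fit(X, y)
    print(name, model.score(X, y))  # accuracy on the training data itself
```

Scoring on the training data, as here, says nothing about performance on new data; that is what a separate testing set is for.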

Tuning Parameters Manually

  • As the number of parameters grows, tuning them by hand becomes increasingly difficult
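To see why this gets hard, a sketch of manual tuning: loop over hand-picked values of two `DecisionTreeClassifier` hyperparameters and score each combination on a held-out validation split. The data and candidate values are hypothetical.

```python
from itertools import product

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data with a curved decision boundary
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] ** 2 + X[:, 1] > 0.5).astype(int)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# Hand-picked candidate values (hypothetical choices)
max_depths = [1, 2, 4, 8]
min_samples_leafs = [1, 5, 20]

# Try every combination and record its validation accuracy
results = {}
for depth, leaf in product(max_depths, min_samples_leafs):
    model = DecisionTreeClassifier(max_depth=depth, min_samples_leaf=leaf, random_state=0)
    model.fit(X_train, y_train)
    results[(depth, leaf)] = model.score(X_val, y_val)

best = max(results, key=results.get)
print(len(results), "combinations tried; best:", best, results[best])
```

Two parameters with 4 and 3 values already give 12 runs; adding a third parameter with 5 values would raise that to 60.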

Tuning Parameters Automatically

  • Train

  • Test

  • Evaluate

  • Validate
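One common way to automate this in sklearn is `GridSearchCV`, which tries every combination in a parameter grid using cross-validation (train on some folds, validate on the remaining one) and then refits the best combination. A minimal sketch on hypothetical data; the grid values are illustrative, not recommendations.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Hypothetical data: label is 1 outside the unit circle
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)

# Hold out a testing set that the search never sees
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

param_grid = {"kernel": ["linear", "rbf"], "C": [0.1, 1, 10]}

# 5-fold cross-validation on the training set for each of the 6 combinations
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)

# The refit best model is evaluated once on the testing set
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, test_accuracy)
```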

Testing your models

  • Regression

  • Classification

  • A held-out testing set shows how well the model generalizes to data it has not seen

  • Never use your testing data for training
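Regression and classification are usually tested with different metrics. A sketch, assuming synthetic data: mean squared error for a `LinearRegression` and accuracy for a `LogisticRegression`, each model fitted on the training split only and measured on the testing split.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))

# Regression: continuous target (hypothetical y = 2x + 1 plus noise)
y_reg = 2 * X[:, 0] + 1 + rng.normal(scale=0.5, size=200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y_reg, test_size=0.25, random_state=0)
reg = LinearRegression().fit(X_tr, y_tr)
mse = mean_squared_error(y_te, reg.predict(X_te))

# Classification: discrete target (hypothetical label = 1 when x > 0)
y_clf = (X[:, 0] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y_clf, test_size=0.25, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))

print(f"regression test MSE: {mse:.3f}")
print(f"classification test accuracy: {acc:.3f}")
```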

Quiz: Testing in sklearn

from sklearn.model_selection import train_test_split
# X holds the features and y the labels; 25% of the rows go to the testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
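To check what the quiz call produces, a sketch with a hypothetical 100-sample dataset: with `test_size=0.25`, 25 rows go to the testing set and 75 to the training set. `random_state` is an addition here, not part of the quiz code, used only to make the shuffle reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 samples, 2 features each, and a binary label
X = np.arange(200).reshape(100, 2)
y = (X[:, 0] > 100).astype(int)

# Same split as the quiz, plus random_state for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print(X_train.shape, X_test.shape)  # (75, 2) (25, 2)

# The two sets share no rows, so the testing data stays unseen during training
train_rows = {tuple(row) for row in X_train}
test_rows = {tuple(row) for row in X_test}
print(train_rows.isdisjoint(test_rows))  # True
```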