css:presentation.css
skip-help:true
data-transition-duration:1000

My talk at the University of Mauritius


id:title-slide

Introduction to Machine Learning

Dr. Omri Har-Shemesh

30 August 2019, University of Mauritius


data-y:r1000

About me

images/family.jpg


id:background-physics

Background: M.Sc. in Physics


Background: Ph.D. in Computational Science

images/phd.jpg


Current: Data Science / Programming @ schmiede.one

images/sellnow.png


data-rotate-y:0
data-x:3000
data-y:0

Introduction to Machine Learning

Artificial Intelligence?


data-rotate-y:90
class:wide-step

What about AI?

images/ai_ml_dl.png


data-rotate-y:90
data-y:r1000

What about AI?

images/cousins_ai.png


data-rotate-y:90
data-y:r1000

What about AI?

images/ai-machine-learning-and-deep-learning.png


data-y:0

data-rotate-y:0

data-y:r1000

So what is Machine Learning?

Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.
Arthur Samuel (1959)
A methodology and a collection of algorithms designed to discover and exploit meaningful patterns in raw data.
Omri Har-Shemesh (now)

data-y:r1000

Is this possible?


class:wide-step

Is this possible?

images/amazon.png


class:very-wide-step

Is this possible?

images/translate.png


class:very-wide-step

Is this possible?

images/style_transfer.png


class:very-wide-step

Is this possible?

images/horse2zebra.gif

Yes!


data-y:0
data-x:5000

How is this possible?

Data + Computing Power + Some Clever Ideas


class:quite-wide-step
data-y:r1000

What is data?

  • Numbers (scalars, vectors, tensors)
  • Categories
    • Yes/No
    • Very satisfied / Satisfied / ... / Not satisfied
    • Cat / Dog / Building
  • Images
  • Sound
  • Text
  • Web logs
  • Financial records
  • Website interactions
  • Customer service contacts
  • Social media interactions
  • Machine logs
  • Medical records
  • ...

class:wide-step

What is data?

images/empty_data.png


Three types of machine learning

  1. Supervised Learning
  2. Unsupervised Learning
  3. Reinforcement Learning

class:wide-step

Supervised Learning

images/regression.png


class:wide-step

Unsupervised Learning

images/clusters.png


class:wide-step

Reinforcement Learning

images/deepmind_parkour.0.gif


class:very-wide-step white-back

images/all_of_ml.png


Two types of supervised learning

  1. Regression
    You're predicting a number
  2. Classification
    You're predicting a category
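To make the distinction concrete, here is a minimal, illustrative scikit-learn sketch (the toy data and names are invented for this example): a regressor predicts a number, a classifier predicts a category.

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy data: hours of study for five students (one feature, 2D as scikit-learn expects)
hours = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

# Regression: the target is a number (an exam score)
scores = np.array([52.0, 58.0, 65.0, 71.0, 80.0])
reg = LinearRegression().fit(hours, scores)
print(reg.predict([[3.5]]))   # a continuous prediction, roughly between 65 and 71

# Classification: the target is a category (pass / fail)
passed = np.array(["fail", "fail", "pass", "pass", "pass"])
clf = LogisticRegression().fit(hours, passed)
print(clf.predict([[3.5]]))   # either "pass" or "fail"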

Example: linear regression

  • We have data
X = [x_1, x_2, x_3, \ldots]
y = [y_1, y_2, y_3, \ldots]
  • We want to predict, for a given new X, the appropriate y.

Example: linear regression

  • First step - assume a model:
y \sim aX + b + \epsilon
  • Second step - define a loss function
L = \sum\limits_{i=1}^N (y_i - \hat{y}_i)^2

class:wide-step

Example: linear regression

images/mse.png


Example: linear regression

  • First step - assume a model:
y \sim aX + b + \epsilon
  • Second step - define a loss function
L = \sum\limits_{i=1}^N (y_i - \hat{y}_i)^2
  • Third step - find parameters a and b that minimize L.
\frac{\partial}{\partial a} L = 0 \\ \frac{\partial}{\partial b}L = 0
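For this one-dimensional case the two equations can be solved in closed form; with \bar{x} and \bar{y} the sample means, the familiar least-squares estimates are:

\hat{a} = \frac{\sum\limits_{i=1}^N (x_i - \bar{x})(y_i - \bar{y})}{\sum\limits_{i=1}^N (x_i - \bar{x})^2} \qquad \hat{b} = \bar{y} - \hat{a}\,\bar{x}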

class:very-wide-step

Demo


Potential Problems

  1. Data too noisy
  2. Model too simple (high bias / underfitting)
  3. Model too complex (high variance / overfitting)

Bias-Variance Tradeoff

E \left[ \left(y - \hat{f}(x)\right)^2 \right] = \underbrace{\left(E[\hat{f}(x)] - y\right)^2}_{\text{bias}^2} + \underbrace{E[\hat{f}^2(x)] - E[\hat{f}(x)]^2}_{\text{variance}} + \underbrace{\sigma^2}_{\text{irreducible noise}}

What can we do?

  1. If the data is too noisy - get better data! You can't solve this with ML.
  2. If the model is too simple - make the model more complex (easy).
  3. Is the model overfitting the data?
    • Use a held-out test set to detect overfitting.
    • Penalize the loss function (regularization) - see the sketch after this list.
    • Add more data (make it harder to overfit).
    • Fine-tune the model's hyperparameters.
    • Model-specific tricks (e.g. Dropout layers).
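As a minimal sketch of the regularization and test-set points (all data and numbers here are invented for illustration): scikit-learn's Ridge adds an L2 penalty to the squared-error loss, and a held-out test set shows whether the model generalizes.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge

# Invented toy data: a noisy linear relationship
np.random.seed(0)
X = np.random.uniform(low=-2, high=3, size=(200, 1))
y = 1.5 * X[:, 0] + 5 + 2.0 * np.random.normal(size=200)

# Hold out 20% of the data to measure generalization honestly
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# alpha controls the strength of the penalty on large coefficients
model = Ridge(alpha=1.0)
model.fit(X_train, y_train)
print(model.score(X_train, y_train))  # fit on the training data
print(model.score(X_test, y_test))    # performance on unseen data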

class:very-wide-step white-back

Are we done?


data-scale:7
data-y:3000
data-x:4000

data-scale:1
data-y:0
data-x:7000
class:white-back

Introducing Python

images/pandas_logo.png


data-y:r1000

pandas

A great package for

  1. Data exploration
  2. Data cleaning
  3. Data wrangling
  4. Data visualization (together with matplotlib)
  5. Data preparation
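A minimal, illustrative pandas sketch of these steps (the file name and column names are invented for this example):

import pandas as pd

# Exploration: load a (hypothetical) CSV and take a first look
df = pd.read_csv("customers.csv")
print(df.head())
print(df.describe())

# Cleaning: drop duplicates and fill missing ages with the median
df = df.drop_duplicates()
df["age"] = df["age"].fillna(df["age"].median())

# Wrangling: average spend per country
spend_per_country = df.groupby("country")["spend"].mean()

# Visualization (together with matplotlib)
spend_per_country.plot(kind="bar")

# Preparation: select the feature columns for a model
X = df[["age", "spend"]].values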

class:very-wide-step

scikit-learn

images/scikit_offerings.png


class:quite-wide-step

scikit-learn: small demo

import numpy as np

def generate_data(n_points=10, eps=0.1, seed=None):
    # Noisy samples of a cubic-like curve on roughly [-2, 3]
    if seed is not None:
        np.random.seed(seed)
    if n_points is None:
        # Dense, evenly spaced grid (useful for plotting the underlying curve)
        X = np.linspace(-2.2, 3.2, 1000)
        n_points = 1000
    else:
        # Random sample points, sorted so line plots connect nicely
        X = np.sort(np.random.uniform(low=-2, high=3, size=n_points))
    y = X**3 - 2 * X**2 + 1.5**X + 5 + eps * np.random.normal(size=n_points)
    # scikit-learn expects a 2D feature matrix: one row per sample, one column
    return X.reshape(-1, 1), y
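A quick, illustrative look at what generate_data produces (the plotting choices here are my own, not part of the original demo):

import matplotlib.pyplot as plt

X, y = generate_data(n_points=100, eps=2, seed=42)
plt.scatter(X[:, 0], y, s=10)
plt.xlabel("X")
plt.ylabel("y")
plt.show()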

class:quite-wide-step

scikit-learn: small demo

# Simple example - linear regression on the raw (linear) feature
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

X, y = generate_data(n_points=100, eps=2)
# Hold out 10% of the data to measure generalization
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)

model = LinearRegression()
model.fit(X_train, y_train)
prediction = model.predict(X)        # predictions over all points (e.g. for plotting)
score = model.score(X_test, y_test)  # R^2 on the held-out test set

class:quite-wide-step

scikit-learn: small demo

# Let's do the same with a polynomial fit
from sklearn.preprocessing import PolynomialFeatures

# Expand X into the features [1, x, x^2, x^3]
pf = PolynomialFeatures(degree=3)
pf.fit(X_train)
X_train_poly = pf.transform(X_train)
X_test_poly = pf.transform(X_test)   # apply the same transformation to the test set

model = LinearRegression()
model.fit(X_train_poly, y_train)
model.score(X_test_poly, y_test)

class:quite-wide-step

scikit-learn: small demo

# Or easier, combine them in a pipeline!
from sklearn.pipeline import make_pipeline

pipeline = make_pipeline(
    PolynomialFeatures(degree=3),
    LinearRegression()
)

pipeline.fit(X_train, y_train)
pipeline.score(X_test, y_test)
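Reusing the imports and data from the previous slides, one illustrative way to see under- and overfitting with this pipeline is to compare training and test scores across polynomial degrees (the degrees chosen here are arbitrary):

# Low degrees underfit (high bias), very high degrees overfit (high variance)
for degree in [1, 3, 10, 20]:
    pipeline = make_pipeline(
        PolynomialFeatures(degree=degree),
        LinearRegression()
    )
    pipeline.fit(X_train, y_train)
    print(degree,
          pipeline.score(X_train, y_train),  # training R^2
          pipeline.score(X_test, y_test))    # test R^2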

class:quite-wide-step

images/ml_map.png


class:quite-wide-step

images/scikit-book.jpg


A small challenge :)


class:very-wide-step

And a small plug...

images/grandis.jpg


data-y:-2000
data-x:4000
data-scale:15

Thank you!


Dr. Omri Har-Shemesh, data scientist @ schmiede.one