Data Preprocessing Notes:
The pandas library is essential for data preprocessing.
pd.read_csv('path/to/dataset') creates a DataFrame.
In any dataset with which you train a model:
- Features/Independent variables: the columns with which we predict the dependent variable. Usually the first columns.
- Labels/Dependent variable: usually the last column of the dataset; this is what has to be predicted.
Missing data can harm training, so we often fill it in using various tools (e.g., a missing salary can be replaced by the average salary).
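A minimal sketch of mean imputation, assuming scikit-learn's SimpleImputer and a made-up DataFrame (the column names and values are only for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical dataset with a missing salary value.
df = pd.DataFrame({
    "Country": ["Spain", "Germany", "France"],
    "Salary": [52000.0, np.nan, 61000.0],
})

# Replace missing numeric values with the column average.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
df[["Salary"]] = imputer.fit_transform(df[["Salary"]])
print(df)
```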
Encoding categorical data:
- Sometimes there are string categories in the data which are difficult for the machine to understand, so we encode them. One simple way is to give a number to each category, e.g., Spain: 1, Germany: 2, etc.
A better way is one-hot encoding: turn the country column into three different binary columns, something like [1,0,0]: Spain, [0,1,0]: Germany. This is easier for the machine to interpret.
Label encoding: for the dependent variable we use LabelEncoder.
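A hedged sketch of both encodings with scikit-learn (the country names and the yes/no label are invented for illustration): OneHotEncoder via ColumnTransformer for the feature, LabelEncoder for the dependent variable.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, LabelEncoder

# Hypothetical data: one categorical feature, one numeric feature, a yes/no label.
X = pd.DataFrame({"Country": ["Spain", "Germany", "France"],
                  "Age": [38, 40, 44]})
y = ["Yes", "No", "Yes"]

# One-hot encode the Country column; pass the other columns through unchanged.
ct = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(), ["Country"])],
    remainder="passthrough",
)
X_encoded = ct.fit_transform(X)   # each country becomes a vector like [1, 0, 0]

# Label-encode the dependent variable (Yes/No -> 1/0).
y_encoded = LabelEncoder().fit_transform(y)
print(X_encoded, y_encoded)
```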
When should feature scaling be done?
- It should be done after splitting the dataset into training and test sets.
- Feature scaling puts the variables on the same scale.
- The test set is supposed to be brand new, so it should not be touched before training.
- Feature scaling works on the data before training.
Feature Scaling:
- Allows us to put all the features on the same scale:
  a. Standardisation (-3 to +3): works well all the time.
  b. Normalisation (0 to 1): recommended when the distribution is normal.
- Go with Standardisation. Only apply transform (not fit_transform) on the test dataset.
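A minimal sketch of the order of operations (split first, then scale), assuming scikit-learn and a small made-up numeric dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric features (age, salary) and a binary target.
X = np.array([[25, 40000], [30, 52000], [45, 61000], [50, 75000]], dtype=float)
y = np.array([0, 0, 1, 1])

# 1) Split first, so the test set stays brand new.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# 2) Fit the scaler on the training set only...
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)

# 3) ...and only apply transform (not fit_transform) to the test set.
X_test = scaler.transform(X_test)
```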
Unlike regression, where we predict a continuous number, we use classification to classify objects into different categories. It finds its use cases in a lot of medical and marketing fields.
Some of the main classification models are:
- Logistic Regression
- K nearest Neighbors
- Support Vector Machine
- Kernel SVM
- Naive Bayes
- Decision Tree Classification
- Random Forest Classification
Logistic Regression: applying linear regression to a classification problem.
We estimate the likelihood of a person taking an offer.
Linear regression at least gives us the range of people taking the offer: the part of the line between 0 and 1 makes sense, but not the parts above or below it.
We cut off the line at the top and bottom by applying a sigmoid function to /y = mx + c/, which gives p = 1 / (1 + e^-(b0 + b1*x)).
An example with various age groups:
We use the probability as a score. But what if we don’t want a probability and instead ask for a prediction?
Anything with a probability of less than 0.5 is projected to 0, and anything above 0.5 is projected to 1.
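A hedged sketch of logistic regression on a single made-up feature (age vs. taking the offer); predict_proba gives the probability score and predict applies the 0.5 cut-off:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: age vs. whether the person took the offer (0/1).
ages = np.array([[18], [22], [25], [35], [45], [52], [60]])
took_offer = np.array([0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression()
clf.fit(ages, took_offer)

# Probability score (sigmoid applied to b0 + b1*age)...
print(clf.predict_proba([[30]]))
# ...and the hard prediction, projected to 0 or 1 around the 0.5 threshold.
print(clf.predict([[30]]))
```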
K-Nearest Neighbors (K-NN) rule guide:
- Choose the number K of neighbours; the default value is 5.
- Take the Manhattan distance or the Euclidean distance.
- Count the data points in each category among those K neighbours.
- Assign the new data point to the category where you counted the most neighbours.
Euclidean distance: take the Euclidean distance to the 5 nearest points and assign the majority category.
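A minimal K-NN sketch with scikit-learn's defaults (K = 5; Minkowski metric with p = 2, i.e. Euclidean distance); the 2-D points are invented:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 2-D points and their categories.
X = np.array([[1, 2], [2, 1], [2, 3], [8, 8], [9, 7], [8, 9]])
y = np.array([0, 0, 0, 1, 1, 1])

# n_neighbors=5 is the default K; metric="minkowski" with p=2 is Euclidean distance.
knn = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2)
knn.fit(X, y)

# The new point goes to the category with the most of its 5 nearest neighbours.
print(knn.predict([[3, 3]]))
```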
SVM tries to pick the extreme cases of each category, which is risky. If we are differentiating between apples and oranges, most algorithms will look only at the most common features; SVM looks at the boundary cases and tries to create a line to separate them.
What if the data cannot be separated linearly? In that case we use Kernel SVM.
Map the data into a linearly separable dataset using a higher dimension.
After applying some mapping function, a hyperplane separates the data.
If the distance to the landmark is large, the exponent is a large negative number and we get a value very close to zero. If the point is closer to the landmark, the exponent is closer to zero and e^0 is 1. (This is the Gaussian RBF kernel: K(x, l) = e^(-||x - l||^2 / (2*sigma^2)).)
We use the kernel to separate our data.
Anything outside the circle will be assigned 0; anything inside will be assigned 1. Sigma defines how wide the circumference of the circle can be. By finding the right sigma we find the distinction.
Types of kernel function (e.g., Gaussian RBF, polynomial, sigmoid):
Projecting the hyperplane here is the same as running a linear model in 3D: we use the RBF kernel to map the data into a 3D plot and fit a hyperplane with minimum error.
How can the above be applied to train a model?
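A hedged sketch of a kernel SVM with the Gaussian RBF kernel (in scikit-learn, gamma plays the role of 1/(2*sigma^2)); the circular toy data is only an illustration of a non-linearly-separable case:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical data: category 1 inside a circle, category 0 outside it.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1).astype(int)

# A straight line cannot separate this; the RBF kernel maps the data to a
# higher dimension where a hyperplane can.
clf = SVC(kernel="rbf", gamma=1.0)
clf.fit(X, y)

# A point near the centre of the circle vs. a point far outside it.
print(clf.predict([[0.0, 0.0], [1.8, 1.8]]))
```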
Decision tree classification splits the data in such a way as to maximise the dominance of one category in each split. It is very similar to the regression version of decision trees; the difference is only in whether the algorithm classifies or regresses.
Ensemble learning is when you take a lot of models, train them, and take the average (or vote) of their results. For example, we can pick multiple subsets of points on the basis of which we build decision trees, whose combined average gives a comparatively good result (this is the idea behind random forest).
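A minimal sketch of that ensemble idea with a random forest classifier (many decision trees built on random samples, combined by majority vote); the data is invented:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical 2-D data with two categories.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# n_estimators is the number of decision trees whose votes are combined.
forest = RandomForestClassifier(n_estimators=10, random_state=0)
forest.fit(X, y)
print(forest.predict([[1.0, 1.0], [-1.0, -1.0]]))
```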
K-Means clustering: assign each point to the nearest centroid, then move the centroids to the mean of the points assigned to them.
When no new reassignments take place, we can assume the algorithm has converged.
The initial selection of centroids can hinder the selection of clusters. There is a modification of the K-Means clustering algorithm that fixes this: the K-Means++ algorithm.
When there is only 1 cluster, the value of WCSS (within-cluster sum of squares) is very large.
When the number of clusters is increased to 2, the WCSS decreases.
When the number of clusters is 3, it decreases further.
We can add as many clusters as we like, but how do we find the optimal fit?
When the drop in WCSS becomes small, we see an elbow point; that is the number of clusters required for modelling the data.
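A hedged sketch of K-Means with k-means++ initialisation and the elbow method; scikit-learn exposes WCSS as inertia_, and the three blobs of points are made up:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Hypothetical 2-D points forming three loose groups.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2))
               for c in ([0, 0], [5, 5], [0, 5])])

# Compute WCSS (inertia_) for 1..10 clusters; the elbow suggests the optimum.
wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0)
    km.fit(X)
    wcss.append(km.inertia_)

plt.plot(range(1, 11), wcss, marker="o")
plt.xlabel("Number of clusters")
plt.ylabel("WCSS")
plt.show()
```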
Regression models are used for predicting continuous values, such as future values of a particular quantity.
Y = b0 + b1*X1
- Y: dependent variable (the thing whose dependence you are trying to understand).
- X1: independent variable, which may or may not affect the dependent variable.
- b1: coefficient (the connector between Y and X1).
- b0: constant term.
If we are trying to figure out the salary for a given amount of work experience: Salary = b0 + b1 * Experience
b0 (constant): intersection with the Y axis (the starting salary). b1: slope of the line.
Simple Linear Regression:
Simple linear regression makes many such lines from the actual to the predicted values and chooses the line that minimises the sum of squared differences:
SUM(y - ŷ)^2 -> min
The line represents the predicted regression line trying to fit through the test dataset.
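A minimal sketch of simple linear regression (salary vs. experience, as in the note above); the numbers are invented:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: years of experience vs. salary.
experience = np.array([[1], [2], [3], [5], [8]])
salary = np.array([40000, 45000, 52000, 60000, 80000])

reg = LinearRegression()
reg.fit(experience, salary)      # minimises SUM(y - y_hat)^2

print(reg.intercept_)            # b0: starting salary (intersection on the Y axis)
print(reg.coef_)                 # b1: slope of the line
print(reg.predict([[4]]))        # predicted salary for 4 years of experience
```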
Multiple linear regression has many more coefficients compared to simple linear regression: Y = b0 + b1*x1 + b2*x2 + b3*x3 + …
There are some independent variables or features which we have to throw out to increase the accuracy of the model; only the important features should be selected.
The significance level (SL) is the threshold at which we discard a feature and assume it is not adding any value to the model.
Methods to select the correct features:
- All in
- Backward Elimination
- Forward Selection
- Bidirectional Elimination
- Score Comparison
Stepwise regression refers to Backward Elimination, Forward Selection, and Bidirectional Elimination.
All in: throw all the variables into the model. It is also used as preparation for Backward Elimination.
Backward Elimination:
1. Select a significance level to stay in the model (e.g., SL = 5%).
2. Fit the full model with all possible predictors.
3. Consider the predictor with the highest P-value. If P > SL, go to step 4; otherwise FIN.
4. Remove that predictor.
5. Fit the model without this variable, then go back to step 3.
Forward Selection:
1. Select a significance level to enter the model (e.g., SL = 5%).
2. Fit all simple regression models; select the one with the lowest P-value.
3. Keep this variable and fit all possible models with one extra predictor added to the one(s) you already have.
4. Consider the predictor with the lowest P-value. If P < SL, go back to step 3; otherwise FIN and keep the previous model.
Bidirectional Elimination:
1. Select significance levels to enter and to stay in the model (e.g., SLENTER = 5%, SLSTAY = 5%).
2. Perform the next step of Forward Selection (new variables must have P < SLENTER).
3. Perform the next steps of Backward Elimination (old variables must have P < SLSTAY), then go to step 2.
4. Repeat until no new variables can enter and no old variables can exit.
Score Comparison (all possible models):
1. Select a criterion of goodness of fit.
2. Construct all possible regression models: 2^N - 1 combinations in total.
3. Select the one with the best criterion.
Backward Elimination is the fastest of these.
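A hedged sketch of Backward Elimination using statsmodels' OLS (statsmodels is an assumption here, not something the notes prescribe); X is assumed to be a pandas DataFrame of features and y the target:

```python
import statsmodels.api as sm

def backward_elimination(X, y, significance_level=0.05):
    """Iteratively drop the predictor with the highest p-value until all p <= SL."""
    X = sm.add_constant(X)                  # add the b0 (intercept) column
    while True:
        model = sm.OLS(y, X).fit()          # fit with all remaining predictors
        worst_p = model.pvalues.max()       # highest p-value
        if worst_p > significance_level:
            worst_feature = model.pvalues.idxmax()
            X = X.drop(columns=[worst_feature])   # remove it and refit
        else:
            return model, list(X.columns)   # FIN: every predictor is significant

# Hypothetical usage:
# model, kept = backward_elimination(X_train_df, y_train)
# print(model.summary())
```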
Polynomial regression: Y = b0 + b1*x1 + b2*x1^2 + … If the data distribution is non-linear, we need a non-linear curve to match the data better. We add columns for the powers of the feature.
Polynomial models are used to describe, for example, how diseases might spread. Why didn’t we split the data into training and test sets? We have a very small number of observations, so we take all the rows.
Comparing the linear and polynomial models using matplotlib.
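A hedged sketch of that comparison with scikit-learn and matplotlib; the data, and the choice of degree 3, are only illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical non-linear data (no train/test split: very few observations).
X = np.arange(1, 11).reshape(-1, 1).astype(float)
y = np.array([1, 2, 4, 8, 15, 26, 40, 60, 85, 120], dtype=float)

# Plain linear regression.
lin = LinearRegression().fit(X, y)

# Polynomial regression: add columns for powers of the feature, then fit linearly.
poly = PolynomialFeatures(degree=3)
X_poly = poly.fit_transform(X)
poly_reg = LinearRegression().fit(X_poly, y)

plt.scatter(X, y, color="red")
plt.plot(X, lin.predict(X), label="linear")
plt.plot(X, poly_reg.predict(X_poly), label="polynomial")
plt.legend()
plt.show()
```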
SVR adds a layer of buffer (the epsilon tube) around the linear regression line. Some points lie outside the epsilon tube; these give the slack variables. Minimising the distance of the slack variables from the tube defines the buffered line passing through the data.
For models like SVR, where the relationship between features and target is implicit (there are no explicit coefficients), we have to apply feature scaling.
We also apply feature scaling to the label / dependent variable.
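A minimal SVR sketch where both X and y are scaled (y reshaped to 2-D for the scaler); the data and the epsilon value are invented:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical data: position level vs. salary.
X = np.arange(1, 11).reshape(-1, 1).astype(float)
y = np.array([45, 50, 60, 80, 110, 150, 200, 300, 500, 1000], dtype=float)

# Scale the features AND the dependent variable (the relationship is implicit).
sc_X, sc_y = StandardScaler(), StandardScaler()
X_scaled = sc_X.fit_transform(X)
y_scaled = sc_y.fit_transform(y.reshape(-1, 1)).ravel()

# epsilon controls the width of the tube around the fitted line.
svr = SVR(kernel="rbf", epsilon=0.1)
svr.fit(X_scaled, y_scaled)

# Predict on the original scale by inverting the y scaling.
pred_scaled = svr.predict(sc_X.transform([[6.5]]))
print(sc_y.inverse_transform(pred_scaled.reshape(-1, 1)))
```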
Decision trees come in two types: classification trees and regression trees.
Each partition is called a leaf. The algorithm finds the splits, and the final leaves are called terminal leaves.
The above figure shows how the decision tree is constructed using the splits.
You take the average of the points within each terminal leaf, and that average is assigned as the prediction.
Add these average values to the decision tree; incoming data will then use this decision tree to make predictions.
We don’t have to apply feature scaling for decision trees. They work well with highly complex datasets.
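A minimal decision tree regression sketch (no feature scaling needed); the data is invented:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: the prediction for a new point is the average of the
# training points that fall into the same terminal leaf.
X = np.arange(1, 11).reshape(-1, 1).astype(float)
y = np.array([45, 50, 60, 80, 110, 150, 200, 300, 500, 1000], dtype=float)

tree = DecisionTreeRegressor(random_state=0)
tree.fit(X, y)
print(tree.predict([[6.5]]))
```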
Random forest is a version of ensemble learning, in which you take the same algorithm multiple times and combine the results to make it better.
- Pick random data points from the training set.
- Build a decision tree based on the data points selected above.
- Keep building regression trees.
- Use all of them to predict, and average their predictions.
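A minimal random forest regression sketch following the steps above; the number of trees and the data are invented:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical data, same shape as in the decision tree example.
X = np.arange(1, 11).reshape(-1, 1).astype(float)
y = np.array([45, 50, 60, 80, 110, 150, 200, 300, 500, 1000], dtype=float)

# n_estimators regression trees are built on random samples of the data;
# the final prediction is the average of all their predictions.
forest = RandomForestRegressor(n_estimators=10, random_state=0)
forest.fit(X, y)
print(forest.predict([[6.5]]))
```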