
Commit

Add table of contents
PashaBarahimi committed Jun 16, 2024
1 parent e10135c commit 7331e47
Showing 1 changed file with 95 additions and 40 deletions.
135 changes: 95 additions & 40 deletions Project/src/Regression Analysis.ipynb
@@ -16,24 +16,79 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Project - Phase 2 - Regression Analysis"
"# <a id='toc1_'></a>[Project - Phase 2 - Regression Analysis](#toc0_)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"**Table of contents**<a id='toc0_'></a> \n",
"- [Project - Phase 2 - Regression Analysis](#toc1_) \n",
" - [Introduction](#toc1_1_) \n",
" - [Objectives](#toc1_2_) \n",
" - [Tasks](#toc1_3_) \n",
" - [Environment Setup](#toc1_4_) \n",
" - [Steps](#toc1_5_) \n",
" - [Loading the Data](#toc1_5_1_) \n",
" - [Preprocessing](#toc1_5_2_) \n",
" - [Converting Boolean Features to Binary](#toc1_5_2_1_) \n",
" - [Splitting the Data into Features and Target](#toc1_5_2_2_) \n",
" - [Scaling the Features](#toc1_5_2_3_) \n",
" - [Scaling the Target](#toc1_5_2_4_) \n",
" - [Feature Engineering and Selection](#toc1_5_3_) \n",
" - [Constants](#toc1_5_3_1_) \n",
" - [Body Type_Cabriolet](#toc1_5_3_1_1_) \n",
" - [Correlation](#toc1_5_3_2_) \n",
" - [Fuel Consumption](#toc1_5_3_2_1_) \n",
" - [Dimensionality Reduction](#toc1_5_4_) \n",
" - [PCA](#toc1_5_4_1_) \n",
" - [Two Dimensions](#toc1_5_4_1_1_) \n",
" - [95% Variance](#toc1_5_4_1_2_) \n",
" - [Evaluation Metrics](#toc1_5_5_) \n",
" - [Model Training](#toc1_5_6_) \n",
" - [Method One: Neural Network](#toc1_5_6_1_) \n",
" - [Methods Two & Three](#toc1_5_6_2_) \n",
" - [Linear Regression](#toc1_5_6_2_1_) \n",
" - [SVM](#toc1_5_6_2_2_) \n",
" - [Random Forest](#toc1_5_6_2_3_) \n",
" - [Gradient Boosting](#toc1_5_6_2_4_) \n",
" - [KNN](#toc1_5_6_2_5_) \n",
" - [Decision Tree](#toc1_5_6_2_6_) \n",
" - [Comparison](#toc1_5_6_3_) \n",
" - [Feature Analysis](#toc1_5_7_) \n",
" - [Random Forest](#toc1_5_7_1_) \n",
" - [Gradient Boosting](#toc1_5_7_2_) \n",
" - [Neural Network](#toc1_5_7_3_) \n",
" - [Comparison](#toc1_5_7_4_) \n",
" - [Overall Report and Discussions](#toc1_5_8_) \n",
" - [References](#toc1_6_) \n",
"\n",
"<!-- vscode-jupyter-toc-config\n",
"\tnumbering=false\n",
"\tanchor=true\n",
"\tflat=false\n",
"\tminLevel=1\n",
"\tmaxLevel=6\n",
"\t/vscode-jupyter-toc-config -->\n",
"<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## <a id='toc1_1_'></a>[Introduction](#toc0_)\n",
"\n",
"The dataset contains information about car features and their selling price in South Africa. The objective is to predict the selling price of the cars based on the features provided.\n",
"\n",
"## Objectives\n",
"## <a id='toc1_2_'></a>[Objectives](#toc0_)\n",
"\n",
"The purpose of this phase is as follows:\n",
"\n",
"1. To build a machine learning model that predicts the selling price of the cars based on the features provided.\n",
"\n",
"## Tasks\n",
"## <a id='toc1_3_'></a>[Tasks](#toc0_)\n",
"\n",
"- Preprocessing (if necessary)\n",
"- Feature Engineering and Selection\n",
@@ -48,7 +103,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Environment Setup\n",
"## <a id='toc1_4_'></a>[Environment Setup](#toc0_)\n",
"\n",
"We'll begin by setting up our Python environment and installing the necessary libraries."
]
@@ -120,14 +175,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Steps"
"## <a id='toc1_5_'></a>[Steps](#toc0_)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Loading the Data"
"### <a id='toc1_5_1_'></a>[Loading the Data](#toc0_)"
]
},
{
@@ -558,7 +613,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Preprocessing\n",
"### <a id='toc1_5_2_'></a>[Preprocessing](#toc0_)\n",
"\n",
"In the previous section (EDA), we were asked to perform various data preprocessing operations as needed in the data analysis stages. In this section, we'll perform these operations with a machine learning approach **(Note: if we have already completed any of the following steps in the previous phase, there is no need to repeat them here).**\n",
"\n",
@@ -576,7 +631,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Converting Boolean Features to Binary\n",
"#### <a id='toc1_5_2_1_'></a>[Converting Boolean Features to Binary](#toc0_)\n",
"\n",
"This step is not necessary as in the scaling step, these features will be converted to binary. However, we can convert them to binary in this step explicitly."
]
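The explicit conversion described above could be sketched as follows — a minimal example, with column names assumed for illustration rather than taken from the actual dataset:

```python
import pandas as pd

# Hypothetical frame standing in for the cars dataset; column names are assumed.
df = pd.DataFrame({
    "Automatic": [True, False, True],
    "Fuel Consumption": [7.1, 5.9, 8.3],
})

# Convert every boolean column to 0/1 integers explicitly.
bool_cols = df.select_dtypes(include="bool").columns
df[bool_cols] = df[bool_cols].astype(int)
```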
@@ -668,7 +723,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Splitting the Data into Features and Target\n",
"#### <a id='toc1_5_2_2_'></a>[Splitting the Data into Features and Target](#toc0_)\n",
"\n",
"We need to split the data into features and target variables. The target variable is the selling price of the cars, and the features are the remaining columns except for the `Finance Price` column which has a high correlation with the target variable."
]
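The split described above might look like this — toy rows with assumed values, showing only the drop of `Price` and the highly correlated `Finance Price` from the feature matrix:

```python
import pandas as pd

# Toy rows; "Finance Price" is dropped because of its high correlation with the target.
df = pd.DataFrame({
    "Mileage": [42000, 98000],
    "Finance Price": [310000.0, 155000.0],
    "Price": [295000.0, 150000.0],
})

X = df.drop(columns=["Price", "Finance Price"])
y = df["Price"]
```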
@@ -687,7 +742,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Scaling the Features\n",
"#### <a id='toc1_5_2_3_'></a>[Scaling the Features](#toc0_)\n",
"\n",
"We need to scale the features to ensure that all features contribute equally to the result. If we don't scale the features, the model may give more weight to features with higher values which may lead to a poor model and slower convergence. We can use the `StandardScaler` class from the `sklearn.preprocessing` module to scale the features."
]
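A minimal sketch of the `StandardScaler` usage mentioned above, on two toy columns with very different scales:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two toy feature columns on very different scales.
X = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # each column -> mean 0, unit variance
```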
@@ -1109,7 +1164,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Scaling the Target\n",
"#### <a id='toc1_5_2_4_'></a>[Scaling the Target](#toc0_)\n",
"\n",
"As the target variable is the price of the cars and it contains a wide range of values, we need to scale it to make the training process easier for the model."
]
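Target scaling can follow the same pattern; the prices below are made-up, and the key point is that `inverse_transform` maps scaled predictions back to the original price range:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Prices span a wide range; scale them, and invert the transform on predictions.
y = np.array([150000.0, 295000.0, 510000.0]).reshape(-1, 1)

target_scaler = StandardScaler()
y_scaled = target_scaler.fit_transform(y)
y_restored = target_scaler.inverse_transform(y_scaled)
```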
@@ -1226,7 +1281,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Feature Engineering and Selection\n",
"### <a id='toc1_5_3_'></a>[Feature Engineering and Selection](#toc0_)\n",
"\n",
"Based on the nature of our data, we'll apply specific feature engineering techniques to enhance the quality of our features. We may change these techniques after we check the performance of our models, to improve the metrics. Also, we may decide to select only specific features among all of our features for the next steps.\n",
"\n",
@@ -1244,14 +1299,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Constants"
"#### <a id='toc1_5_3_1_'></a>[Constants](#toc0_)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Body Type_Cabriolet\n",
"##### <a id='toc1_5_3_1_1_'></a>[Body Type_Cabriolet](#toc0_)\n",
"\n",
"As the report shows, this feature is constant and has no predictive power. We'll drop this feature."
]
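Dropping a constant column can be done generically rather than by name — a sketch with made-up values that mirror the report's warning:

```python
import pandas as pd

# "Body Type_Cabriolet" is constant in this toy frame, mirroring the report's warning.
df = pd.DataFrame({
    "Body Type_Cabriolet": [0, 0, 0, 0],
    "Mileage": [12000, 54000, 87000, 30000],
})

constant_cols = [c for c in df.columns if df[c].nunique() == 1]
df = df.drop(columns=constant_cols)
```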
@@ -1282,7 +1337,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Correlation\n",
"#### <a id='toc1_5_3_2_'></a>[Correlation](#toc0_)\n",
"\n",
"Let's plot the correlation matrix for the features that have warning signs in the report."
]
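The correlation check can be sketched with synthetic data — "CO2 Emissions" below is an invented near-duplicate of "Fuel Consumption", used only to show how highly correlated pairs are flagged:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
base = rng.normal(size=200)

# "CO2 Emissions" is a made-up near-duplicate of "Fuel Consumption" for illustration.
df = pd.DataFrame({
    "Fuel Consumption": base,
    "CO2 Emissions": 0.9 * base + rng.normal(scale=0.1, size=200),
    "Mileage": rng.normal(size=200),
})

corr = df.corr()
# flag off-diagonal pairs with |r| > 0.9 as candidates for dropping one member
high_pairs = (corr.abs() > 0.9) & (corr.abs() < 1.0)
```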
@@ -1320,7 +1375,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Fuel Consumption"
"##### <a id='toc1_5_3_2_1_'></a>[Fuel Consumption](#toc0_)"
]
},
{
@@ -1343,9 +1398,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Dimensionality Reduction\n",
"### <a id='toc1_5_4_'></a>[Dimensionality Reduction](#toc0_)\n",
"\n",
"#### PCA\n",
"#### <a id='toc1_5_4_1_'></a>[PCA](#toc0_)\n",
"\n",
"Using the PCA method, we'll reduce the dimensions of numerical features to two dimensions. How much of the initial data variance is transferred to the new space? If we aim to retain 95% of the original variance, what is the minimum number of dimensions required in the new space? **We'll save both the original data and the dimension-reduced one for the next parts.**"
]
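Both questions above can be answered with one pattern: `n_components=2` for the two-dimensional projection, and a float `n_components=0.95` to ask `PCA` for the smallest dimensionality retaining at least 95% of the variance. The data here is a random stand-in, not the real feature matrix:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 10))  # stand-in for the scaled numerical features

pca_2d = PCA(n_components=2).fit(X)
variance_in_2d = pca_2d.explained_variance_ratio_.sum()

# Passing a float asks PCA for the smallest k that keeps at least that variance.
pca_95 = PCA(n_components=0.95).fit(X)
min_dims = pca_95.n_components_
```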
@@ -1446,7 +1501,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Two Dimensions"
"##### <a id='toc1_5_4_1_1_'></a>[Two Dimensions](#toc0_)"
]
},
{
@@ -1479,7 +1534,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 95% Variance"
"##### <a id='toc1_5_4_1_2_'></a>[95% Variance](#toc0_)"
]
},
{
@@ -1548,7 +1603,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Evaluation Metrics\n",
"### <a id='toc1_5_5_'></a>[Evaluation Metrics](#toc0_)\n",
"\n",
"We will choose appropriate evaluation metrics based on the nature of the data and the project goal, and explain our reasons for choosing them."
]
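For a price-regression task, MAE, RMSE, and R² are natural candidates; a sketch with toy values (in thousands) just to exercise the metrics:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy true/predicted prices (in thousands), purely illustrative.
y_true = np.array([150.0, 295.0, 510.0])
y_pred = np.array([160.0, 280.0, 500.0])

mae = mean_absolute_error(y_true, y_pred)           # average absolute error, same unit as price
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # penalizes large errors more heavily
r2 = r2_score(y_true, y_pred)                       # share of target variance explained
```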
@@ -1571,7 +1626,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Model Training\n",
"### <a id='toc1_5_6_'></a>[Model Training](#toc0_)\n",
"\n",
"In this section, we need to implement three methods to predict our target variable. First, we'll split the initial data (including all features) into training and test sets. This is done for both the original data and the dimension-reduced data. Also, we'll split the training set into training and validation sets to tune the hyperparameters of the models."
]
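The two-stage split described above can be sketched with `train_test_split`; the 80/20 then 75/25 ratios are assumptions giving a 60/20/20 overall split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)  # placeholder features
y = np.arange(50)                  # placeholder target

# 80/20 train/test, then 25% of the training part as validation -> 60/20/20 overall.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)
```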
@@ -1860,7 +1915,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Method One: Neural Network\n",
"#### <a id='toc1_5_6_1_'></a>[Method One: Neural Network](#toc0_)\n",
"\n",
"We'll design and train a neural network for our goal. Then we'll report the following:\n",
"\n",
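One way to realize such a network is sklearn's `MLPRegressor`; the data, layer sizes, and hyperparameters below are placeholders, not the notebook's actual design:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

# Two hidden layers; the sizes here are illustrative assumptions.
model = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
model.fit(X, y)
train_r2 = model.score(X, y)
```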
@@ -2043,7 +2098,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Methods Two & Three\n",
"#### <a id='toc1_5_6_2_'></a>[Methods Two & Three](#toc0_)\n",
"\n",
"We should choose two methods from the following based on the problem goals and train the models:\n",
"\n",
@@ -2068,7 +2123,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Linear Regression"
"##### <a id='toc1_5_6_2_1_'></a>[Linear Regression](#toc0_)"
]
},
{
@@ -2127,7 +2182,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"##### SVM"
"##### <a id='toc1_5_6_2_2_'></a>[SVM](#toc0_)"
]
},
{
@@ -2239,7 +2294,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Random Forest"
"##### <a id='toc1_5_6_2_3_'></a>[Random Forest](#toc0_)"
]
},
{
@@ -2355,7 +2410,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Gradient Boosting"
"##### <a id='toc1_5_6_2_4_'></a>[Gradient Boosting](#toc0_)"
]
},
{
@@ -2471,7 +2526,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"##### KNN"
"##### <a id='toc1_5_6_2_5_'></a>[KNN](#toc0_)"
]
},
{
@@ -2583,7 +2638,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Decision Tree"
"##### <a id='toc1_5_6_2_6_'></a>[Decision Tree](#toc0_)"
]
},
{
@@ -2695,7 +2750,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Comparison\n",
"#### <a id='toc1_5_6_3_'></a>[Comparison](#toc0_)\n",
"\n",
"Finally, we'll compare the three implemented methods. **Which method performed better? We should provide our analysis of this comparison.** Note that having three models is mandatory for this comparison; adding more models earns a bonus score insofar as the extra information improves the quality of the comparison."
]
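A simple way to organize the comparison is a metrics table indexed by model; the scores below are hypothetical placeholders, not results from the trained models:

```python
import pandas as pd

# Hypothetical validation scores; the real numbers come from the trained models.
results = pd.DataFrame(
    {"RMSE": [0.42, 0.35, 0.31], "R2": [0.81, 0.86, 0.90]},
    index=["Linear Regression", "Random Forest", "Gradient Boosting"],
)
best_model = results["R2"].idxmax()
```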
@@ -2821,7 +2876,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Feature Analysis\n",
"### <a id='toc1_5_7_'></a>[Feature Analysis](#toc0_)\n",
"\n",
"We'll train the best-performing method from the previous section using the dimension-reduced data. How did the model performance change? We should provide our analysis."
]
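The per-model feature readouts in the subsections that follow typically rest on `feature_importances_`; a sketch on synthetic data, where one dominant feature should absorb most of the importance mass:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 5.0 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(scale=0.1, size=300)

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
importances = forest.feature_importances_  # non-negative, sums to 1
```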
@@ -2837,7 +2892,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Random Forest"
"#### <a id='toc1_5_7_1_'></a>[Random Forest](#toc0_)"
]
},
{
@@ -2960,7 +3015,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Gradient Boosting"
"#### <a id='toc1_5_7_2_'></a>[Gradient Boosting](#toc0_)"
]
},
{
@@ -3083,7 +3138,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Neural Network"
"#### <a id='toc1_5_7_3_'></a>[Neural Network](#toc0_)"
]
},
{
@@ -3259,7 +3314,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Comparison"
"#### <a id='toc1_5_7_4_'></a>[Comparison](#toc0_)"
]
},
{
@@ -3331,7 +3386,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Overall Report and Discussions\n",
"### <a id='toc1_5_8_'></a>[Overall Report and Discussions](#toc0_)\n",
"\n",
"This is the last step of our project! We will provide a brief report about our main steps from phase 0 till the end of this phase. We don't want detailed information in this report; only mentioning key decisions and ideas is enough. This will show the roadmap of our project. Also, we should mention the problems and challenges we faced and our solutions for them, along with some alternatives."
]
@@ -3468,7 +3523,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## References"
"## <a id='toc1_6_'></a>[References](#toc0_)"
]
}
],
@@ -3488,7 +3543,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.19"
"version": "3.12.2"
}
},
"nbformat": 4,
