From 7331e472dbdd006381271e6305845e9b5d5a0fac Mon Sep 17 00:00:00 2001 From: Pasha Barahimi Date: Sun, 16 Jun 2024 19:04:34 +0330 Subject: [PATCH] Add table of contents --- Project/src/Regression Analysis.ipynb | 135 ++++++++++++++++++-------- 1 file changed, 95 insertions(+), 40 deletions(-) diff --git a/Project/src/Regression Analysis.ipynb b/Project/src/Regression Analysis.ipynb index f702750..d2c00d5 100644 --- a/Project/src/Regression Analysis.ipynb +++ b/Project/src/Regression Analysis.ipynb @@ -16,24 +16,79 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Project - Phase 2 - Regression Analysis" + "# [Project - Phase 2 - Regression Analysis](#toc0_)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Introduction\n", + "**Table of contents** \n", + "- [Project - Phase 2 - Regression Analysis](#toc1_) \n", + " - [Introduction](#toc1_1_) \n", + " - [Objectives](#toc1_2_) \n", + " - [Tasks](#toc1_3_) \n", + " - [Environment Setup](#toc1_4_) \n", + " - [Steps](#toc1_5_) \n", + " - [Loading the Data](#toc1_5_1_) \n", + " - [Preprocessing](#toc1_5_2_) \n", + " - [Converting Boolean Features to Binary](#toc1_5_2_1_) \n", + " - [Splitting the Data into Features and Target](#toc1_5_2_2_) \n", + " - [Scaling the Features](#toc1_5_2_3_) \n", + " - [Scaling the Target](#toc1_5_2_4_) \n", + " - [Feature Engineering and Selection](#toc1_5_3_) \n", + " - [Constants](#toc1_5_3_1_) \n", + " - [Body Type_Cabriolet](#toc1_5_3_1_1_) \n", + " - [Correlation](#toc1_5_3_2_) \n", + " - [Fuel Consumption](#toc1_5_3_2_1_) \n", + " - [Dimensionality Reduction](#toc1_5_4_) \n", + " - [PCA](#toc1_5_4_1_) \n", + " - [Two Dimensions](#toc1_5_4_1_1_) \n", + " - [95% Variance](#toc1_5_4_1_2_) \n", + " - [Evaluation Metrics](#toc1_5_5_) \n", + " - [Model Training](#toc1_5_6_) \n", + " - [Method One: Neural Network](#toc1_5_6_1_) \n", + " - [Methods Two & Three](#toc1_5_6_2_) \n", + " - [Linear Regression](#toc1_5_6_2_1_) \n", + " - [SVM](#toc1_5_6_2_2_) \n", + " - 
[Random Forest](#toc1_5_6_2_3_) \n", + " - [Gradient Boosting](#toc1_5_6_2_4_) \n", + " - [KNN](#toc1_5_6_2_5_) \n", + " - [Decision Tree](#toc1_5_6_2_6_) \n", + " - [Comparison](#toc1_5_6_3_) \n", + " - [Feature Analysis](#toc1_5_7_) \n", + " - [Random Forest](#toc1_5_7_1_) \n", + " - [Gradient Boosting](#toc1_5_7_2_) \n", + " - [Neural Network](#toc1_5_7_3_) \n", + " - [Comparison](#toc1_5_7_4_) \n", + " - [Overall Report and Discussions](#toc1_5_8_) \n", + " - [References](#toc1_6_) \n", + "\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## [Introduction](#toc0_)\n", "\n", "The dataset contains information about car features and their selling price in South Africa. The objective is to predict the selling price of the cars based on the features provided.\n", "\n", - "## Objectives\n", + "## [Objectives](#toc0_)\n", "\n", "The purpose of this phase is as follows:\n", "\n", "1. To build a machine learning model that predicts the selling price of the cars based on the features provided.\n", "\n", - "## Tasks\n", + "## [Tasks](#toc0_)\n", "\n", "- Preprocessing (if necessary)\n", "- Feature Engineering and Selection\n", @@ -48,7 +103,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Environment Setup\n", + "## [Environment Setup](#toc0_)\n", "\n", "We'll begin by setting up your Python environment and installing the necessary libraries." ] @@ -120,14 +175,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Steps" + "## [Steps](#toc0_)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Loading the Data" + "### [Loading the Data](#toc0_)" ] }, { @@ -558,7 +613,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Preprocessing\n", + "### [Preprocessing](#toc0_)\n", "\n", "In the previous section (EDA), we were asked to perform various data preprocessing operations as needed in the data analysis stages. 
In this section, we'll perform these operations from a machine learning perspective **(Note: if we have already completed the following steps in the previous phase, there is no need to repeat them here).**\n", "\n", @@ -576,7 +631,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Converting Boolean Features to Binary\n", + "#### [Converting Boolean Features to Binary](#toc0_)\n", "\n", "This step is not strictly necessary, since these features will be converted to binary during scaling. However, we can convert them to binary explicitly in this step." ] }, { @@ -668,7 +723,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Splitting the Data into Features and Target\n", + "#### [Splitting the Data into Features and Target](#toc0_)\n", "\n", "We need to split the data into features and target variables. The target variable is the selling price of the cars, and the features are the remaining columns except for the `Finance Price` column, which has a high correlation with the target variable." ] }, { @@ -687,7 +742,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Scaling the Features\n", + "#### [Scaling the Features](#toc0_)\n", "\n", "We need to scale the features to ensure that all features contribute equally to the result. If we don't scale the features, the model may give more weight to features with higher values, which may lead to a poor model and slower convergence. We can use the `StandardScaler` class from the `sklearn.preprocessing` module to scale the features." ] }, { @@ -1109,7 +1164,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Scaling the Target\n", + "#### [Scaling the Target](#toc0_)\n", "\n", "As the target variable is the price of the cars and spans a wide range of values, we need to scale it to make the training process easier for the model." 
] @@ -1226,7 +1281,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Feature Engineering and Selection\n", + "### [Feature Engineering and Selection](#toc0_)\n", "\n", "Based on the nature of our data, we'll apply specific feature engineering techniques to enhance the quality of our features. We may change these techniques after we check the performance of our models, to improve the metrics. Also, we may decide to select only specific features among all of our features for the next steps.\n", "\n", @@ -1244,14 +1299,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Constants" + "#### [Constants](#toc0_)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "##### Body Type_Cabriolet\n", + "##### [Body Type_Cabriolet](#toc0_)\n", "\n", "As the report shows, this feature is constant and has no predictive power. We'll drop this feature." ] @@ -1282,7 +1337,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Correlation\n", + "#### [Correlation](#toc0_)\n", "\n", "Let's plot the correlation matrix for the features that have warning signs in the report." ] @@ -1320,7 +1375,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Fuel Consumption" + "##### [Fuel Consumption](#toc0_)" ] }, { @@ -1343,9 +1398,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Dimensionality Reduction\n", + "### [Dimensionality Reduction](#toc0_)\n", "\n", - "#### PCA\n", + "#### [PCA](#toc0_)\n", "\n", "Using the PCA method, we'll reduce the dimensions of numerical features to two dimensions. How much of the initial data variance is transferred to the new space? If we aim to retain 95% of the original variance, what is the minimum number of dimensions required in the new space? 
**We'll save both the original data and the dimension-reduced one for the next parts.**" ] @@ -1446,7 +1501,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Two Dimensions" + "##### [Two Dimensions](#toc0_)" ] }, { @@ -1479,7 +1534,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### 95% Variance" + "##### [95% Variance](#toc0_)" ] }, { @@ -1548,7 +1603,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Evaluation Metrics\n", + "### [Evaluation Metrics](#toc0_)\n", "\n", "We will choose appropriate evaluation metrics based on the nature of the data and the project goal, and explain our reasons for choosing them." ] @@ -1571,7 +1626,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Model Training\n", + "### [Model Training](#toc0_)\n", "\n", "In this section, we need to implement three methods to predict our target variable. First, we'll split the initial data (including all features) into training and test sets. This is done for both the original data and the dimension-reduced data. Also, we'll split the training set into training and validation sets to tune the hyperparameters of the models." ] @@ -1860,7 +1915,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Method One: Neural Network\n", + "#### [Method One: Neural Network](#toc0_)\n", "\n", "We'll design and train a neural network for our goal. 
Then we'll report the following:\n", "\n", @@ -2043,7 +2098,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Methods Two & Three\n", + "#### [Methods Two & Three](#toc0_)\n", "\n", "We should choose two methods from the following based on the problem goals and train the models:\n", "\n", @@ -2068,7 +2123,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Linear Regression" + "##### [Linear Regression](#toc0_)" ] }, { @@ -2127,7 +2182,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### SVM" + "##### [SVM](#toc0_)" ] }, { @@ -2239,7 +2294,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Random Forest" + "##### [Random Forest](#toc0_)" ] }, { @@ -2355,7 +2410,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Gradient Boosting" + "##### [Gradient Boosting](#toc0_)" ] }, { @@ -2471,7 +2526,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### KNN" + "##### [KNN](#toc0_)" ] }, { @@ -2583,7 +2638,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##### Decision Tree" + "##### [Decision Tree](#toc0_)" ] }, { @@ -2695,7 +2750,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Comparison\n", + "#### [Comparison](#toc0_)\n", "\n", "Finally, we'll compare the three implemented methods. **Which method performed better? We should provide our analysis of this comparison.** Note that having 3 models is mandatory for this comparison, and adding more models earns a bonus score, depending on how much the extra information improves the quality of our comparison." ] }, { @@ -2821,7 +2876,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Feature Analysis\n", + "### [Feature Analysis](#toc0_)\n", "\n", "We'll train the best-performing method from the previous section using the dimension-reduced data. How did the model performance change? We should provide our analysis." 
] @@ -2837,7 +2892,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Random Forest" + "#### [Random Forest](#toc0_)" ] }, { @@ -2960,7 +3015,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Gradient Boosting" + "#### [Gradient Boosting](#toc0_)" ] }, { @@ -3083,7 +3138,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Neural Network" + "#### [Neural Network](#toc0_)" ] }, { @@ -3259,7 +3314,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Comparison" + "#### [Comparison](#toc0_)" ] }, { @@ -3331,7 +3386,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Overall Report and Discussions\n", + "### [Overall Report and Discussions](#toc0_)\n", "\n", "This is the last step of our project! We will provide a brief report about our main steps from phase 0 until the end of this phase. We don’t want detailed information in this report; mentioning key decisions and ideas is enough. This will show the roadmap of our project. Also, we should mention the problems and challenges we faced and our solutions for them, along with some alternatives." ] }, { @@ -3468,7 +3523,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## References" + "## [References](#toc0_)" ] } ], @@ -3488,7 +3543,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.19" + "version": "3.12.2" } }, "nbformat": 4,