[ Fixes: #2 ] Training and Testing a Classification Model - Ensemble Method (Forests of Randomized Trees) on defaults.csv #53
[ Fixes #2 ]
The Forests of Randomized Trees ensemble method from `sklearn` has been applied to defaults.csv.
It includes the following:
- Exploratory data analysis
- Data pre-processing
- Hyper-parameter tuning of the model (done manually through experimentation)
- Fitting and prediction on the train/test data
- Computation of evaluation metrics
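The split, manual tuning, fitting, prediction and evaluation steps listed above can be sketched with scikit-learn roughly as follows. This is a minimal sketch under stated assumptions, not the code from this PR: it assumes `RandomForestClassifier` as the randomized-tree forest (`ExtraTreesClassifier` is the other estimator in that sklearn family), it uses synthetic data from `make_classification` as a stand-in for defaults.csv (whose columns are not shown here), and the candidate hyper-parameter values and the helper name `run_forest_experiment` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, f1_score
from sklearn.model_selection import train_test_split


def run_forest_experiment(random_state=0):
    # Synthetic stand-in for defaults.csv (assumption: binary, imbalanced target).
    X, y = make_classification(
        n_samples=2000, n_features=10, n_informative=5,
        weights=[0.8, 0.2], random_state=random_state,
    )

    # Hold out a test set; stratify so both splits keep the class ratio.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state,
    )
    # Carve a validation split out of the training data for manual tuning,
    # so the test set is used only once, at the end.
    X_tr, X_val, y_tr, y_val = train_test_split(
        X_train, y_train, test_size=0.2, stratify=y_train,
        random_state=random_state,
    )

    # Manual experimentation: try a few hypothetical settings and keep the
    # one with the best macro-averaged F1 on the validation split.
    candidates = [
        {"n_estimators": 100, "max_depth": None},
        {"n_estimators": 200, "max_depth": 10},
        {"n_estimators": 300, "max_depth": 5},
    ]
    best_params, best_score = None, -1.0
    for params in candidates:
        model = RandomForestClassifier(random_state=random_state, **params)
        model.fit(X_tr, y_tr)
        score = f1_score(y_val, model.predict(X_val), average="macro")
        if score > best_score:
            best_params, best_score = params, score

    # Refit on the full training split with the chosen settings, then predict.
    final = RandomForestClassifier(random_state=random_state, **best_params)
    final.fit(X_train, y_train)
    y_pred = final.predict(X_test)

    # Evaluation metrics: per-class precision, recall, F1 and support.
    report = classification_report(y_test, y_pred, output_dict=True)
    return best_params, report


if __name__ == "__main__":
    params, report = run_forest_experiment()
    print("chosen parameters:", params)
    print("test accuracy:", round(report["accuracy"], 3))
```

Tuning on a separate validation split keeps the final classification report an honest estimate; `GridSearchCV` would be the automated alternative to the manual loop.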
Note: KMeans clustering was also applied during pre-processing to improve the model's results.
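The PR does not state how the KMeans clusters feed into the model. One common approach, sketched here purely as an assumption, is to fit KMeans on the training features only and append each row's cluster id as an extra feature. The helper name `add_cluster_feature` and the default `n_clusters=4` are hypothetical choices, not taken from the PR.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def add_cluster_feature(X_train, X_test, n_clusters=4, random_state=0):
    """Hypothetical helper: append a KMeans cluster id as a new column."""
    # KMeans is distance-based, so scale first; fit the scaler on the
    # training rows only to avoid leaking test-set statistics.
    scaler = StandardScaler().fit(X_train)
    train_scaled = scaler.transform(X_train)
    test_scaled = scaler.transform(X_test)

    # Learn centroids from the training rows, then assign each test row
    # to its nearest learned centroid.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    train_labels = km.fit_predict(train_scaled)
    test_labels = km.predict(test_scaled)

    # Append the cluster id as an extra column; original features are kept.
    X_train_aug = np.column_stack([X_train, train_labels])
    X_test_aug = np.column_stack([X_test, test_labels])
    return X_train_aug, X_test_aug


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tr, te = add_cluster_feature(rng.normal(size=(100, 3)), rng.normal(size=(20, 3)))
    print(tr.shape, te.shape)  # (100, 4) (20, 4)
```

The scaling matters only for KMeans; the tree-based forest itself is insensitive to feature scale, which is why the unscaled original columns are passed through unchanged.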
The classification report looks as follows:
`pandas_profiling` is an additional library that has been added to the environment. Its main advantage is that, for each column, the following statistics (where relevant for the column type) are presented in an interactive HTML report: