[ Fixes: #78 ] Covariate Data #80

iamarchisha · 2020-03-18T05:53:07Z

[ Fixes: #78, #3 ]

StratifiedKFold and RandomForestClassifier have been used to detect covariate shift in the data.
Performance is improved by assigning importance weights using Density Ratio Estimation.
Evaluation metric score before covariate analysis: 0.52
Evaluation metric score after covariate analysis: 0.96
False negatives can be reduced using [ Fixes: #63 ] Learn from Misclassification #74
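The description above does not include the code itself. The following is a rough illustration only: a minimal sketch of one common way to combine the two steps, assuming a Python/scikit-learn setup.

1. A RandomForestClassifier is trained under StratifiedKFold to tell training rows from test rows ("adversarial validation"). An out-of-fold ROC AUC near 0.5 suggests little covariate shift, while a clearly higher AUC suggests shift.
2. The same out-of-fold probabilities give a classifier-based estimate of the density ratio p_test(x)/p_train(x), which is used as an importance weight for each training row.

The PR does not say which density ratio estimator it used (KLIEP and uLSIF are common alternatives). The function name `covariate_shift_weights` and its parameters are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold


def covariate_shift_weights(X_train, X_test, n_splits=5, random_state=0):
    """Hypothetical helper (not from the PR): detect covariate shift between
    X_train and X_test and estimate importance weights for the training rows.

    Returns (auc, weights): the out-of-fold ROC AUC of a classifier that tells
    the two sets apart, and w(x) ~ p_test(x) / p_train(x) for each training row.
    """
    X = np.vstack([X_train, X_test])
    # Origin label: 0 = row comes from the training set, 1 = from the test set.
    y = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])

    # Out-of-fold estimates of P(origin = test | x); every row is scored by a
    # model that did not see it during fitting.
    p_test = np.zeros(len(X))
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    for fit_idx, score_idx in skf.split(X, y):
        clf = RandomForestClassifier(n_estimators=100, random_state=random_state)
        clf.fit(X[fit_idx], y[fit_idx])
        # classes_ is sorted as [0., 1.], so column 1 is P(test | x).
        p_test[score_idx] = clf.predict_proba(X[score_idx])[:, 1]

    # AUC near 0.5: the two sets are hard to tell apart (little shift).
    auc = roc_auc_score(y, p_test)

    # Bayes' rule: p_test(x) / p_train(x) = odds(test | x) * (n_train / n_test).
    # Clipping avoids division by zero when the forest is fully confident.
    p = np.clip(p_test[: len(X_train)], 1e-6, 1 - 1e-6)
    weights = (p / (1 - p)) * (len(X_train) / len(X_test))
    return auc, weights


# Usage on synthetic data where the test set's feature means are shifted.
rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(500, 3))
X_test = rng.normal(1.0, 1.0, size=(300, 3))
auc, w = covariate_shift_weights(X_train, X_test)
print(f"adversarial AUC: {auc:.2f}, weights shape: {w.shape}")
```

The returned weights can be passed as `sample_weight` to the `fit` method of estimators that support it (for example `RandomForestClassifier` or `LogisticRegression`). Clipping keeps the weights finite. In practice, very large weights are often also capped or normalized.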

* visual for eeg * code restructured * mozilla#3 data-split space mapped * fixes issue3 * studied data splits for all classifiers * added graph in the loop * docstrings added * validation sets added * formatting * evaluated all classifiers * compared models * result added * indexed * removed plot-recall-curve * learning-curve * added models * env refresh * final estimate added * black formats * conclusion added

* visual for eeg * code restructured * mozilla#3 data-split space mapped * tabulated relation between k and evaluation metrics * gain-lift charts of models * auc-roc implemented * fixes issue3 * studied data splits for all classifiers * added graph in the loop * docstrings added * validation sets added * formatting * evaluated all classifiers * compared models * result added * interpretation added * docstring, interpretation added * indexed * removed plot-recall-curve * shorten PR * conflict resolve Co-authored-by: mlopatka <[email protected]>

* Outreachy startup task 1. Added module with all functions to import from jupyter notebook. 2. Added Jupyter Notebook with outputs. * made the dataset path a variable argument for the function * Details about confusion matrix * Added functions to see scatterplot and explained confusion matrix and kernel method. * Removed redundant imports and cells. * Used standard scaler to standardize data * minor changes * minor changes * Added relative paths * Added the dataset in the repo and python black formatting * Changed path to relative path in repo

* Data Loaded from vehicles.csv * Data visulaization and training model with ifferent algorithms * Evaluation of model is done. * Changed model from Logistic Regression to Support Vector Machine At first attempt i used three differnet models but Logistic Regression , Support Vector Machine and Decision Tree, and the overall accuracy with LR was better than any other but with changing validation parameters in SVM classification , model accuacy increased from 82% to 88%. * Delete train and test model-checkpoint.ipynb * Changed file named. * all python modules were added * docstrings were added * labels added in confusion matrix * Histogram colors were changed into single color * solved histogram issue * Update modules.py * changes made in histogram * Update modules.py * sorted histogram * Update modules.py * labels were added for confusion matrix * Python Custom Modules were added * Update Vehicle_Classifier.ipynb * Update modules.py * Update modules.py * Update modules.py * added labels in confusion matrix * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Requested Changes were made * Update modules.py * change categorical data into numerical data * change areguments in LR model * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Requested changes were made * Update modules.py * Update modules.py * Update modules.py * Update Vehicle_Classifier.ipynb * updates file * Delete Untitled.ipynb * Update modules.py * code formatted using python Black * Requested changes were made * shifted classifier's code from modules.py to ModelEvaluation.py * removed learning curves from file * added function 
for model evaluation in Model Evaluation file * updated svm and lr * added comments * added doc strings * Update ModelEvaluation.py * Update ModelEvaluation.py * Update Vehicle_Classifier.ipynb * Update modules.py * added descriptions * added interpretations of visualization * creating another branch from master * solving branching issues * main visulaization file is added * Added module for visualization of missclassification * added docstring and reformatted to python black * added interpretation of misclassification * removed unnecessary comments * changed name of file from VehicleClassifier to TrainTestSplit_Traversal * custom module file of Train_Test_plit_Traversal is added * version 2 of train_test_split_traversal is added with some changes plus code is reformatted with python black * changed plot labels * removed error * Update TrainTest_Split_Traversal.py * minor changes were done * chnaged file names and custom module file of CrossValidationFold_Traversal is added * added docstrings and resolved rrors * all files are moved to folder * added docstring and file reformatted to python black * added comments

…ozilla#32) * 1. Simple scatter plot 2. Violin and Box plots * Added plots for visualising misclassifications for Logistic regression and SVM classification. * Revert "1. Simple scatter plot" This reverts commit cb26342. * Removed redundant commits and updated notebook according to startup task * Changed to csv file from the repo and updated notebook according to PR mozilla#22 * Refactored code in module and added python black formatting * 1. Added ROC curves with AUC 2. Formatting and refactoring. * Minor changes

* Create Readme.md * Create files for exploring issue mozilla#2 * Format using black * Remove notebook from master * Increase modularization * create file for issue 6 * remove file added by mistake * Create notebook for issue 6 * Re-upload to the right folder * Delete file from the incorrect folder

* initial commit * updated notebook template to get started * added comment in notebook * Added notebook for issue#5 - calibration plots 1. Added module to plot calibration curve 2. Notebook to read data and display the calibration plot * fixed formatting

mlopatka

I have reviewed the rendered version of the notebook on GitHub, and the progress looks good.
While I appreciate you taking the initiative to use another dataset, this PR needs to be refactored before it can be merged into master.

The *.zip file is still included in the PR despite its addition to .gitignore. If you want to work with this dataset, please submit a PR that adds the extracted CSV to the repo's main dataset directory, and then load it from there as with the other datasets.

* Data Loaded from vehicles.csv * Data visulaization and training model with ifferent algorithms * Evaluation of model is done. * Changed model from Logistic Regression to Support Vector Machine At first attempt i used three differnet models but Logistic Regression , Support Vector Machine and Decision Tree, and the overall accuracy with LR was better than any other but with changing validation parameters in SVM classification , model accuacy increased from 82% to 88%. * Delete train and test model-checkpoint.ipynb * Changed file named. * all python modules were added * docstrings were added * labels added in confusion matrix * Histogram colors were changed into single color * solved histogram issue * Update modules.py * changes made in histogram * Update modules.py * sorted histogram * Update modules.py * labels were added for confusion matrix * Python Custom Modules were added * Update Vehicle_Classifier.ipynb * Update modules.py * Update modules.py * Update modules.py * added labels in confusion matrix * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Requested Changes were made * Update modules.py * change categorical data into numerical data * change areguments in LR model * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Update modules.py * Requested changes were made * Update modules.py * Update modules.py * Update modules.py * Update Vehicle_Classifier.ipynb * updates file * Delete Untitled.ipynb * Update modules.py * code formatted using python Black * Requested changes were made * shifted classifier's code from modules.py to ModelEvaluation.py * removed learning curves from file * added function 
for model evaluation in Model Evaluation file * updated svm and lr * added comments * added doc strings * Update ModelEvaluation.py * Update ModelEvaluation.py * Update Vehicle_Classifier.ipynb * Update modules.py * added descriptions * added interpretations of visualization * creating another branch from master * solving branching issues * main visulaization file is added * Added module for visualization of missclassification * added docstring and reformatted to python black * added interpretation of misclassification * removed unnecessary comments * changed name of file from VehicleClassifier to TrainTestSplit_Traversal * custom module file of Train_Test_plit_Traversal is added * version 2 of train_test_split_traversal is added with some changes plus code is reformatted with python black * changed plot labels * removed error * Update TrainTest_Split_Traversal.py * minor changes were done * chnaged file names and custom module file of CrossValidationFold_Traversal is added * added docstrings and resolved rrors * all files are moved to folder * added docstring and file reformatted to python black * added comments * addedd calibrationplot .py * added docstrings * Update Calibration plot.ipynb

* Update .gitignore * Preliminary Analysis * Helper modules (Bar and Hist graph) * Rough KNN algorithm implemented * Delete libraries.py * KNN classifier refactored and polished. Returns only variables of interest for use in the metrics calculations. * refactored for performance, just the required functions imported * draft mlp classifier implemented to be reviewed * ... * Threshold conversion logic implemented. Since knn.predict calculates a probability, we implement logic for binary classification * Preliminary cleaning and knn model classification implemented! * Adjusted plot error with title placement * ... * Files reformatted with 'Black' * Logistic Regression classifier * Refactored modules to improve modularity * Implemented Log Reg * Deleted mpl module to focus on knn and log reg * Refactors gitignore to my personal folder * refactored for readability * Implementation to add counts and relative percentages on bar graph * Refactored name mozilla#2, Completed Preliminary Analysis and Interpreted Results * Update Issue mozilla#2 - Train and test a classification model (PRESC).ipynb * Files reformatted with 'Black' * Display Error corrected * Interpreted choice of hyper-parameters * Refactored and Added Modules used for Issue 3 * Preliminary Analysis - Traversal of the space of train_test splits * Issue#3 complete * Removed Issues mozilla#2 and mozilla#3 ipynb * Issue mozilla#4 - completed Issue mozilla#4 - Traversal of the space of cross-validation folds * Delete defaults_data.csv Removing duplication of the existing data set which can be loaded from the repo's root directory. Co-authored-by: mlopatka <[email protected]>

) * fixes mozilla#8 * fixes mozilla#4, attempt 1 * updated misclassification graph and broke down functions * first attempt to fix # 3 * implemented all change requests * formatted code for all helper files * minor fix * fixed code formatting issues and removed extra file * fixed code formatting, added docstring to func * fixed relative path * fixed all changes requested * fixed relative path in notebook * fixing conflict with some file changes * last attempt at fixing conflicts

* fixes mozilla#8 * fixes mozilla#4, attempt 1 * first attempt to fix # 3 * implemented all change requests * formatted code for all helper files * minor fix * fixed code formatting, added docstring to func * first attempt on 63 * fixing conflicts Co-authored-by: mlopatka <[email protected]>

…ear Model (Stochastic Gradient Descent) on winequality.csv (mozilla#58) * adds incomplete files * adds .ipynb, .py and updates environment.yml * Delete winequality.ipynb removing duplicate files * Delete winequality_modules.py removing duplicate files * Delete winequality.ipynb removing incomplete files * Delete winequality_modules.py removing incomplete files * adds .ipynb, .py and updates environment.yml * adds description and detailed reasoning for the methods, models and parameters used * drops quality column * updates .py file * adds files in a new folder * updates .yml

…mozilla#111) * WIP: mozilla#2 on the dataset 'eeg.csv' WIP: mozilla#2 on the dataset 'eeg.csv' * Add files via upload * Delete WIP: mozilla#2 on the dataset 'eeg.csv' * Delete mozilla#2 Train and test a classification model, eeg.csv-checkpoint.ipynb * WIP: mozilla#2 on the dataset 'eeg.csv' * Delete mozilla#2 Train and test a classification model, eeg.csv-checkpoint.ipynb * WIP: mozilla#2 Train and test a classification model, eeg.csv dataset * Delete mozilla#2 Train and test a classification model, eeg.csv.ipynb * Create README * WIP: mozilla#2 Train and test a classification model, eeg.csv dataset * Delete README

iamarchisha · 2020-03-29T04:45:35Z

I was trying to push LFS files to the branch, and in the process I made a few mistakes. I tried but could not find a way to revert the changes. Is it okay if I close this PR and open a new one? Or is there something else that could help solve the problem?

mlopatka · 2020-03-30T15:17:10Z

@archisha-chandel can you link to the new PR in a comment?

iamarchisha · 2020-03-30T17:07:40Z

New PR #136 solves issue #78.

KaairaGupta and others added 30 commits March 7, 2020 12:44

Merge pull request mozilla#1 from mozilla/master

a71e28a

Updating master

winequality evaluation on different models

dd861e3

dropped quality column

4c42101

comparing different models using undersampling

fc55ac3

minor changes

b43a0a8

'TrainingandTestingClassificationSVM'

69a06c8

Merge pull request mozilla#2 from mozilla/master

b13e89f

Updating KaairaGupta/master

SVMandKNN_on_WineQualitydataset

3c1e178

SVMandKNN_on_WineQualitydataset

f03838b

SVMandKNN_on_WineQualitydataset

694146a

changes according to the review provided

a4f5ec6

Added new jupyter notebook for issue#3

45feaa4

SVM with multiple test/train split function" -m "

27320f3

1. Added SVM classifier with outlier removal and hyperparameter tuning 2. Notebook is reused from issue#2 3. Added code to run multiple test/train split and test the accuracy of the model"

Initial commits

f8de1d6

Added some changes

cd11e7a

Changes

03f40dc

Made some changes

fd7d0b1

updated description in the notebook

a9bdbfa

1. Increased the number of loops for test-train cycle

updated description in the notebook

7a6d033

1. Increased the number of loops for test-train cycle

Updated function for test-train split experiment

63cd20c

1. Updated test_train_split function 2. Updated function call in python notebook

removed idea folder

cf85f11

updated doc block of test_train_split function

ce946ea

Dropped Quality and used functions

ed15177

Added 3 different functions to handle outliers

bb6aabe

Fixed the comments of the notebook

10ef45f

Fixed the comments of the notebook

e173cbb

Fixed comment in notebook

e3b89b6

Fixed transformation code

8d03143

1. Fixed Transformation Code 2. Added warnings

Made changes according to the review

49a3874

removed redundant files and kept code in a directory

7bbdf46

Addi-11 and others added 9 commits March 25, 2020 18:57

reformatted with black, added doc strings

34313e3

Update .gitignore

21fe2f1

mlopatka suggested changes Mar 27, 2020


iamarchisha and others added 14 commits March 27, 2020 14:33

[ Fixes: mozilla#8 ] Analyzing Importance of Data Points (mozilla#96)

6eae983

* adds .ipynb and .py * updates code by formatting

issue mozilla#2 Training classification model for winequality.csv dat…

5dc5a98

…aset (mozilla#92) * Classification model wine.csv * Classification model wine.csv * Merging modifications

Merge pull request mozilla#40 from simran0117/master

885e7b2

mozilla#2 Dropped quality, shifted the logic to python file, shifted imports to the top, added confusion_matrix and classification_report

Merge pull request mozilla#55 from Sumangrewal/master

7bef4bc

Adding logistic regression for winequality.csv

Merge pull request mozilla#57 from shashigharti/shashigharti/issue-3

4881da4

issue#3:traversal-of-the-space-of-train-test-splits

Merge pull request mozilla#18 from KaairaGupta/master

d08cd19

For mozilla#2: on the dataset 'winequality.csv'

clear git cache

44dd0b8

Merge branch 'covariate' of github.com:archisha-chandel/PRESC into co…

d09d519

…variate

iamarchisha closed this Mar 29, 2020

iamarchisha deleted the covariate branch March 29, 2020 12:18

iamarchisha mentioned this pull request Mar 31, 2020

[ Fixes: #78 ] Covariate Data #136

Merged
