
Performance Comparison of Trained models on eeg.csv #16

Merged
merged 18 commits, Mar 30, 2020

Conversation

Addi-11
Contributor

@Addi-11 Addi-11 commented Mar 7, 2020

  • Various classification models have been trained on the dataset.
  • The accuracy of each model has been measured.
  • The results can be seen in demo.ipynb

Further evaluation metrics will be added and plotted for various data splits, as part of issue #4.

@Addi-11 Addi-11 changed the title WIP: #2 on eeg, continuation on #4 WIP: #2 on eeg, continuation on #3 Mar 7, 2020
@Addi-11 Addi-11 changed the title WIP: #2 on eeg, continuation on #3 WIP: #2 on eeg, #3 mapping splitting space Mar 7, 2020
Contributor

@dzeber dzeber left a comment


This PR does a nice job of defining the space of models to explore and exploring them methodically. In fact it looks like you have more models prepared than you included in the notebook. It would be great to include them! I'm curious to see how they performed. I like the classification report you prepared for each one.

That said, I would like to request a couple of updates:

  • Please make sure your code runs in the conda environment given in the repo. It looks like a few utilities you have used, e.g. plot_confusion_matrix, are not available in the version of scikit-learn we are using. Please work around this.
  • You appear to have used docstrings ("""This is a docstring""") in the place of comments. Docstrings should only be used on the first line inside modules or functions and are treated as special objects by the interpreter. Please use comments (# This is a comment) elsewhere in the modules and Markdown text cells inside the notebook.
  • To compare model performance, you should be using a validation set (or cross-validation) separate from the test set.
  • Please add a comment indicating why you decided on a 60/40 split.
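The docstring-versus-comment point above can be illustrated with a minimal sketch (the function and names here are hypothetical, not from the PR):

```python
def accuracy(correct, total):
    """Return the fraction of correct predictions.

    This triple-quoted string is a docstring: it is the first statement
    in the function body, and the interpreter stores it as accuracy.__doc__.
    """
    # This is an ordinary comment: the interpreter discards it entirely.
    # Use comments like this for explanatory notes elsewhere in the code.
    return correct / total

# A triple-quoted string anywhere else is NOT a docstring; it is just a
# string expression that is evaluated and thrown away:
"""Prefer a # comment here instead of a bare string like this one."""

print(accuracy.__doc__.splitlines()[0])  # → Return the fraction of correct predictions.
```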

Finally, please break off your work for #3 into a separate PR, and we will review that part there.
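As a sketch of the requested evaluation setup, one possible train/validation/test split with scikit-learn (the toy data, proportions, and variable names are illustrative, not the PR's actual code):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for the EEG features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 14))
y = rng.integers(0, 2, size=1000)

# First carve off a held-out test set (20%), then split the remainder
# into train (60% overall) and validation (20% overall). Models are
# compared on the validation set; the test set is touched only once,
# after a model has been chosen.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # → 600 200 200
```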

@Addi-11
Contributor Author

Addi-11 commented Mar 11, 2020

Thank you for your review. I will work on the required changes.

@Addi-11 Addi-11 changed the title WIP: #2 on eeg, #3 mapping splitting space WIP: #2 on eeg Mar 11, 2020
@dzeber dzeber mentioned this pull request Mar 11, 2020
@Addi-11 Addi-11 requested a review from dzeber March 13, 2020 12:56
@Addi-11
Contributor Author

Addi-11 commented Mar 13, 2020

I have changed and corrected my code as stated above.
I have also added graphs to help visualise and compare the performance of all the stated models.

@Addi-11 Addi-11 changed the title WIP: #2 on eeg Performance Comparison of Trained models on eeg.csv Mar 13, 2020
Contributor

@dzeber dzeber left a comment


Your notebook looks much better after these updates! It's really interesting to compare all the classifiers you tried, and you did a nice job visualizing the scores. One final thing: now that you've selected the K-NN model using your evaluation scores, you should train that model on the full training set (i.e. train + validation) and compute the evaluation scores for it against the test set you held out. That is your final estimate of its performance.
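The final step described above might look like the following sketch (the synthetic data, split sizes, and n_neighbors=5 are illustrative assumptions, not the PR's actual values):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Toy data standing in for the EEG dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 14))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Suppose the first 400 rows were the train + validation pool used for
# model selection, and the last 100 are the held-out test set.
X_trainval, y_trainval = X[:400], y[:400]
X_test, y_test = X[400:], y[400:]

# Retrain the selected model on the combined train + validation data...
final_model = KNeighborsClassifier(n_neighbors=5).fit(X_trainval, y_trainval)

# ...and score it once against the held-out test set. This single number
# is the final estimate of the chosen model's performance.
final_acc = accuracy_score(y_test, final_model.predict(X_test))
print(f"final test accuracy: {final_acc:.2f}")
```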

There are a couple of outstanding administrative changes after which we'll be ready to merge.

  • Please remove the issue 3 work from this PR (it will be considered separately in #45, which fixes #3). You should remove data_split_examine.py and issue3_demo.ipynb from the git index on this branch.
  • Please make sure your code runs in the environment set up in the repo. Follow steps 1-3 here and run your demo notebook. Then fix any import failures. Currently plot_precision_recall_curve and plot_confusion_matrix are not available in our environment, but you can easily work around them by just printing the confusion_matrix and plotting the precision_recall_curve directly. I would rather not update scikit-learn in the environment at the moment as it is a core package and may break other people's work.
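One way to apply the workaround suggested above, using only functions available in older scikit-learn versions (the model and synthetic data here are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_recall_curve

# Toy binary-classification problem standing in for the EEG task.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X, y)

# Instead of plot_confusion_matrix, print the matrix itself.
cm = confusion_matrix(y, model.predict(X))
print(cm)

# Instead of plot_precision_recall_curve, compute the curve directly and
# plot it manually, e.g. with matplotlib: plt.plot(recall, precision).
precision, recall, thresholds = precision_recall_curve(y, model.predict_proba(X)[:, 1])
print(precision[:3], recall[:3])
```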

@Addi-11 Addi-11 requested a review from dzeber March 16, 2020 19:45
Contributor

@mlopatka mlopatka left a comment


Raw Python (non-notebook) files should adhere to the Black formatting guidelines. Please run these through Black before merging into master.
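A minimal sketch of that step (file paths are placeholders; data_split_examine.py is the only script named in this thread):

```shell
# Install Black if it is not already in the environment.
pip install black

# Reformat the raw Python files in place (paths are illustrative).
black data_split_examine.py

# Or check every file in the repo without modifying anything.
black --check .
```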

@Addi-11
Contributor Author

Addi-11 commented Mar 17, 2020

I will make the required changes.

@dzeber
Contributor

dzeber commented Mar 19, 2020

You still need to address the following:

  • now that you've selected the K-NN model using your evaluation scores, you should train that model on the full training set (i.e. train + validation) and compute the evaluation scores for it against the test set you held out. That is your final estimate of its performance.

  • Please make sure your code runs in the environment set up in the repo. Follow steps 1-3 here and run your demo notebook. Then fix any import failures.

Refresh your conda environment (delete it and recreate it) as described in the README and make sure your notebook runs without error.
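The refresh described above might look like the following sketch (the environment name and the environment.yml path are assumptions; check the repo's README for the actual values):

```shell
# Delete the existing environment (the name here is a placeholder).
conda env remove --name eeg-env

# Recreate it from the repo's environment file.
conda env create -f environment.yml

# Activate it and run the notebook end to end to surface import failures.
conda activate eeg-env
jupyter nbconvert --to notebook --execute demo.ipynb
```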

@Addi-11
Contributor Author

Addi-11 commented Mar 21, 2020

I am sorry for the delay. I will work on it right away.

@mlopatka mlopatka merged commit e5a8d54 into mozilla:master Mar 30, 2020