
Performance Comparison of Trained models on eeg.csv #16

Merged
merged 18 commits, Mar 30, 2020

Conversation

Addi-11
Contributor

@Addi-11 Addi-11 commented Mar 7, 2020

  • Various classification models have been trained on the dataset.
  • The accuracy of each model has been measured.
  • The results can be seen in demo.ipynb

Further evaluation metrics will be added and plotted for various data splits, as part of issue #4.

@Addi-11 Addi-11 changed the title WIP: #2 on eeg, continuation on #4 WIP: #2 on eeg, continuation on #3 Mar 7, 2020
@Addi-11 Addi-11 changed the title WIP: #2 on eeg, continuation on #3 WIP: #2 on eeg, #3 mapping splitting space Mar 7, 2020
Contributor

@dzeber dzeber left a comment


This PR does a nice job of defining the space of models to explore and exploring them methodically. In fact it looks like you have more models prepared than you included in the notebook. It would be great to include them! I'm curious to see how they performed. I like the classification report you prepared for each one.

That said, I would like to request a couple of updates:

  • Please make sure your code runs in the conda environment given in the repo. It looks like a few utilities you have used, e.g. plot_confusion_matrix, are not available in the version of scikit-learn we are using. Please work around this.
  • You appear to have used docstrings ("""This is a docstring""") in the place of comments. Docstrings should only be used on the first line inside modules or functions and are treated as special objects by the interpreter. Please use comments (# This is a comment) elsewhere in the modules and Markdown text cells inside the notebook.
  • To compare model performance, you should be using a validation set (or cross-validation) separate from the test set.
  • Please add a comment indicating why you decided on a 60/40 split.
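The docstring-versus-comment point above can be illustrated with a minimal sketch (the function and names here are hypothetical, not from the PR):

```python
def accuracy(correct, total):
    """Return the fraction of correct predictions.

    This triple-quoted string is a docstring: it is the first statement
    in the function body, and the interpreter stores it as accuracy.__doc__.
    """
    # This is an ordinary comment: the interpreter discards it entirely.
    # Use comments like this for explanatory notes elsewhere in the code.
    return correct / total

# A triple-quoted string anywhere else is NOT a docstring; it is just a
# string expression that is evaluated and thrown away:
"""Prefer a # comment here instead of a bare string like this one."""

print(accuracy.__doc__.splitlines()[0])  # → Return the fraction of correct predictions.
```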

Finally, please break off your work for #3 into a separate PR, and we will review that part there.
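As a sketch of the requested evaluation setup, one possible train/validation/test split with scikit-learn (the toy data, proportions, and variable names are illustrative, not the PR's actual code):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for the EEG features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 14))
y = rng.integers(0, 2, size=1000)

# First carve off a held-out test set (20%), then split the remainder
# into train (60% overall) and validation (20% overall). Models are
# compared on the validation set; the test set is touched only once,
# after a model has been chosen.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # → 600 200 200
```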

@Addi-11
Contributor Author

Addi-11 commented Mar 11, 2020

Thank you for your review. I will work on the required changes.

@Addi-11 Addi-11 changed the title WIP: #2 on eeg, #3 mapping splitting space WIP: #2 on eeg Mar 11, 2020
@dzeber dzeber mentioned this pull request Mar 11, 2020
@Addi-11 Addi-11 requested a review from dzeber March 13, 2020 12:56
@Addi-11
Contributor Author

Addi-11 commented Mar 13, 2020

I have changed and corrected my code as stated above.
I have also added graphs to help visualise and compare the performance of all the stated models.

@Addi-11 Addi-11 changed the title WIP: #2 on eeg Performance Comparison of Trained models on eeg.csv Mar 13, 2020
Contributor

@dzeber dzeber left a comment


Your notebook looks much better after these updates! It's really interesting to compare all the classifiers you tried, and you did a nice job visualizing the scores. One final thing: now that you've selected the K-NN model using your evaluation scores, you should train that model on the full training set (i.e. train + validation) and compute the evaluation scores for it against the test set you held out. That is your final estimate of its performance.
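The final step described above might look like the following sketch (the synthetic data, split sizes, and n_neighbors=5 are illustrative assumptions, not the PR's actual values):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Toy data standing in for the EEG dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 14))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Suppose the first 400 rows were the train + validation pool used for
# model selection, and the last 100 are the held-out test set.
X_trainval, y_trainval = X[:400], y[:400]
X_test, y_test = X[400:], y[400:]

# Retrain the selected model on the combined train + validation data...
final_model = KNeighborsClassifier(n_neighbors=5).fit(X_trainval, y_trainval)

# ...and score it once against the held-out test set. This single number
# is the final estimate of the chosen model's performance.
final_acc = accuracy_score(y_test, final_model.predict(X_test))
print(f"final test accuracy: {final_acc:.2f}")
```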

There are a couple of outstanding administrative changes after which we'll be ready to merge.

  • Please remove the issue 3 work from this PR (it will be considered separately in #45, which fixes #3). You should remove data_split_examine.py and issue3_demo.ipynb from the git index on this branch.
  • Please make sure your code runs in the environment set up in the repo. Follow steps 1-3 here and run your demo notebook. Then fix any import failures. Currently plot_precision_recall_curve and plot_confusion_matrix are not available in our environment, but you can easily work around them by just printing the confusion_matrix and plotting the precision_recall_curve directly. I would rather not update scikit-learn in the environment at the moment as it is a core package and may break other people's work.
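One way to apply the workaround suggested above, using only functions available in older scikit-learn versions (the model and synthetic data here are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_recall_curve

# Toy binary-classification problem standing in for the EEG task.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X, y)

# Instead of plot_confusion_matrix, print the matrix itself.
cm = confusion_matrix(y, model.predict(X))
print(cm)

# Instead of plot_precision_recall_curve, compute the curve directly and
# plot it manually, e.g. with matplotlib: plt.plot(recall, precision).
precision, recall, thresholds = precision_recall_curve(y, model.predict_proba(X)[:, 1])
print(precision[:3], recall[:3])
```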

@Addi-11 Addi-11 requested a review from dzeber March 16, 2020 19:45
Contributor

@mlopatka mlopatka left a comment


Raw Python (non-notebook) files should adhere to the Black formatting guidelines. Please run these through Black before merging into master.
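A minimal sketch of that step (file paths are placeholders; data_split_examine.py is the only script named in this thread):

```shell
# Install Black if it is not already in the environment.
pip install black

# Reformat the raw Python files in place (paths are illustrative).
black data_split_examine.py

# Or check every file in the repo without modifying anything.
black --check .
```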

@Addi-11
Contributor Author

Addi-11 commented Mar 17, 2020

I will make the required changes.

@dzeber
Contributor

dzeber commented Mar 19, 2020

You still need to address the following:

  • now that you've selected the K-NN model using your evaluation scores, you should train that model on the full training set (i.e. train + validation) and compute the evaluation scores for it against the test set you held out. That is your final estimate of its performance.

  • Please make sure your code runs in the environment set up in the repo. Follow steps 1-3 here and run your demo notebook. Then fix any import failures.

Refresh your conda environment (delete it and recreate it) as described in the README and make sure your notebook runs without error.
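The refresh described above might look like the following sketch (the environment name and the environment.yml path are assumptions; check the repo's README for the actual values):

```shell
# Delete the existing environment (the name here is a placeholder).
conda env remove --name eeg-env

# Recreate it from the repo's environment file.
conda env create -f environment.yml

# Activate it and run the notebook end to end to surface import failures.
conda activate eeg-env
jupyter nbconvert --to notebook --execute demo.ipynb
```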

@Addi-11
Contributor Author

Addi-11 commented Mar 21, 2020

I am sorry for the delay. I will work on it right away.

@mlopatka mlopatka merged commit e5a8d54 into mozilla:master Mar 30, 2020