Performance Comparison of Trained Models on eeg.csv #16
Conversation
This PR does a nice job of defining the space of models to explore and exploring them methodically. In fact it looks like you have more models prepared than you included in the notebook. It would be great to include them! I'm curious to see how they performed. I like the classification report you prepared for each one.
That said, I would like to request a couple of updates:
- Please make sure your code runs in the conda environment given in the repo. It looks like a few utilities you have used, e.g. `plot_confusion_matrix`, are not available in the version of scikit-learn we are using. Please work around this.
- You appear to have used docstrings (`"""This is a docstring"""`) in place of comments. Docstrings should only be used on the first line inside modules or functions and are treated as special objects by the interpreter. Please use comments (`# This is a comment`) elsewhere in the modules, and Markdown text cells inside the notebook; see the sketch after this list.
- To compare model performance, you should be using a validation set (or cross-validation) separate from the test set.
- Please add a comment indicating why you decided on a 60/40 split.
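A minimal sketch of the commenting convention being requested (the function and numbers below are purely illustrative):

```python
def evaluate_split(scores):
    """Return the mean of a list of evaluation scores."""  # docstring: first line inside the function
    # Comments like this one explain individual steps.
    return sum(scores) / len(scores)

# At module level, use comments (or Markdown cells in a notebook), not bare strings.
mean_score = evaluate_split([0.91, 0.88, 0.90])
print(mean_score)
```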
Finally, please break off your work for #3 into a separate PR and we will review that part there.
Thank you for your review. I will work on the required changes.
I have changed and corrected my code as stated above.
Your notebook looks much better after these updates! It's really interesting to compare all the classifiers you tried, and you did a nice job visualizing the scores. One final thing: now that you've selected the K-NN model using your evaluation scores, you should train that model on the full training set (i.e. train + validation) and compute the evaluation scores for it against the test set you held out. That is your final estimate of its performance.
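The final refit might look roughly like this (a sketch using stand-in data; in the notebook the splits would come from eeg.csv, and k from your validation scores):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data; substitute the splits already built in the notebook.
X, y = make_classification(n_samples=500, random_state=0)
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# (Model selection with a further train/validation split happens before this point.)

# Refit the chosen model on the full training data (train + validation)...
final_model = KNeighborsClassifier(n_neighbors=5)  # k picked using the validation scores
final_model.fit(X_trainval, y_trainval)

# ...and score it once against the held-out test set: the final performance estimate.
print(classification_report(y_test, final_model.predict(X_test)))
```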
There are a couple of outstanding administrative changes after which we'll be ready to merge.
- Please remove the issue 3 work from this PR (it will be considered separately in #45, which fixes #3). You should remove `data_split_examine.py` and `issue3_demo.ipynb` from the git index on this branch.
- Please make sure your code runs in the environment set up in the repo. Follow steps 1-3 here and run your demo notebook, then fix any import failures. Currently `plot_precision_recall_curve` and `plot_confusion_matrix` are not available in our environment, but you can easily work around them by just printing the `confusion_matrix` and plotting the `precision_recall_curve` directly (see the sketch below). I would rather not update scikit-learn in the environment at the moment, as it is a core package and may break other people's work.
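The workaround might look something like the sketch below (the data and classifier are stand-ins for whatever the notebook already defines; this assumes a binary target):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_recall_curve
from sklearn.model_selection import train_test_split

# Stand-in model and data; substitute the notebook's own classifier and test split.
X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# Print the confusion matrix instead of calling plot_confusion_matrix.
print(confusion_matrix(y_test, model.predict(X_test)))

# Plot the precision-recall curve directly instead of plot_precision_recall_curve.
precision, recall, _ = precision_recall_curve(y_test, model.predict_proba(X_test)[:, 1])
plt.plot(recall, precision)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.show()
```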
Raw Python (non-notebook) files should adhere to the Python Black formatting guidelines. Please run these through Black before merging into master.
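For illustration, here is the kind of normalization Black applies (a hand-written sketch, not actual tool output):

```python
# Before Black: inconsistent spacing and quoting
results={ 'knn':0.91,'svm' :0.88}

# After Black: spaces after colons, double quotes, one space around "="
results = {"knn": 0.91, "svm": 0.88}
```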
I will make the required changes.
You still need to address the following:
Refresh your conda environment (delete it and recreate it) as described in the README and make sure your notebook runs without error.
I am sorry for the delay. I will work on it right away.
Additionally, various evaluation metrics are still to be added and plotted for the different data splits as part of issue 4.