
Fixed #2, train and test a classification model on vehicles dataset #30

Merged: 9 commits into mozilla:master on Mar 20, 2020

Conversation

@Bolaji61 (Contributor) commented Mar 9, 2020

No description provided.

@dzeber (Contributor) left a comment

This is a great start on this task. I am happy to see the diversity of models you are looking into.

I have a couple of recommendations:

  • Please remove `.DS_Store` from your PR. It is supposed to be excluded automatically by the `.gitignore`, but I just noticed it isn't listed there - I will fix the `.gitignore` on the master branch.
  • The code you use to test out the models should be moved to a separate module and called from there (that way you can also reuse the same code as a single function; see the sketch after this list).
  • I saw you included comments in the notebook explaining what your code does. That is good - however, I'd also like to see more text/markdown cells explaining your thought process: why you decided to do certain analyses, and your interpretations of the results. This is a great help to someone reading your notebook for the first time. For example, the correlation plot shows some fascinating patterns. It would be great to include a few sentences discussing your conclusions from it: what we should pay attention to and how it influenced the rest of your work.
  • The accuracy of 24% for the SVM looks suspect - I'd double-check that it is working correctly. That's no better than guessing class labels at random.
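
A minimal sketch of what the second and fourth points could look like in practice follows. The module layout, the name `evaluate_model`, and the use of a scaling pipeline are illustrative assumptions, not the code actually in this PR.

```python
# modules.py (sketch): one reusable helper for trying out different models.
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_model(estimator, X, y, test_size=0.2, random_state=42, scale=True):
    """Train the given estimator on a train/test split and return its test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
    # Feature scaling matters for margin- and distance-based models such as SVMs;
    # an unscaled feature matrix is a common cause of near-chance SVM accuracy.
    model = make_pipeline(StandardScaler(), estimator) if scale else estimator
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))
```

Called as, for example, `evaluate_model(SVC(kernel="rbf"), X, y)`, the same helper can be reused for every model in the notebook, and the scaling step is one common thing to check when an SVM scores no better than chance.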

Review thread on dev/Omobolaji/vehicles.ipynb (outdated, resolved)
@Bolaji61 (Contributor, Author) commented

Train and Test a Classification Model on vehicles.csv dataset.
I've been able to create a separate modules file containing the models and functions used.
Next is to study the performance evaluation metrics and give detailed explanations of each line of code.
@dzeber, I hope my pace is not too slow; I'm quite new to open source. Thanks for your earlier review.
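
A minimal sketch of the kind of evaluation-metric study mentioned above, using scikit-learn's `confusion_matrix` and `classification_report`; the data and model here are placeholders standing in for the notebook's actual classifier and held-out split.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the vehicles features and class labels.
X, y = make_classification(n_samples=400, n_classes=4, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

print(confusion_matrix(y_test, y_pred))        # per-class error structure
print(classification_report(y_test, y_pred))   # precision, recall, and F1 per class
```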

@Bolaji61 Bolaji61 changed the title WIP: First attempt on "Train and Test a Classification Model on vehicles.csv dataset" Train and Test a Classification Model on vehicles.csv dataset Mar 17, 2020
@Bolaji61 (Contributor, Author) commented

I have properly documented my thought process on my choice of model and `test_size` fraction. Next is to tune the parameters of the chosen models to achieve higher accuracy.
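
As a rough illustration of the planned parameter tuning, a sketch using scikit-learn's `GridSearchCV` is below; the model, parameter grid, and placeholder data are assumptions for illustration, not the notebook's actual code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Placeholder data standing in for the vehicles features and class labels.
X, y = make_classification(n_samples=500, n_features=18, n_classes=4,
                           n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Exhaustively search a small grid with 5-fold cross-validation on the training split.
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print(search.best_params_, round(search.best_score_, 3))
print("held-out accuracy:", round(search.score(X_test, y_test), 3))
```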

@Bolaji61 Bolaji61 changed the title Train and Test a Classification Model on vehicles.csv dataset Fixed #2, train and test model on vehicles dataset Mar 20, 2020
@Bolaji61 Bolaji61 changed the title Fixed #2, train and test model on vehicles dataset Fixed #2, train and test a classification model on vehicles dataset Mar 20, 2020
@dzeber (Contributor) commented Mar 20, 2020

Thanks for these updates - this has developed into a nicely presented, well-documented notebook. I'm merging this now, as it satisfies the requirements for #2. If you wish to continue working on it, I recommend checking out the other issues in the repo.

A couple of general comments (don't need to fix at present):

  • The `_model()` functions in your modules.py have a lot of repeated code. An improvement would be to combine them into a single function and supply the parts that change as parameters (see the sketch after this list).
  • For the presentation of your very nice analysis of test split sizes, rather than hard-coding the results shown in the table, why not run the evaluations in the notebook, capture them dynamically, and present the results directly.
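
For concreteness, a rough sketch of both suggestions follows. The names `train_and_score` and `split_size_table`, and the use of pandas for the table, are illustrative assumptions rather than the actual contents of modules.py.

```python
# Sketch: one parameterized trainer replacing the per-model _model() functions,
# plus a loop that builds the test-split comparison table dynamically.
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_and_score(estimator, X, y, test_size):
    """Fit one estimator on a split of the given size and return its test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=42, stratify=y
    )
    estimator.fit(X_train, y_train)
    return accuracy_score(y_test, estimator.predict(X_test))


def split_size_table(estimators, X, y, sizes=(0.1, 0.2, 0.3, 0.4)):
    """Evaluate each named estimator across several test sizes; return a DataFrame."""
    rows = {
        name: {size: train_and_score(model, X, y, size) for size in sizes}
        for name, model in estimators.items()
    }
    return pd.DataFrame(rows).T  # rows: models, columns: test-split sizes
```

Called in the notebook as, for example, `split_size_table({"svm": SVC(), "forest": RandomForestClassifier()}, X, y)`, the displayed table would always reflect the latest run instead of hard-coded numbers.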

@dzeber dzeber merged commit 343efaf into mozilla:master Mar 20, 2020
@Bolaji61 (Contributor, Author) commented

Yaay, this is great news. Thanks a lot, it's my first ever merged PR in an open source project. I'll work on other issues in the project and, as suggested, come back to the points you commented on later.
I really appreciate your reviews, and I hope to help others the same way once I'm more experienced. Have a great weekend.

@Bolaji61 (Contributor, Author) commented

> The `_model()` functions in your modules.py have a lot of repeated code. An improvement would be to combine them into a single function and supply the parts that change as parameters.

The first suggestion has been fixed accordingly.
