Startup task: issue#2 attempt#1 #120

Merged
merged 2 commits into from
Mar 30, 2020

@urvigodha commented Mar 24, 2020

#2 Startup task: Train and Test on winequality dataset using random forest classifier
Hi @dzeber and @mlopatka, this is my first attempt at the startup task.
I would really appreciate your feedback. Thanks!

@urvigodha changed the title from "issue#2 attempt#1" to "Startup task: issue#2 attempt#1" on Mar 24, 2020
@mlopatka left a comment

Thank you for the contribution @urvigodha.

This is a great start on the classification task. I'm going to merge it in as we are close to the application deadline. It is good that you observed the perfect separation made possible by the quality/recommend relationship; performance excluding the quality feature is more realistic for this task.

1- I liked your inclusion of a transformation function to normalize the features to a comparable magnitude. It would have been helpful to have a bit more context/interpretation of what this means for performance on future data. How would you implement a comparable normalization for a new wine that is introduced later on in a production deployment?
2- Further discussion of your choice of a 0.3/0.7 testing/training split would be helpful. Do you expect that random sampling is skewing the performance results in this case?
3- The random forest algorithm has many parameters that can be tuned. Exploring these further would help quantify the stability of the performance reported here.
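
To make points 1 and 2 concrete, here is a minimal sketch in plain Python, not the code from this pull request. The names `TrainFittedScaler` and `stratified_split` and the toy feature values are hypothetical and chosen only for illustration. The sketch assumes the normalization is a per-feature standardization and that `recommend` is a binary label; the submitted transformation may differ.

```python
import random
import statistics
from collections import defaultdict


class TrainFittedScaler:
    """Standardize features using statistics learned from training rows only."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.means = [statistics.fmean(col) for col in columns]
        # Fall back to 1.0 so a constant column does not cause division by zero.
        self.stds = [statistics.pstdev(col) or 1.0 for col in columns]
        return self

    def transform(self, rows):
        return [
            [(value - mean) / std
             for value, mean, std in zip(row, self.means, self.stds)]
            for row in rows
        ]


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), sampling each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy data (hypothetical values, not the winequality dataset).
features = [[7.4, 0.70], [7.8, 0.88], [6.3, 0.30], [8.1, 0.28], [7.2, 0.23],
            [6.2, 0.32], [7.0, 0.27], [8.5, 0.28], [7.9, 0.18], [6.6, 0.16]]
recommend = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

train_idx, test_idx = stratified_split(recommend, test_fraction=0.3, seed=42)
scaler = TrainFittedScaler().fit([features[i] for i in train_idx])
scaled_train = scaler.transform([features[i] for i in train_idx])

# A wine arriving later is scaled with the stored training statistics,
# not with statistics recomputed from the new data.
new_wine = [[7.1, 0.45]]
scaled_new_wine = scaler.transform(new_wine)
```

The design choice is that the means and standard deviations are computed once from the training rows and then reused unchanged for the test rows and for any later wine. Splitting per class keeps the share of recommended wines similar in both partitions, which is one way to check whether plain random sampling is skewing the reported performance.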

Thanks again for your contribution.

@mlopatka mlopatka merged commit 8e00a35 into mozilla:master Mar 30, 2020