First attempt on vehicle dataset with a random forest classifier #13
Conversation
Typos in the comment above; the corrected comment is:
Minor changes. This was my first attempt; fixes #2
This is a nice, modular PR addressing the issue very succinctly.
If you would like, you can investigate the misclassified points in more depth, or submit that as a new PR.
Feel free to merge in.
Overall, a very nice PR which does a good job of splitting the code between the module and the notebook. Requesting changes to fix the test set size discrepancy.
# print('The target variable: ')
# print(y[:5])
# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=21)
The 0.2 test proportion does not match the comment in the notebook. In general, how did you decide on this test set size? It would be good to include a comment about this in the notebook.
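A minimal sketch of a split with the rationale documented inline, as the reviewer suggests. The synthetic `make_classification` data is a stand-in; in the notebook, `X` and `y` come from vehicle.csv:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Stand-in data; in the actual notebook, X and y are loaded from vehicle.csv.
X, y = make_classification(n_samples=200, n_features=4, random_state=21)

# Hold out 20% for testing: with only a few hundred rows, a larger test
# split would leave too little training data, while 20% still gives a
# test set large enough for a reasonably stable accuracy estimate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=21
)

print(len(X_train), len(X_test))  # 160 40
```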
### Conclusion:
Overall I got 78 percent accuracy, which doesn't seem good. The next step for today will be first to try to improve this model, then to experiment with other models to see their comparative performance on this dataset.
Looking at the confusion matrix and evaluation metric table, can you expand on your interpretation of this overall accuracy? I notice that the per-class accuracy scores are quite different across the classes.
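One way to look at the per-class differences the reviewer points out is to derive per-class recall from the confusion matrix and compare it with the classification report. This sketch uses synthetic multi-class data as a stand-in for the vehicle test set and fitted random forest:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic 4-class data standing in for the vehicle dataset.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5,
    n_classes=4, random_state=21
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=21
)

clf = RandomForestClassifier(random_state=21).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Per-class recall: correct predictions on the diagonal, divided by
# the number of true instances of each class (row sums).
cm = confusion_matrix(y_test, y_pred)
per_class_recall = cm.diagonal() / cm.sum(axis=1)
print(per_class_recall)  # one recall value per class; large spread
                         # here would explain a misleading overall accuracy
print(classification_report(y_test, y_pred))
```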
This is the first attempt to classify the vehicle.csv dataset with a random forest classifier.
The next step is to try to improve this classifier.
Then I will experiment with other classifiers to see which achieves the best accuracy.
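The planned classifier comparison could be sketched with cross-validation so each model is scored on the same folds. The candidate models here are illustrative choices, and the synthetic data again stands in for the vehicle.csv features and labels:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the vehicle.csv features and labels.
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           n_classes=4, random_state=21)

# Illustrative candidates; any scikit-learn classifier fits this loop.
candidates = {
    "random_forest": RandomForestClassifier(random_state=21),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
}
for name, model in candidates.items():
    # 5-fold cross-validated accuracy gives a fairer comparison than
    # a single train/test split.
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```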