
First attempt on vehicle dataset with a random forest classifier #13

Merged
merged 9 commits into from
Mar 16, 2020

Conversation

Sidrah-Madiha
Contributor

This is the first attempt to classify the vehicle.csv dataset with a random forest classifier.
The next step is to try to improve this classifier.
Then move on to experimenting with other classifiers, to see which achieves the best accuracy.
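A minimal sketch of the approach described, using synthetic data in place of vehicle.csv (the dataset, feature count, and hyperparameters here are illustrative assumptions, not the PR's actual code):

```python
# Sketch only: synthetic multi-class data standing in for vehicle.csv.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Four classes, as in a vehicle-silhouette-style problem (assumed shape).
X, y = make_classification(n_samples=200, n_features=8, n_classes=4,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=21)

# Fit a random forest and score it on the held-out test set.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```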


Contributor Author

@Sidrah-Madiha Sidrah-Madiha left a comment


minor changes

@Sidrah-Madiha
Contributor Author

This was my first attempt; it fixes #2.

Contributor

@mlopatka mlopatka left a comment


This is a nice, modular PR addressing the issue very succinctly.
If you would like, you can investigate the misclassified points in more depth, or submit that as a new PR.

Feel free to merge in.

Contributor

@dzeber dzeber left a comment


Overall, very nice PR which does a good job splitting the code between the module and the notebook. Requesting changes to fix the test set size discrepancy.

# print('The target variable: ')
# print(y[:5])
# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=21)
Contributor


The 0.2 test proportion does not match the comment in the notebook. In general, how did you decide on this test set size? It would be good to include a comment about this in the notebook.
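For reference, `test_size` directly controls the split proportion; a quick sketch on stand-in data (not the PR's actual features) shows how the 0.2 value maps to row counts:

```python
# Sketch: how test_size=0.2 partitions the rows (stand-in data, not vehicle.csv).
from sklearn.model_selection import train_test_split

X = list(range(100))              # stand-in feature rows
y = [i % 2 for i in range(100)]   # stand-in labels

# test_size=0.2 reserves 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=21
)
print(len(X_train), len(X_test))  # 80 20
```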

"metadata": {},
"source": [
"### Conclusion:\n",
"Overall I got 78 percent accuracy, which doesn't seem good. The next step for today will be to first try to improve this model, then I will experiment with other models to see their comparative performance on this dataset."
Contributor


Looking at the confusion matrix and evaluation metric table, can you expand on your interpretation of this overall accuracy? I notice that the per-class accuracy scores are quite different across the classes.
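One way to quantify the reviewer's observation is to read per-class accuracy (recall) off the confusion matrix diagonal; a sketch with made-up labels (not the PR's actual predictions):

```python
# Sketch: per-class accuracy from a confusion matrix (made-up labels).
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = ["bus", "bus", "van", "van", "car", "car", "car"]
y_pred = ["bus", "van", "van", "van", "car", "bus", "car"]

cm = confusion_matrix(y_true, y_pred, labels=["bus", "car", "van"])
# Rows are true classes; the diagonal counts correct predictions per class.
per_class_acc = cm.diagonal() / cm.sum(axis=1)
print(per_class_acc)  # bus=0.5, car=0.667, van=1.0 despite ~71% overall
```

A high overall accuracy can hide a class that is mostly misclassified, which is why the per-class breakdown matters here.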

@mlopatka mlopatka merged commit bd53913 into mozilla:master Mar 16, 2020