[Outreachy applications] Importance score for dataset training samples #8
Comments
@dzeber I want to work on this issue. One question: I am working with the winequality dataset. Can I perform this task on the same dataset?
@janvi04 Yes, you would implement a general method, and you can test it out on any dataset (you can reuse your work from #2).
The goal is to come up with a general score for the "importance" of an individual point in the training set. This would be useful in studying misclassifications, e.g., to see whether the misclassified points are "important" or not. You are encouraged to develop your own approach. Training the model with and without the point is one possible approach, but there are others.
Thanks @dzeber. I will work on it.
I was about to make a pull request for this. Could you please reopen it?
I accidentally closed this while merging in a PR yesterday. The issue remains open; apologies.
@mlopatka You have mistakenly closed it again.
Thanks.
* First attempt on vehicle data with a random forest classifier
* minor changes
* Comparative model evaluation for vehicle dataset
* first attempt for implementing task 7
* fixes #8
* fixed all change requests
* fixed relative path, improved visualisation
* minor fix in plot
* added absolute distance for comparing with sensitivity calculation

* First attempt on vehicle data with a random forest classifier
* minor changes
* Comparative model evaluation for vehicle dataset
* first attempt for implementing task 7
* fixes #8
* fixes #4, attempt 1
* implemented all change requests
* formatted code for all helper files
* minor fix
* fixes #9
* fixed relative path
* fixing conflicts

* fixes #8
* fixes #4, attempt 1
* updated misclassification graph and broke down functions
* first attempt to fix #3
* implemented all change requests
* formatted code for all helper files
* minor fix
* fixed code formatting issues and removed extra file
* fixed code formatting, added docstring to func
* fixed relative path
* fixed all changes requested
* fixed relative path in notebook
* fixing conflict with some file changes
* fixing attempt last for conflicts

* fixes #8
* fixes #4, attempt 1
* first attempt to fix #3
* implemented all change requests
* formatted code for all helper files
* minor fix
* fixed code formatting, added docstring to func
* first attempt on 63
* fixing conflicts
Co-authored-by: mlopatka <[email protected]>
Implement a way to assess the importance of an individual training datapoint to the performance of the model. This could be done by training the same model with a particular point included and then excluded from the training set, and computing the difference in performance scores on the test set.
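The with/without comparison described here could be sketched roughly as follows. This is only an illustration under stated assumptions, not the approach taken in any of the linked PRs: the names `importance_scores`, `fit_nearest_centroid` and `accuracy` are hypothetical, the nearest-centroid classifier is a toy stand-in for whatever model is actually being studied, and the data is made up. Any `fit(X, y)` function returning a model and any `score(model, X, y)` function returning a number could be passed in instead.

```python
from typing import Callable, Dict, List, Sequence

Point = Sequence[float]
Model = Callable[[Point], int]


def fit_nearest_centroid(X: List[Point], y: List[int]) -> Model:
    """Toy stand-in model (an assumption): predict the class with the closest mean."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, value in enumerate(xi):
            acc[j] += value
        counts[yi] = counts.get(yi, 0) + 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(x: Point) -> int:
        # Pick the class whose centroid has the smallest squared distance to x.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict


def accuracy(model: Model, X: List[Point], y: List[int]) -> float:
    """Fraction of points in (X, y) that the model labels correctly."""
    return sum(model(xi) == yi for xi, yi in zip(X, y)) / len(y)


def importance_scores(fit, score, X_train, y_train, X_test, y_test) -> List[float]:
    """Leave-one-out importance: baseline test score minus the test score
    obtained when training point i is excluded. A positive value means the
    model does worse on the test set without that point."""
    baseline = score(fit(X_train, y_train), X_test, y_test)
    scores = []
    for i in range(len(X_train)):
        X_without = X_train[:i] + X_train[i + 1:]
        y_without = y_train[:i] + y_train[i + 1:]
        scores.append(baseline - score(fit(X_without, y_without), X_test, y_test))
    return scores


# Made-up one-dimensional data, purely for illustration.
X_train = [[0.0], [1.0], [4.0], [10.0]]
y_train = [0, 0, 1, 1]
X_test = [[0.5], [3.4], [4.5], [8.0]]
y_test = [0, 0, 1, 1]

scores = importance_scores(
    fit_nearest_centroid, accuracy, X_train, y_train, X_test, y_test
)
for point, s in zip(X_train, scores):
    print(point, s)
```

Note that this retrains the model once per training point, so the cost grows linearly with the size of the training set. For larger datasets or slower models, scoring only a subset of points (for example the misclassified ones) may be more practical.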