
[Outreachy applications] Importance score for dataset training samples #8

Closed
dzeber opened this issue Mar 4, 2020 · 7 comments · Fixed by #31, #96 or #139

Comments


dzeber commented Mar 4, 2020

Implement a way to assess the importance of an individual training datapoint to the performance of the model. This could be done by training the same model with a particular point included and then excluded from the training set, and computing the difference in performance scores on the test set.
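A minimal sketch of the with/without approach described above, using scikit-learn. The function name `importance_score` and the synthetic dataset are illustrative only and are not part of the PRESC codebase; this is one possible realization, not the project's chosen implementation.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def importance_score(model, X_train, y_train, X_test, y_test, index):
    """Change in test accuracy when training point `index` is removed.

    A positive score means removing the point hurts performance,
    i.e. the point is "important" (helpful) to the model.
    """
    # Baseline: fit a fresh copy of the model on the full training set.
    full = clone(model).fit(X_train, y_train)
    base_acc = accuracy_score(y_test, full.predict(X_test))

    # Refit on the training set with the chosen point left out.
    mask = np.arange(len(X_train)) != index
    reduced = clone(model).fit(X_train[mask], y_train[mask])
    loo_acc = accuracy_score(y_test, reduced.predict(X_test))

    return base_acc - loo_acc


# Tiny synthetic example (illustrative dataset, not winequality/vehicle).
X, y = make_classification(n_samples=200, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
score = importance_score(
    LogisticRegression(max_iter=1000), X_tr, y_tr, X_te, y_te, index=0
)
```

Note that retraining once per point is O(n) model fits over the whole training set, which is expensive for large datasets; cheaper proxies (e.g. influence approximations) exist, as the discussion below also hints.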

Sidrah-Madiha added a commit to Sidrah-Madiha/PRESC that referenced this issue Mar 9, 2020
KaairaGupta added a commit to KaairaGupta/PRESC that referenced this issue Mar 10, 2020

janvi04 commented Mar 12, 2020

@dzeber I want to work on this issue. I have a question: I am working with the winequality dataset. Can I perform this task on the same dataset?
Also, this task requires us to assess the performance of a model with a particular data point and without it. Please correct me if I am wrong.


dzeber commented Mar 14, 2020

@janvi04 Yes, you would implement a general method, and you can test it out on any dataset (you can reuse your work from #2).

assess the performance of a model with a particular data point and without it

The goal is to come up with a general score for the "importance" of an individual point in the training set. This would be useful in studying misclassifications, e.g. to see whether they are "important" or not. You are encouraged to develop your own approach. Training the model with and without the point is one possible approach, but there are others.


janvi04 commented Mar 14, 2020

Thanks @dzeber. I will work on it.


tab1tha commented Mar 17, 2020

I was about to make a pull request for this. Could you please reopen it?

@mlopatka

I accidentally closed this while merging in a PR yesterday. The issue remains open; apologies.


tab1tha commented Mar 18, 2020

@mlopatka You have mistakenly closed it again.

@mlopatka mlopatka reopened this Mar 18, 2020
@mlopatka

thanks

@mlopatka mlopatka reopened this Mar 20, 2020
mlopatka pushed a commit that referenced this issue Mar 25, 2020
* First attempt on vehicle data with a random forest classifier

* minor changes

* Comparative model evaluation for vehicle dataset

* first attempt for implementing task 7

* fixes #8

* fixed all change requests

* fixed relative path, improved visualisation

* minor fix in plot

* added absolute distance for comparing with sensitivity calculation
mlopatka pushed a commit that referenced this issue Mar 25, 2020
* fixes #8

* fixes #4, attempt 1

* implemented all change requests

* formatted code for all helper files

* minor fix

* fixes #5

* fixing conflicts
mlopatka pushed a commit that referenced this issue Mar 25, 2020
* First attempt on vehicle data with a random forest classifier

* minor changes

* Comparative model evaluation for vehicle dataset

* first attempt for implementing task 7

* fixes #8

* fixes #4, attempt 1

* implemented all change requests

* formatted code for all helper files

* minor fix

* fixes #9

* fixed relative path

* fixing conflicts
@mlopatka mlopatka reopened this Mar 25, 2020
mlopatka pushed a commit that referenced this issue Mar 27, 2020
* adds .ipynb and .py

* updates code by formatting
mlopatka pushed a commit that referenced this issue Mar 27, 2020
* fixes #8

* fixes #4, attempt 1

* updated misclassification graph and broke down functions

* first attempt to fix #3

* implemented all change requests

* formatted code for all helper files

* minor fix

* fixed code formatting issues and  removed extra file

* fixed code formatting, added docstring to func

* fixed relative path

* fixed all changes requested

* fixed relative path in notebook

* fixing conflict with some file changes

* fixing attempt last for conflicts
mlopatka added a commit that referenced this issue Mar 27, 2020
* fixes #8

* fixes #4, attempt 1

* first attempt to fix #3

* implemented all change requests

* formatted code for all helper files

* minor fix

* fixed code formatting, added docstring to func

* first attempt on 63

* fixing conflicts

Co-authored-by: mlopatka <[email protected]>
@mlopatka mlopatka reopened this Mar 27, 2020
@dzeber dzeber reopened this Mar 31, 2020
@dzeber dzeber changed the title Importance score for dataset training samples [Outreachy applications] Importance score for dataset training samples Jul 14, 2020