
[Outreachy applications] Importance score for dataset training samples #8

Closed
dzeber opened this issue Mar 4, 2020 · 7 comments · Fixed by #31, #96 or #139

Comments


dzeber commented Mar 4, 2020

Implement a way to assess the importance of an individual training datapoint to the performance of the model. This could be done by training the same model with a particular point included and then excluded from the training set, and computing the difference in performance scores on the test set.
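A minimal sketch of the with/without approach described above, using scikit-learn. The function name `importance_score` and the synthetic dataset are illustrative only and are not part of the PRESC codebase; this is one possible realization, not the project's chosen implementation.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def importance_score(model, X_train, y_train, X_test, y_test, index):
    """Change in test accuracy when training point `index` is removed.

    A positive score means removing the point hurts performance,
    i.e. the point is "important" (helpful) to the model.
    """
    # Baseline: fit a fresh copy of the model on the full training set.
    full = clone(model).fit(X_train, y_train)
    base_acc = accuracy_score(y_test, full.predict(X_test))

    # Refit on the training set with the chosen point left out.
    mask = np.arange(len(X_train)) != index
    reduced = clone(model).fit(X_train[mask], y_train[mask])
    loo_acc = accuracy_score(y_test, reduced.predict(X_test))

    return base_acc - loo_acc


# Tiny synthetic example (illustrative dataset, not winequality/vehicle).
X, y = make_classification(n_samples=200, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
score = importance_score(
    LogisticRegression(max_iter=1000), X_tr, y_tr, X_te, y_te, index=0
)
```

Note that retraining once per point is O(n) model fits over the whole training set, which is expensive for large datasets; cheaper proxies (e.g. influence approximations) exist, as the discussion below also hints.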

Sidrah-Madiha added a commit to Sidrah-Madiha/PRESC that referenced this issue Mar 9, 2020
KaairaGupta added a commit to KaairaGupta/PRESC that referenced this issue Mar 10, 2020

janvi04 commented Mar 12, 2020

@dzeber I want to work on this issue. I have a question: I am working with the winequality dataset. Can I perform this task on the same dataset?
Also, this task requires us to assess the performance of a model with a particular data point and without it. Please correct me if I am wrong.


dzeber commented Mar 14, 2020

@janvi04 Yes, you would implement a general method, and you can test it out on any dataset (you can reuse your work from #2).

assess the performance of a model with a particular data point and without it

The goal is to come up with a general score for the "importance" of an individual point in the training set. This would be useful in studying misclassifications, e.g. to see whether they are "important" or not. You are encouraged to develop your own approach. Training the model with and without the point is one possible approach, but there are others.


janvi04 commented Mar 14, 2020

Thanks @dzeber. I will work on it.


tab1tha commented Mar 17, 2020

I was about to make a pull request for this. Could you please reopen it?

@mlopatka

I accidentally closed this while merging in a PR yesterday. The issue remains open; apologies.


tab1tha commented Mar 18, 2020

@mlopatka You have mistakenly closed it again.

@mlopatka mlopatka reopened this Mar 18, 2020
@mlopatka

thanks

@mlopatka mlopatka reopened this Mar 20, 2020
mlopatka pushed a commit that referenced this issue Mar 25, 2020
* First attempt on vehicle data with a random forest classifier

* minor changes

* Comparative model evaluation for vehicle dataset

* first attempt for implementing task 7

* fixes #8

* fixed all change requests

* fixed relative path, improved visualisation

* minor fix in plot

* added absolute distance for comparing with sensitivity calculation
mlopatka pushed a commit that referenced this issue Mar 25, 2020
* fixes #8

* fixes #4, attempt 1

* implemented all change requests

* formatted code for all helper files

* minor fix

* fixes #5

* fixing conflicts
mlopatka pushed a commit that referenced this issue Mar 25, 2020
* First attempt on vehicle data with a random forest classifier

* minor changes

* Comparative model evaluation for vehicle dataset

* first attempt for implementing task 7

* fixes #8

* fixes #4, attempt 1

* implemented all change requests

* formatted code for all helper files

* minor fix

* fixes #9

* fixed relative path

* fixing conflicts
@mlopatka mlopatka reopened this Mar 25, 2020
mlopatka pushed a commit that referenced this issue Mar 27, 2020
* adds .ipynb and .py

* updates code by formatting
mlopatka pushed a commit that referenced this issue Mar 27, 2020
* fixes #8

* fixes #4, attempt 1

* updated misclassification graph and broke down functions

* first attempt to fix #3

* implemented all change requests

* formatted code for all helper files

* minor fix

* fixed code formatting issues and  removed extra file

* fixed code formatting, added docstring to func

* fixed relative path

* fixed all changes requested

* fixed relative path in notebook

* fixing conflict with some file changes

* fixing attempt last for conflicts
mlopatka added a commit that referenced this issue Mar 27, 2020
* fixes #8

* fixes #4, attempt 1

* first attempt to fix #3

* implemented all change requests

* formatted code for all helper files

* minor fix

* fixed code formatting, added docstring to func

* first attempt on 63

* fixing conflicts

Co-authored-by: mlopatka <[email protected]>
@mlopatka mlopatka reopened this Mar 27, 2020
@dzeber dzeber reopened this Mar 31, 2020
@dzeber dzeber changed the title Importance score for dataset training samples [Outreachy applications] Importance score for dataset training samples Jul 14, 2020