Same amount of class images in ImageFolder dataset #260

kkoripl · 2020-10-30T14:32:16Z


Question

Hi, I've just prepared my image dataset for training, but noticed a problem with it: one class has about 80k examples, while another has only 1k. Is it possible to take only 1k examples of the bigger class at random in an ImageFolder dataset? Or to load everything, but then use only a fixed number of examples from each class for training?

It's like splitting a dataset in Python with a fixed class ratio, but applied at the moment the dataset is loaded, since I already have it split into training and testing sets.

zachgk (Maintainer) · 2020-10-30T18:14:41Z

If you want to use only part of the ImageFolder, you will probably need to create a new dataset class, for example MyImageFolder extends AbstractImageFolder. Just copy the ImageFolder class and modify its prepare method so that it keeps only the examples you want.
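The subsampling step that such a modified prepare method would need can be sketched in plain Java. This is a minimal sketch and not DJL code: BalancedSampler and capPerClass are hypothetical names, and it assumes the image paths have already been grouped by their class folder name. How the result is wired into a custom dataset class is left open, since that depends on the library version.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical helper (not part of DJL): caps every class at the same number of examples. */
final class BalancedSampler {

    private BalancedSampler() {}

    /**
     * Returns a new class-to-files map in which each class keeps at most
     * maxPerClass entries, chosen at random.
     */
    static Map<String, List<String>> capPerClass(
            Map<String, List<String>> filesByClass, int maxPerClass, long seed) {
        // A fixed seed keeps the chosen subset reproducible between runs.
        Random random = new Random(seed);
        Map<String, List<String>> balanced = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : filesByClass.entrySet()) {
            // Shuffle a copy so the caller's lists are left untouched.
            List<String> shuffled = new ArrayList<>(entry.getValue());
            Collections.shuffle(shuffled, random);
            int keep = Math.min(maxPerClass, shuffled.size());
            balanced.put(entry.getKey(), new ArrayList<>(shuffled.subList(0, keep)));
        }
        return balanced;
    }
}
```

With the numbers from the question, calling capPerClass(filesByClass, 1000, seed) would keep 1k random files of the 80k class and all files of the 1k class.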

Alternatively, consider using loss weights. This is the strategy that I have usually seen. You use all of your data, but multiply the loss values for your small class by some constant (between 1 and 80) to make them more important. This should counterbalance the difference in data quantity and avoid skewed predictions.
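The weighting idea can be illustrated with plain arithmetic. The following is a minimal sketch, not DJL code: WeightedLoss and both method names are hypothetical, and it assumes the per-example loss values have already been computed by whatever loss function is in use. It derives one weight per class as the largest class count divided by that class's count, then scales each example's loss by the weight of its true class.

```java
/** Hypothetical helper (not part of DJL): class weights and a weighted mean loss. */
final class WeightedLoss {

    private WeightedLoss() {}

    /** Weight of a class = largest class count / that class's count. */
    static double[] inverseFrequencyWeights(long[] classCounts) {
        long max = 0;
        for (long count : classCounts) {
            if (count <= 0) {
                throw new IllegalArgumentException("every class needs at least one example");
            }
            max = Math.max(max, count);
        }
        double[] weights = new double[classCounts.length];
        for (int i = 0; i < classCounts.length; i++) {
            weights[i] = (double) max / classCounts[i];
        }
        return weights;
    }

    /** Mean of the per-example losses after scaling each by its true class's weight. */
    static double weightedMeanLoss(double[] perExampleLoss, int[] labels, double[] classWeights) {
        double sum = 0.0;
        for (int i = 0; i < perExampleLoss.length; i++) {
            sum += classWeights[labels[i]] * perExampleLoss[i];
        }
        return sum / perExampleLoss.length;
    }
}
```

For class counts of 80000 and 1000 this gives weights of 1 and 80, which is the upper end of the range mentioned above; any smaller constant in that range can be substituted for the small class's weight.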
