I would like to ask how data augmentation is performed for the baseline RCNN, which uses only ground-truth ROIs as primary regions.
Specifically, the paper states:
> Rather than limiting training to the ground-truth person locations, we use all regions that overlap more than 0.5 with a ground-truth box. This condition serves as a form of data augmentation. For every primary region, we randomly select N regions from the set of candidate secondary regions. N is a function of the GPU memory limit (we use a Nvidia K40 GPU) and the batch size.
>
> We fine-tune our network starting with a model trained on ImageNet-1K for the image classification task. We tie the weights of the fully connected primary and secondary layers (fc6, fc7), but not for the final scoring models. We set the learning rate to 0.0001, the batch size to 30 and consider 2 images per batch. We pick N = 10 and train for 10K iterations. Larger learning rates prevented fine-tuning from converging.
So for the simple RCNN baseline, which uses only primary regions and no secondary regions, I understand that each batch contains 2 images and 30 ROIs for the ROI-pooling layer.
If that understanding is correct and the two images contain only 1 primary region each, what do you fill the rest of the batch with (28 positions would otherwise be left empty)?
Since the number of primary regions is not fixed per image, do you somehow enforce that the data augmentation samples are balanced per class?
Would it be possible to share the results you achieve without data augmentation?
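
To make the batch-filling question concrete, here is a minimal sketch of how I currently read the sampling. It is only my interpretation, not your implementation: the function names, the `(x1, y1, x2, y2)` box format, the even split of 15 ROIs per image, and the fallback of sampling with replacement when an image has too few candidates are all assumptions on my side.

```python
import random

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def primary_candidates(gt_boxes, proposals, thresh=0.5):
    """Ground-truth boxes plus every proposal that overlaps some
    ground-truth box by more than `thresh` (the augmentation condition)."""
    extra = [p for p in proposals if any(iou(p, g) > thresh for g in gt_boxes)]
    return list(gt_boxes) + extra

def sample_rois(candidates, rois_per_image, rng):
    """Hypothetical fill strategy (my assumption): sample without
    replacement when there are enough candidates, otherwise repeat
    candidates by sampling with replacement."""
    if len(candidates) >= rois_per_image:
        return rng.sample(candidates, rois_per_image)
    return [rng.choice(candidates) for _ in range(rois_per_image)]

BATCH_SIZE = 30        # from the paper
IMAGES_PER_BATCH = 2   # from the paper
ROIS_PER_IMAGE = BATCH_SIZE // IMAGES_PER_BATCH  # even split is my assumption

rng = random.Random(0)
gt = [(10, 10, 110, 210)]                  # one ground-truth person box
proposals = [(12, 8, 115, 205),            # overlaps the ground truth heavily
             (60, 10, 160, 210),           # overlaps too little (IoU of 1/3)
             (300, 300, 350, 350)]         # no overlap
cands = primary_candidates(gt, proposals)
batch_rois = sample_rois(cands, ROIS_PER_IMAGE, rng)
```

In this toy case only 2 candidates survive the 0.5 threshold, so the 15 slots for the image can only be filled by repeating them. Is that what happens in your code, or is the batch left smaller, or padded with something else?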