Voxel Size #56
Hello,
I am experimenting with a custom dataset on ImVoxelNet.
My dataset is ~2000 images, and I am running into extreme overfitting issues. For example, the predictions on validation images follow the same patterns as the predictions on some training images.
I was looking into what the cause could be. I guess I could try playing with the learning rate and scheduler; however, I was also looking into the voxel size and the number of voxels. Do you think these could have any effect on the outcome? Any other advice? Thanks!

Comments
Hi @Steven-m2ai,
The overfitting issues are strange from my point of view; 2000 images should be enough. Can you share your config, train/val metrics, and some info about the dataset (indoor/outdoor, number of classes, ...)? I think the number of voxels and the voxel size are important for model quality, but they are probably not connected with overfitting. Smaller voxels may lead to better accuracy, but need much more memory for the 3D convolutions (halving the voxel size along each axis multiplies the number of voxels by 8). Are you sure your projection matrices are fine? You can check by visualizing whether your 3D object centers project onto the 2D object centers in the image. You can also try a smaller model to prevent overfitting, e.g. ResNet50 -> ResNet18.
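A minimal sketch of such a projection check, assuming a 3x4 projection matrix; all names here (`project_centers`, `gt_centers`, `projection_matrix`) are placeholders, not ImVoxelNet's actual API:

```python
import numpy as np
import matplotlib.pyplot as plt

def project_centers(centers_3d, proj):
    """Project (N, 3) 3D box centers to (N, 2) pixel coordinates
    using a 3x4 projection matrix."""
    n = centers_3d.shape[0]
    homogeneous = np.hstack([centers_3d, np.ones((n, 1))])  # (N, 4)
    uvw = homogeneous @ proj.T                               # (N, 3)
    return uvw[:, :2] / uvw[:, 2:3]                          # perspective divide

# image, gt_centers and projection_matrix come from your dataset;
# the plotted crosses should land on the objects in the image.
# uv = project_centers(gt_centers, projection_matrix)
# plt.imshow(image)
# plt.scatter(uv[:, 0], uv[:, 1], c='r', marker='x')
# plt.show()
```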
Hello @filaPro,
Thank you for your response. A little about the dataset:
Okay, I understand that the voxel size is important for quality, since the 3D convolution sizes rely on it. The projection matrices should be fine, since I plotted all the ground truths of my dataset and they look good. I haven't tried playing with the learning rate or a smaller model yet; maybe that is a good road to take.
The Config File
I think yes, if there is something more to share.
Have you plotted with our visualization functions? The SUN RGB-D axis order or something similar may differ from your coordinate system...
You should set n_classes to 1.
So, what are your metrics on train/val? Are you able to achieve 100% accuracy on a small subset of the train set?
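For a single-class dataset, that change would look something like the following hypothetical config fragment (mmdetection-style dict merging; every key except `n_classes` is a placeholder for whatever your base config already defines):

```python
# Hypothetical override on top of a base ImVoxelNet config; only the
# n_classes value is the point here, the surrounding keys are assumed.
_base_ = ['./imvoxelnet_sunrgbd.py']  # assumed base config name

model = dict(
    bbox_head=dict(
        n_classes=1,  # single custom class instead of the default
    ))
```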
Yes, I will send you an email with more information.
Yes, I plotted using the same method that you use; I simply modified your visualization function to show ground truth instead of predictions.
Yes, I have set n_classes to 1.
I am able to achieve a super high mAP (0.8) on a test set that is a subset of the training set, but on a new unseen set I get a very low mAP (0.24). I will send you more information and extra visualizations via email.
But in your config it is 10. However, that is not important. I saw your images. Results on the test set should be much better with 2000 train images; I think something is wrong with the training or inference. Can you check without
Oh yes, what I meant is that I just changed it to 1 in my config when you pointed it out. Yes, I believe that wouldn't solve the overfitting, but thanks for catching it. I am training without
Do you think the issue might be low diversity in the dataset? I.e., the frames do not look very different from each other, so maybe there is a data imbalance?
Hard to help you here :( Does reducing the model size help?
Hello,
Yes, I guess this is a hard problem to debug; I will keep thinking about it.
Conceptually, is it true that for the indoor head there is no concept of "anchor boxes"? Rather, each voxel acts as a center point, and we use the ground truth min/max in each dimension to get the delta(x, y, z)? Then these deltas are the targets the model tries to predict?
Perhaps I can open a separate issue for conceptual questions. I am very interested in your work and would love to really understand the pipeline implemented here.
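If that reading is right, the indoor head is anchor-free in the FCOS sense: each location regresses its distances to the six faces of the ground-truth box. A rough sketch of that target construction, assuming axis-aligned boxes; the function and variable names are illustrative, not the repo's actual code:

```python
import numpy as np

def face_distance_targets(points, box_min, box_max):
    """points: (N, 3) voxel centers; box_min, box_max: (3,) corners of an
    axis-aligned ground-truth box. Returns (N, 6) regression targets
    (distances to the three min faces, then to the three max faces)
    and a mask of which points fall inside the box."""
    deltas = np.hstack([points - box_min, box_max - points])  # (N, 6)
    # Only voxel centers inside the box (all six distances positive)
    # are treated as positive samples for this ground-truth box.
    inside = np.all(deltas > 0, axis=1)
    return deltas, inside
```

The model then predicts these six deltas per location, so no anchor-box shapes need to be predefined.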