How to prepare the data for #22

AnaRhisT94 opened this issue Jan 4, 2020 · 15 comments


AnaRhisT94 commented Jan 4, 2020

Hi, thanks for this amazing repo. @JiaxiongQ

I'm trying to get and to work in order to train the first NN.
Here is what I understand so far about what I need in order to train:

  1. Download data_depth_velodyne, which is the sparse LiDAR dataset.
  2. Download data_depth_annotated, which is the dense ground-truth LiDAR dataset.
  3. Use the second repo to generate the ground-truth normals from the dense ground-truth LiDAR dataset.
  4. Download ALL the RGB KITTI images from all the categories (City | Residential | Road | Campus | Person | Calibration). Is there a link to download them all at once instead of one by one?

Question 1: Do I need to extract all the RGB images, one by one, into data_depth_velodyne/train/*_sync/, i.e. add image_02 and image_03 folders to each of the sync folders? (This is implied by your code.)

Question 2: Is there a way to download all the RGB images in one shot, instead of clicking each link and extracting each archive into its folder by hand?

The function dataloader(filepath) returns 3 variables, left_train, normalS_train and normal_gts, which are:
a. left_train: the RGB KITTI image folders, data_depth_velodyne/train/*_sync/image_02 & image_03/data.
b. normalS_train: the sparse LiDAR folders, data_depth_velodyne/train/*_sync/proj_depth/velodyne_raw/image_02 & image_03.
c. normal_gts: the folder with all the normals I generated from the dense GT, data_depth_annotated/*_sync/proj_depth/groundtruth/image_02 & image_03 -> gt/out/train/*_sync/image_02 & image_03. Or should they all go directly into gt/out/train/*_sync/? The code doesn't concatenate image_02 and image_03 anywhere.

Question 3: see c. above; that is my question about the ground-truth normals.
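For the layout described in a. to c., here is a minimal sketch that builds the three lists for both cameras. It assumes the directory structure listed above and that an RGB frame, its sparse depth map and its normal map share the same file name; `collect_kitti_lists` and `normals_root` are hypothetical names, not part of the repo. Keeping only names present in all three folders matters because the sparse/annotated depth folders may not cover every raw frame.

```python
import os
from glob import glob


def collect_kitti_lists(velodyne_root, normals_root, cameras=("image_02", "image_03")):
    """Return aligned lists (rgb, sparse, normals) over every *_sync sequence.

    Assumed (hypothetical) layout, taken from the paths discussed in this thread:
      velodyne_root/<seq>_sync/<cam>/data/<frame>                      RGB
      velodyne_root/<seq>_sync/proj_depth/velodyne_raw/<cam>/<frame>   sparse depth
      normals_root/<seq>_sync/<cam>/<frame>                            GT normals
    """
    rgb, sparse, normals = [], [], []
    for seq_dir in sorted(glob(os.path.join(velodyne_root, "*_sync"))):
        seq = os.path.basename(seq_dir)
        for cam in cameras:
            rgb_dir = os.path.join(seq_dir, cam, "data")
            lidar_dir = os.path.join(seq_dir, "proj_depth", "velodyne_raw", cam)
            normal_dir = os.path.join(normals_root, seq, cam)
            if not all(os.path.isdir(d) for d in (rgb_dir, lidar_dir, normal_dir)):
                continue  # skip sequence/camera pairs missing any of the three folders
            # Keep only frames present in all three folders, so index i refers
            # to the same frame in every returned list.
            common = (set(os.listdir(rgb_dir))
                      & set(os.listdir(lidar_dir))
                      & set(os.listdir(normal_dir)))
            for name in sorted(common):
                rgb.append(os.path.join(rgb_dir, name))
                sparse.append(os.path.join(lidar_dir, name))
                normals.append(os.path.join(normal_dir, name))
    return rgb, sparse, normals
```

With this approach image_02 and image_03 simply end up in the same three lists, so it does not matter whether the normals are stored per camera, as long as the names line up.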

Question 4: When and where is the synthetic data used? Do we use it in all 3 NNs?

Question 5: How many epochs do you recommend training for?

Other than that, thank you. It took me many hours just to get to the point where I understand how to get the data ready (and I'm still working on it). I'll definitely add a guide on preparing the training data after this post, so others can save hours figuring out the process.


1. Yes, we used images from both the left and right cameras.
2. Sorry, I don't know of such a link.
3. I think it doesn't matter; just confirm that the file names in the different folders can be matched.
4. We first trained our surface normal model on the synthetic data and then fine-tuned it on KITTI to get better surface normals.
5. All 3 NNs are trained for 15 epochs, but the last one uses a lower learning rate.
Thanks for your good questions! I think they can also help others.
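On Q2: even without a single download link, the extraction step can be scripted. Below is a minimal sketch using Python's standard `zipfile` module, assuming all downloaded archives sit in one folder; `extract_all` and its parameters are hypothetical names. Moving the extracted image_02/image_03 folders into data_depth_velodyne/train/*_sync/ would still be a separate step.

```python
import os
import zipfile
from glob import glob


def extract_all(zip_dir, out_dir):
    """Extract every .zip archive in zip_dir into out_dir, in sorted order.

    Assumes each archive carries its own top-level folder, so extracting all of
    them into the same out_dir merges them into one tree without collisions.
    Returns the names of the archives that were extracted.
    """
    os.makedirs(out_dir, exist_ok=True)
    extracted = []
    for path in sorted(glob(os.path.join(zip_dir, "*.zip"))):
        with zipfile.ZipFile(path) as zf:
            zf.extractall(out_dir)
        extracted.append(os.path.basename(path))
        print(f"extracted {os.path.basename(path)}")
    return extracted
```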


AnaRhisT94 commented Jan 5, 2020

I see, thank you for the answers! @JiaxiongQ
I'll update later with my progress and write a full step-by-step guide for people who are as confused at the beginning as I was.


AnaRhisT94 commented Jan 5, 2020

For training with synthetic data:
I use the RGBRight and RGBLeft folders,
the sparse LiDAR data in the folder lidar,
and finally the ground-truth normals from the dense depth LiDAR, which I take from the folder Normal_m, right?
Question 1: Is it true that the folders above are the ones used for training?

Question 2: If so, sparse LiDAR and Normals_m exist only for RGBLeft, so why do we use RGBRight?


Yes, because we only generated surface normals from the depth of the left camera.


Thank you!


AnaRhisT94 commented Jan 6, 2020

Hi @JiaxiongQ ,
After preparing the 3 folders RGBLeft, lidar and Normals_m from /Town11/SEQ (to test that training works), I'm getting the following error:

  File "/home/unknown/depth_est/DeepLiDAR/submodels/", line 155, in forward
    inputS =,mask),1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 512 and 256 in dimension 2 at /pytorch/aten/src/THC/generic/

(4, 1, 256, 512)
(4, 256, 512, 1)

I probably need to move the 1 in mask to right after the 4, and that should fix it; I'll try it and update. But why doesn't it work out of the box? I didn't see any posts about this for training the first NN. Did I do something wrong in the process?

EDIT: When I change the shape with np.transpose so that mask is (4,1,256,512), I get new errors, and other errors appear if I change sparse instead. Any ideas how to solve this? I'm out of ideas, and I haven't seen anyone here report this error during training. I double- and triple-checked my paths and images, and each of the 3 folders contains the same number of images (495), so the data itself should be fine.
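The two printed shapes differ only in axis order: (4, 1, 256, 512) is channels-first (B, C, H, W), while (4, 256, 512, 1) is channels-last (B, H, W, C). Concatenating along dimension 1 requires every other dimension to match, so both tensors must be (B, C, H, W). A minimal NumPy sketch of the reordering follows; on a batched PyTorch tensor the equivalent is `x.permute(0, 3, 1, 2)`, and for a single (H, W, C) sample inside `__getitem__` it is `np.transpose(x, (2, 0, 1))`. The helper name `to_nchw` is hypothetical.

```python
import numpy as np


def to_nchw(x):
    """Convert a channels-last batch (B, H, W, C) to channels-first (B, C, H, W).

    A 3-D batch (B, H, W) without a channel axis gets a singleton channel.
    """
    if x.ndim == 3:
        return x[:, np.newaxis, :, :]
    return np.transpose(x, (0, 3, 1, 2))


sparse = np.zeros((4, 1, 256, 512), dtype=np.float32)  # already (B, C, H, W)
mask = np.zeros((4, 256, 512, 1), dtype=np.float32)    # channels-last, as printed above

# np.concatenate((sparse, mask), axis=1) would fail: dims 2-3 are (256, 512) vs (512, 1).
input_s = np.concatenate((sparse, to_nchw(mask)), axis=1)
print(input_s.shape)  # (4, 2, 256, 512)
```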


JiaxiongQ commented Jan 7, 2020

In our 'dataloader/', there is:
[Screenshot from 2020-01-07 09:22:30]
So you should not need to use np.transpose.
sparse goes through the same operation, so their shapes should match.


AnaRhisT94 commented Jan 7, 2020

I see, but that still doesn't work. I attached an image of the variables right before returning from the __getitem__ function in
Same error (I haven't changed anything in the code except loading the images in

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 512 and 256 in dimension 2 at /pytorch/aten/src/THC/generic/

Also, while searching for a solution to this problem, I found a suggestion to set batch_size = 1, which didn't help:

TrainImgLoader = torch.utils.data.DataLoader(
        DA.myImageFloder(all_left_img, all_normal, all_gts, True, args.model),
        batch_size=1, shuffle=True, num_workers=1, drop_last=True)

Also in I printed the shapes before loss=train(...)

        for batch_idx, (imgL_crop,sparse_n,mask,mask1,data_in1) in enumerate(TrainImgLoader):
            start_time = time.time()


(1, 3, 256, 512)
(1, 1, 256, 512)
(1, 256, 512, 3)
(1, 256, 512, 3)
(1, 256, 512, 3)
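Assuming the five shapes above were printed in the same order as the loop variables (imgL_crop, sparse_n, mask, mask1, data_in1), a small hypothetical helper like `describe_layouts` below can flag which ones are channels-last, given the expected 256×512 crop. It is only a sketch: it cannot disambiguate square crops where H equals W.

```python
def describe_layouts(named_shapes, height=256, width=512):
    """Label each 4-D shape as NCHW or NHWC from the expected spatial size."""
    layouts = {}
    for name, shape in named_shapes.items():
        if len(shape) == 4 and tuple(shape[2:]) == (height, width):
            layouts[name] = "NCHW"
        elif len(shape) == 4 and tuple(shape[1:3]) == (height, width):
            layouts[name] = "NHWC -> needs permute(0, 3, 1, 2)"
        else:
            layouts[name] = "unknown"
    return layouts


shapes = {
    "imgL_crop": (1, 3, 256, 512),
    "sparse_n": (1, 1, 256, 512),
    "mask": (1, 256, 512, 3),
    "mask1": (1, 256, 512, 3),
    "data_in1": (1, 256, 512, 3),
}
for name, layout in describe_layouts(shapes).items():
    print(f"{name:10s} {layout}")
```

Under that assumption, only imgL_crop and sparse_n are channels-first here; the last three are channels-last.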

I really want to get this to work, and I have no idea why it doesn't.


Sorry, I don't know why this would happen, but you can use torch.permute() to reorder the dimensions so that all inputs have the shape (b, c, 256, 512).


AnaRhisT94 commented Jan 8, 2020


Hi @JiaxiongQ,
I'll try torch.permute() soon. Other than that,
I'm out of ideas; is there any chance you could help me with this further?
Here's the code I use to prepare the 3 folders from Town11/SEQ0:

import os

import numpy as np


def dataloader_synthetic(filepath):
    imagesl = []
    normalS = []
    normal_gts = []
    temp = filepath

    filepathl = temp + 'Town11/SEQ0'  # RGB dataset folder, Left and Right
    filepathgt = filepathl + '/Normal_m'
    #seqs = [seq for seq in os.listdir(filepathl) if seq.find('sync') > -1]
    left_fold = '/RGBLeft'
    right_fold = '/RGBright'  # must match the on-disk folder name exactly (case-sensitive on Linux)
    lidar_foldl = '/lidar'
    #lidar_foldr = '/proj_depth/velodyne_raw/image_03'

    #for seq in seqs:
    left_path = filepathl + left_fold
    right_path = filepathl + right_fold
    # os.listdir returns names in arbitrary order; sort so that the RGB, lidar
    # and normal lists line up index by index (assumes matching file names).
    lc = [os.path.join(left_path, img) for img in sorted(os.listdir(left_path))]

    rc = [os.path.join(right_path, img) for img in sorted(os.listdir(right_path))]
    imagesl = np.append(imagesl, lc)
    #imagesl = np.append(imagesl, rc)

    gt_path = filepathgt
    lids2l = filepathl
    lidar2l = [os.path.join(lids2l + lidar_foldl, lid) for lid in sorted(os.listdir(lids2l + lidar_foldl))]
    normalS = np.append(normalS, lidar2l)
    #lids2r = os.path.join(filepathl, seq) + lidar_foldr
    #lidar2r = [os.path.join(lids2r, lid) for lid in os.listdir(temp)]
    #normalS = np.append(normalS, lidar2r)

    gt_imgs = [os.path.join(gt_path, norm) for norm in sorted(os.listdir(gt_path))]
    normal_gts = np.append(normal_gts, gt_imgs)
    #normal_gts= np.append(normal_gts, gt_imgs)

    left_train = imagesl
    normalS_train = normalS
    return left_train, normalS_train, normal_gts

I didn't change anything else.

After using torch.permute(), this error no longer appears, but there's a new one in the function nomal_loss: pred is a tuple (with a tensor inside it), so I guess it needs to be converted to a tensor, or not permuted at all. I'm not sure why I'm getting all these errors when no one else has posted them here.

    pred_n = pred.permute(0,2,3,1)
AttributeError: 'tuple' object has no attribute 'permute'

Full code of that function:

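import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


def nomal_loss(pred, targetN, mask1):
    # Expected: pred is a (B, 3, H, W) tensor; targetN and mask1 are (B, H, W, 3)
    valid_mask = (mask1 > 0.0).detach()
    pred_n = pred.permute(0, 2, 3, 1)  # (B, 3, H, W) -> (B, H, W, 3) to match targetN/mask1
    pred_n = pred_n[valid_mask]         # keep only entries that have ground truth
    target_n = targetN[valid_mask]

    pred_n = pred_n.contiguous().view(-1, 3)
    pred_n = F.normalize(pred_n)        # unit-length predicted normals
    target_n = target_n.contiguous().view(-1, 3)

    # Target of 1 for every pair: minimise 1 - cos(pred_n, target_n)
    loss_function = nn.CosineEmbeddingLoss()
    loss = loss_function(pred_n, target_n, Variable(torch.Tensor(pred_n.size(0)).cuda().fill_(1.0)))
    return loss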

I have now changed that line to:
pred_n = pred[0]

and I get a new error:

    pred_n = pred_n[valid_mask]
IndexError: The shape of the mask [1, 3, 256, 512] at index 1 does not match the shape of the indexed tensor [1, 2, 256, 512] at index 1
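The last error reports a 3-channel mask against a 2-channel `pred[0]`, which suggests `pred[0]` is not the surface-normal output (normals have 3 components); which tuple element holds the normals depends on the model's forward and is not shown here. Also, `nomal_loss` above permutes `pred` to (B, H, W, 3) before indexing with `mask1`, so `mask1` has to be in that same channels-last layout. Below is a minimal NumPy sketch of the same computation with the layouts made explicit, assuming a per-pixel validity mask of shape (B, H, W); with target 1 and the default mean reduction, `nn.CosineEmbeddingLoss` reduces to the mean of 1 - cos(pred, target), up to a small epsilon. `masked_normal_loss` is a hypothetical name.

```python
import numpy as np


def masked_normal_loss(pred_nchw, target_nhwc, valid_hw, eps=1e-8):
    """Mean (1 - cosine similarity) between predicted and GT normals over valid pixels.

    pred_nchw:   (B, 3, H, W) predicted normals, channels-first (network output)
    target_nhwc: (B, H, W, 3) ground-truth normals, channels-last (as loaded)
    valid_hw:    (B, H, W) boolean mask of pixels that have ground truth
    """
    pred = np.transpose(pred_nchw, (0, 2, 3, 1))  # like pred.permute(0, 2, 3, 1)
    if pred.shape != target_nhwc.shape:
        raise ValueError(f"layout mismatch: {pred.shape} vs {target_nhwc.shape}")
    p = pred[valid_hw]         # (N, 3): one row per valid pixel
    t = target_nhwc[valid_hw]  # (N, 3)
    cos = (p * t).sum(axis=1) / (
        np.linalg.norm(p, axis=1) * np.linalg.norm(t, axis=1) + eps)
    return float(np.mean(1.0 - cos))
```

Passing a 2-channel prediction raises the layout error up front instead of failing later at the indexing step.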


This code is mainly for KITTI; you should modify it and just make sure the file names can be matched.
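Since `os.listdir` returns names in arbitrary order, lists built from separate folders only line up if they are sorted or matched by name. Here is a minimal sketch that pairs the flat synthetic folders (RGBLeft, lidar, Normal_m) by file stem and reports unmatched files. It assumes corresponding files share a stem, possibly with different extensions; that naming convention is an assumption about the synthetic dataset, and `paired_files` is a hypothetical name.

```python
import os


def paired_files(*folders):
    """Return one list of paths per folder, restricted to stems present in every folder.

    Matching is by stem (file name without extension), so 0001.png and 0001.npy pair up.
    Raises ValueError if a folder contains duplicate stems, since pairing would be ambiguous.
    """
    stem_maps = []
    for folder in folders:
        mapping = {}
        for name in os.listdir(folder):
            stem = os.path.splitext(name)[0]
            if stem in mapping:
                raise ValueError(f"duplicate stem {stem!r} in {folder}")
            mapping[stem] = os.path.join(folder, name)
        stem_maps.append(mapping)
    common = set.intersection(*(set(m) for m in stem_maps))
    for folder, mapping in zip(folders, stem_maps):
        unmatched = len(mapping) - len(common)
        if unmatched:
            print(f"{folder}: {unmatched} file(s) without a match in the other folders")
    keys = sorted(common)
    return [[m[k] for k in keys] for m in stem_maps]
```

For example, `left, lidar, normals = paired_files(seq + '/RGBLeft', seq + '/lidar', seq + '/Normal_m')` would yield three equally long, index-aligned lists, with `seq` being the Town11/SEQ0 folder.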


AnaRhisT94 commented Jan 8, 2020


Hi @JiaxiongQ,
Yes, I did. I modified it to work with the 3 synthetic folders, and it still doesn't work (you can see most of it is commented out, and I renamed the function).


valgur commented Mar 21, 2020

Regarding Q2, the raw KITTI data overview page provides a script to download and extract all of the raw data zip files. A slightly modified version with cleaner status info output can be found here:


graycrown commented Apr 9, 2020

I ran into the same dimension-mismatch problem and had to change a lot to make it work on the synthetic dataset. You could debug the program step by step and change the dimension order to fix it; the recommended dimension order in PyTorch is (B, C, H, W).
I changed the order like this:
inputl = inputl.cuda()# .permute(0,2,3,1)
sparse = sparse.cuda()# permute(0,2,3,1)
gt1 = gt1.cuda().permute(0,3,1,2)
mask1 = mask1.cuda().permute(0,3,1,2)
mask = mask.cuda().permute(0,3,1,2)

Hope it helps.


JiaxiongQ commented Apr 9, 2020 via email
