difference of the predicted translation and ground truth vectors #155
Hello Clement,

First of all, I have to give you kudos for the amazing work you did in this repo.

Coming to the reason I wrote this issue: I am trying to find the difference between the predicted translation vector and the ground-truth translation vector. Sadly, I can't manage to extract the predicted translation vector from the output of the pose network. I am aware of the scale ambiguity of the predicted translation vector. Any help figuring this out would be really appreciated.

Comments
Hi, not sure what you want exactly. If you want the trajectory from the pose vector, you can see how it's done in test_pose.py: https://github.com/ClementPinard/SfmLearner-Pytorch/blob/master/test_pose.py#L78

Basically, everything is given with respect to the middle frame, so you need to put the reference back on the first frame. Once that's done, if you want the trajectory for a longer sequence than just 5 frames, you will need to compose the 4x4 matrices so that the very first frame is the reference (the identity matrix) and all the other matrices are given with respect to it. Your translation vectors will then be the first 3 rows of the last column.
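A minimal sketch of that composition, assuming each relative pose is already a 4x4 matrix expressed with respect to the previous frame (the function name and input layout are assumptions, not code from the repository):

```python
import numpy as np

def compose_trajectory(rel_poses):
    # rel_poses: list of 4x4 matrices, pose of frame i w.r.t. frame i-1
    global_poses = [np.eye(4)]            # the very first frame is the identity
    for rel in rel_poses:
        # chain the step onto the previous absolute pose
        global_poses.append(global_poses[-1] @ rel)
    # the translation is the first 3 rows of the last column of each matrix
    trajectory = np.stack([p[:3, 3] for p in global_poses])
    return global_poses, trajectory
```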
Hello, apologies for not stating my question clearly. My goal is to include in the training loss the difference between the predicted translation vector and the ground-truth translation vector, to see whether we can deal with the depth ambiguity that way. I have modified the class SequenceFolder in sequence_folders.py so that it also returns the ground-truth pose for each sample. In the train function in train.py I have added code to compute the loss on the translation vector with respect to the ground truth.
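For illustration, a hypothetical sketch of such a translation term (the tensor names pred_poses and gt_poses and the weight w4 are assumptions, not the exact snippet posted here):

```python
# pred_poses : (B, seq_len-1, 6) pose network output, [tx, ty, tz, rx, ry, rz]
# gt_poses   : (B, seq_len-1, 3, 4) ground-truth pose matrices from SequenceFolder
pred_translation = pred_poses[..., :3]     # first three components are translation
gt_translation = gt_poses[..., :3, 3]      # last column of each 3x4 matrix
loss_4 = (pred_translation - gt_translation).abs().mean()
loss = w1 * loss_1 + w2 * loss_2 + w3 * loss_3 + w4 * loss_4
```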
Unfortunately, I am not sure whether the predicted translation vector is expressed with respect to the same frame as the ground-truth translation vector.
Also, here is the code from sequence_folders.py.
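A minimal sketch of that kind of change to SequenceFolder.__getitem__ (hypothetical: the sample['poses'] field and its layout are assumptions, not the repository's actual code):

```python
import numpy as np

def __getitem__(self, index):
    sample = self.samples[index]
    tgt_img = load_as_float(sample['tgt'])
    ref_imgs = [load_as_float(ref) for ref in sample['ref_imgs']]
    if self.transform is not None:
        imgs, intrinsics = self.transform([tgt_img] + ref_imgs,
                                          np.copy(sample['intrinsics']))
        tgt_img, ref_imgs = imgs[0], imgs[1:]
    else:
        intrinsics = np.copy(sample['intrinsics'])
    # hypothetical addition: ground-truth 3x4 pose matrices for the whole
    # snippet, stacked into a (seq_length, 3, 4) array
    gt_poses = np.stack(sample['poses'])
    return tgt_img, ref_imgs, intrinsics, np.linalg.inv(intrinsics), gt_poses
```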
Ok, thanks for clarifying. By looking at the code I see that you are computing ground-truth poses with respect to the first frame and then computing predicted poses with respect to the target frame. So I think your problem is here: you might want to multiply your inverse matrices by the inverse of the first one, so that the first matrix is the identity and the others are actual poses.

On a more general note, you might want to do the opposite of what you are doing. Instead of computing poses relative to the first frame of the sequence with a 4x4 matrix, maybe you can compute the equivalent 6D vectors, expressed with respect to the target frame (usually in the middle) instead of the first one, so that they already match the order output by the pose network. I actually did some of this work with my own DepthNet network, where I tested pose supervision.

If you want to solve the scale problem on KITTI, you might want to have a look at packnet-sfm from Toyota, where they supervise a velocity loss (and thus the depth scale as well): https://github.com/TRI-ML/packnet-sfm/blob/master/packnet_sfm/losses/velocity_loss.py
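A sketch of that re-referencing, assuming row-major 4x4 homogeneous ground-truth matrices (an illustration only, not necessarily the repository's compensate_pose helper):

```python
import torch

# gt_poses: (B, seq_len, 4, 4) ground-truth poses, originally w.r.t. the first frame
def relative_to(poses, ref_pose):
    # express every pose with respect to ref_pose
    return torch.linalg.inv(ref_pose) @ poses   # broadcasts over the sequence dim

mid = gt_poses.shape[1] // 2
# re-reference to the middle (target) frame so the ground truth matches
# the frame the pose network output is expressed in
gt_rel_to_target = relative_to(gt_poses, gt_poses[:, mid:mid + 1])
```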
Thank you very much Clement for your useful feedback, and for pointing me to the paper from Toyota, I was not aware of it! I assume that when you say the first frame, you are referring to the first frame of the sequence (by default 3 frames long) and not the first frame of the scene, if I understand the code correctly?

Checking out your code, compensate_pose expresses a transformation matrix with respect to another transformation matrix. Therefore, can I use it in train.py as below (which is the code I already posted, modified according to your comments)?

```python
reordered_output_poses = torch.cat([pose[:, :poses.shape[1]//2],
                                    torch.zeros(b, 1, 6).to(pose),
                                    pose[:, poses.shape[1]//2:]], dim=1)
# pose_vec2mat only takes [B, 6] tensors, so we simulate a batch dimension of B * seq_length
unravelled_poses = reordered_output_poses.reshape(-1, 6)
unravelled_matrices = pose_vec2mat(unravelled_poses, rotation_mode=args.rotation_mode)
inv_transform_matrices = unravelled_matrices.reshape(b, -1, 3, 4)
rot_matrices = inv_transform_matrices[..., :3].transpose(-2, -1)
tr_vectors = -rot_matrices @ inv_transform_matrices[..., -1:]
new_gt_transf_matrix = compensate_pose(inv(gt_transf_matrix), inv(tgt_img))  # here is the only modification
loss_4 = torch.sum(new_gt_transf_matrix[:, :, :, 3] - tr_vectors[:, :, :, 0])
loss = w1*loss_1 + w2*loss_2 + w3*loss_3 + w4*loss_4
```

I am really sorry for the many and basic questions. I am very new to the field.
Yes, I think that could work that way. Now, the realm of transformation matrices is a dark place where you spend hours trying to figure out in what order you should multiply the matrices and whether you need to invert them or not, so I'd advise you to design some basic tests to make sure it's working properly. What I did in my case was to reduce the dataset to only one sequence. The model will overfit like crazy, but it will show whether the pose supervision loss and the photometric loss are consistent. If you can't get both to be low at the same time, it means there's probably a mistake somewhere. Good luck!
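As one example of such a basic test, a hypothetical sketch that checks the two identities that usually go wrong (not code from the repository):

```python
import torch

def to_hom(m):
    # pad (..., 3, 4) poses into (..., 4, 4) homogeneous matrices
    bottom = torch.tensor([0., 0., 0., 1.]).expand(*m.shape[:-2], 1, 4)
    return torch.cat([m, bottom], dim=-2)

poses = torch.randn(1, 5, 3, 4)
poses[..., :3, :3] = torch.linalg.qr(torch.randn(1, 5, 3, 3)).Q  # valid rotations
hom = to_hom(poses)

# after re-referencing to the first frame, the first matrix must be identity
re_ref = torch.linalg.inv(hom[:, :1]) @ hom
assert torch.allclose(re_ref[:, 0], torch.eye(4), atol=1e-5)
# composing any pose with its inverse must also give identity
assert torch.allclose(hom @ torch.linalg.inv(hom),
                      torch.eye(4).expand_as(hom), atol=1e-4)
```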
Hello Clement, apologies for reopening the issue after closing it in the first place. Initially I tried the way I mentioned, but I figured out that it is way more complicated; I tested it by training on one sequence, but I didn't see the desired results. So I tried to implement the approach you mentioned, multiplying the inverse matrices by the inverse of the first matrix of the sequence. Unfortunately, when I trained on only one sequence, the photometric loss decreased but the ego-motion error did not; it remained roughly the same across all the epochs (200 in total). Here is the code that I implemented inside the train function in the script train.py.
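What that re-referencing can look like, as a hypothetical reconstruction (tensor names are assumptions, not the exact snippet posted here):

```python
import torch

# inv_transform_matrices : (B, seq_len, 3, 4) inverse poses from pose_vec2mat
# gt_poses               : (B, seq_len, 4, 4) ground truth, w.r.t. the first frame

def to_hom(m):
    # pad (..., 3, 4) poses into (..., 4, 4) homogeneous matrices
    bottom = torch.tensor([0., 0., 0., 1.]).expand(*m.shape[:-2], 1, 4)
    return torch.cat([m, bottom], dim=-2)

pred = to_hom(inv_transform_matrices)
# multiply by the inverse of the first matrix: the first pose becomes the
# identity and every pose is then expressed w.r.t. frame 0, like the ground truth
pred_first_ref = torch.linalg.inv(pred[:, :1]) @ pred
loss_4 = (pred_first_ref[..., :3, 3] - gt_poses[..., :3, 3]).abs().mean()
```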
I am new to the field, so I can't be sure about my implementation. I would really appreciate it if you could help me figure out what the problem is.