difference in validation with sparse ground truth and filled ground truth of depth #46
Comments
The main criterion is the one from sparse ground truth. The key idea is that interpolated data is not real data, so comparing your prediction with it does not make much sense. The interpolation can be used for qualitative results, where you can subjectively decide whether your prediction looks like the interpolated ground truth. The problem with quantitative results on interpolated data lies at the plane boundaries. Between a pixel belonging to a foreground plane (say, the car) and the next one on the background, there is a discontinuity, but you don't know exactly where. Interpolation "blurs" the discontinuity, and some actually good points from your prediction may appear wrong because the interpolated value is a midpoint between foreground and background when it should not be. Hope I was clear enough! For depth evaluation with KITTI, you can look at the first paper using the now-usual measurements: https://papers.nips.cc/paper/5539-depth-map-prediction-from-a-single-image-using-a-multi-scale-deep-network.pdf
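For reference, here is a minimal sketch (not this repository's actual evaluation code) of those now-standard metrics from Eigen et al., computed only on pixels where the sparse ground truth is valid:

```python
import numpy as np

def compute_depth_metrics(gt, pred, min_depth=1e-3, max_depth=80.0):
    """Standard metrics from Eigen et al., evaluated only on valid (sparse) GT pixels."""
    valid = (gt > min_depth) & (gt < max_depth)   # sparse LiDAR GT: most pixels are 0
    gt, pred = gt[valid], pred[valid]
    pred = np.clip(pred, min_depth, max_depth)

    # threshold accuracies delta < 1.25, 1.25^2, 1.25^3
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()

    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))

    return dict(abs_rel=abs_rel, sq_rel=sq_rel, rmse=rmse,
                rmse_log=rmse_log, a1=a1, a2=a2, a3=a3)
```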
Hi, @ClementPinard
Officially it is supposed to be exactly the same. The depth benchmark is just a ready-to-go depth image instead of LiDAR data + calibration. Now, if you look at other datasets, you can see slight differences, especially with Odometry, where the ground truth pose has probably been smoothed compared to raw data + calibration. I think it's safe to say that the evaluation is pretty much the same here, because the LiDAR and fixed calibration are pretty reliable.
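To make "LiDAR data + calibration" concrete, here is a rough sketch of projecting raw Velodyne points into the rectified camera to get a sparse depth map. The variable names and the assumption that the calibration matrices have already been padded to homogeneous 4x4 form are mine, not the KITTI devkit's:

```python
import numpy as np

def velo_to_depth_map(velo, T_velo_to_cam, R_rect, P_rect, h, w):
    """Project Velodyne points (N x 4: x, y, z, reflectance) into the rectified camera.
    T_velo_to_cam and R_rect are assumed padded to 4x4; P_rect is the 3x4 projection
    matrix. All come from the KITTI raw calibration files."""
    pts = velo[velo[:, 0] > 0]                          # keep points in front of the car
    pts_h = np.hstack([pts[:, :3], np.ones((len(pts), 1))])
    cam = (P_rect @ R_rect @ T_velo_to_cam @ pts_h.T).T  # (N, 3) image-plane coordinates
    z = cam[:, 2]
    u = np.round(cam[:, 0] / z).astype(int)
    v = np.round(cam[:, 1] / z).astype(int)

    depth = np.zeros((h, w), dtype=np.float32)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (z > 0)
    # write farthest points first so the closest point wins when several hit the same pixel
    order = np.argsort(-z[inside])
    depth[v[inside][order], u[inside][order]] = z[inside][order]
    return depth
```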
I do not suggest validating against interpolated points (I haven't seen anyone doing that), but you can use the interpolated depth to train your network, and my experiments show there is a boost from it if your interpolation is good and free of weird artifacts. @ClementPinard I have evaluated the "LiDAR data + calibration" and the post-processed KITTI depth data for the Eigen split. In principle, as you said, they should be exactly the same, but there are quite different noises and artifacts that affect the LiDAR measurements, so the LiDAR and post-processed depth are not the same, and the raw LiDAR is not a reliable measurement. Right now the research on depth estimation for KITTI is at the point where using LiDAR for evaluation should be revisited. I have also shown that if you use ground truth from the new KITTI benchmark for training you get a huge performance boost (comparing rows one and three); more info was discussed here:
@a-jahani Please correct me if I am wrong. There are two ways to obtain GT depth for the KITTI test setup (Eigen split or KITTI split, it doesn't matter): 1) calibration + Velodyne data (LiDAR), or 2) the official ground truth depth images (uninterpolated) provided by the official KITTI maintainers. People so far (including this work) used to follow 1), but the idea is to slowly shift towards 2)?
@koutilya40192 Yes, you are right in all of your statements. Evaluating on 1) is not good, as the ground truth is wrong; 2) is better, but it is still not dense, so your algorithm might predict very wrong results in the missing regions while still getting good numbers. There is no interpolated version (completed depth) of the GT depth images from the official KITTI providers. Some researchers interpolate it themselves using different methods; some use the interpolated depth for visualization only, some use it for training, and none (as far as I know) use it for quantitative evaluation. For quantitative evaluation it's either 1) or 2), and I suggest using 2) and submitting your results to the benchmark.
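For option 2), the official ground truth maps are distributed as 16-bit PNGs where depth in meters is the pixel value divided by 256 and a value of zero marks missing ground truth. A small loading sketch, assuming PIL and numpy are available:

```python
import numpy as np
from PIL import Image

def load_kitti_depth_png(path):
    """Load an official KITTI depth benchmark ground truth map (option 2 above)."""
    depth_png = np.asarray(Image.open(path), dtype=np.uint16)
    assert depth_png.max() > 255, "probably loaded an 8-bit image by mistake"
    depth = depth_png.astype(np.float32) / 256.0   # depth in meters
    valid_mask = depth_png > 0                     # 0 means no ground truth at that pixel
    return depth, valid_mask
```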
The interpolated points are not good, especially because of depth discontinuities between foreground and background, where the interpolated values will be very wrong. The only way to have a dense ground truth would be to interpolate the 3D point cloud into a mesh and then project the mesh into the camera frame. However, you would then need a much denser 3D point cloud, and from different points of view, because here we only have the POV of the car. It's going to take some work to have a truly dense depth ground truth dataset to validate these algorithms with. As for applying 2), I'll see what we can do to provide a script that does exactly that, be it only indicating in the README where to get the data, or a brand new testing script.
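If one still wants an interpolated map purely for figures, a naive sketch (assuming scipy is available; for qualitative display only, not evaluation, for exactly the discontinuity reasons above):

```python
import numpy as np
from scipy.interpolate import griddata

def densify_for_visualization(sparse_depth):
    """Naively interpolate a sparse depth map for qualitative display only.
    This blurs depth discontinuities (e.g. car vs. background), so it should
    not be used for quantitative evaluation."""
    h, w = sparse_depth.shape
    valid = sparse_depth > 0
    points = np.argwhere(valid)                    # (row, col) of known depths
    values = sparse_depth[valid]
    grid_y, grid_x = np.mgrid[0:h, 0:w]
    # linear interpolation; pixels outside the convex hull of LiDAR points stay NaN
    dense = griddata(points, values, (grid_y, grid_x), method='linear')
    return dense
```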
Thanks for your responses @a-jahani and @ClementPinard. That really clears up a lot of my doubts.
Hi,
The depth predictions are validated against the sparse ground truth depths of KITTI here, but there are also other papers validating against full ground truth (filled in by interpolation). Will there be a large difference in the validation results between these two methods? Which one is the main criterion nowadays in monocular depth estimation?