Some questions about the code #13
Following up on the IoU issue: your evaluation is done on the rescaled 0-200 frame range, not on the original video length. I find that the calculated IoU can be very different, especially when the video is very short. This is because you don't linearly rescale the ground-truth start/end times to 0-200. Instead, you sample frames and mark whether each one comes from the target segment, which introduces a large error when there are few frames. This makes me skeptical of the numbers reported in the paper.
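To make the quantization effect concrete, here is a minimal Python sketch comparing two ways of placing a ground-truth segment on a 200-bin grid. The sampled_mask function is a hypothetical reconstruction of the frame-sampling scheme described above, not the repository's actual code; the function names, the bin-center convention, and the half-open [start, end) intervals are all assumptions made for illustration.

```python
def continuous_iou(pred, gt):
    """IoU of two half-open [start, end) intervals given in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0


def rescaled_mask(segment, duration, num_bins=200):
    """Linearly rescale [start, end) seconds onto a num_bins grid.
    A bin is positive if its center time falls inside the segment."""
    mask = []
    for i in range(num_bins):
        t = (i + 0.5) * duration / num_bins  # bin center in seconds
        mask.append(segment[0] <= t < segment[1])
    return mask


def sampled_mask(segment, duration, num_src_frames, num_bins=200):
    """HYPOTHETICAL frame-sampling scheme (not the repository's code):
    each of the num_bins positions copies one of num_src_frames source
    frames and is positive if that frame's center lies inside the segment."""
    mask = []
    for i in range(num_bins):
        src = i * num_src_frames // num_bins  # source frame copied into bin i
        t = (src + 0.5) * duration / num_src_frames  # source frame center
        mask.append(segment[0] <= t < segment[1])
    return mask


def mask_iou(a, b):
    """IoU of two boolean masks of equal length."""
    inter = sum(x and y for x, y in zip(a, b))
    union = sum(x or y for x, y in zip(a, b))
    return inter / union if union else 0.0
```

In this sketch, a 10-second video with only 5 source frames and the segment (1.0, 5.0) gives mask_iou(sampled_mask(seg, 10.0, 5), rescaled_mask(seg, 10.0)) == 0.6, even though the segment is identical and continuous_iou(seg, seg) is 1.0. With 2000 source frames the two masks coincide and the grid IoU is 1.0, which matches the observation that the error grows as the number of frames shrinks.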
Hi @ChenyunWu, thank you for your reply. Yes, I am also skeptical about the results. I found two papers that follow this work, both accepted at MM 2020: "Jointly Cross- and Self-Modal Graph Attention Network for Query-Based Moment Localization" and "Fine-grained Iterative Attention Network for Temporal Language Localization in Videos". The paper leaves me very confused.
As for the IoU, I think 1 is added in the formula to handle the case start == end: in that case, the segment should count as 1 frame rather than 0. That's my understanding, but it may not be entirely correct.
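A minimal sketch of the +1 convention described above, assuming segments are stored as inclusive [start, end] frame indices (an assumption; the thread does not show the repository's actual formula, and inclusive_iou is a hypothetical name). Adding 1 turns a difference of indices into a frame count, so a single-frame segment with start == end has length 1 and its IoU with itself is 1 instead of 0/0.

```python
def inclusive_iou(pred, gt):
    """IoU of two segments given as inclusive [start, end] frame indices.

    The +1 counts frames rather than gaps between them, so a segment
    with start == end still covers exactly one frame.
    """
    # Number of shared frames (0 if the segments do not overlap).
    inter = max(0, min(pred[1], gt[1]) - max(pred[0], gt[0]) + 1)
    # Frames spanned by both segments together; this equals the true union
    # whenever they overlap, and the IoU is 0 anyway when they do not.
    union = max(pred[1], gt[1]) - min(pred[0], gt[0]) + 1
    return inter / union
```

For example, inclusive_iou((5, 5), (5, 5)) returns 1.0, whereas the same formula without the +1 terms would divide 0 by 0. For (0, 9) and (5, 14), the segments share 5 frames out of 15, giving 1/3.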
Thank you very much for your code, but I am a little confused by it.
First, when you calculate the IoU, why do you add one to both the numerator and the denominator?
Second, the start_frame variable in the TACOSGCN class is confusing. The fps variable means 1/interval and timestamp is the start time, so multiplying the two should not give start_frame. The same problem exists in the ActivityNetGCN class.
Third, the paper says you use 4 window widths of [8, 16, 32, 64] for TACoS, so why does your code use [6, 18, 32] for TACoS? And where are the features for your sliding windows? Could you provide them, in particular for the ActivityNet dataset?
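To make the window-width question concrete, here is a minimal Python sketch of multi-scale sliding-window enumeration over a sequence of clip features. Only the widths [8, 16, 32, 64] come from the paper as quoted above; the function name sliding_windows, the clip units, and the stride rule (stride = width * stride_ratio) are hypothetical choices, not taken from the paper or the repository.

```python
def sliding_windows(num_clips, widths=(8, 16, 32, 64), stride_ratio=0.5):
    """Enumerate candidate [start, end) windows over num_clips clip features.

    stride_ratio is a HYPOTHETICAL parameter (stride = width * ratio);
    neither the quoted paper text nor this thread specifies the stride.
    """
    windows = []
    for w in widths:
        if w > num_clips:  # skip widths longer than the video
            continue
        stride = max(1, int(w * stride_ratio))
        # Slide a window of width w across the sequence with the given stride.
        for start in range(0, num_clips - w + 1, stride):
            windows.append((start, start + w))
    return windows
```

On a 32-clip video, the paper's widths yield 11 candidates in this sketch (the 64-clip width is skipped), while widths=(6, 18, 32) yields 12, so the two configurations produce different candidate sets even on the same input.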