
Some questions about the code #13

Open
onlyonewater opened this issue Aug 30, 2020 · 3 comments

Comments

@onlyonewater

Thank you very much for your code, but I am a little confused by a few things.

First, when you calculate the IoU, why do you add one to both the numerator and the denominator?

Second, the start_frame variable in the TACOSGCN class is confusing: the fps variable means 1/interval and timestamp means the start time, so multiplying the two should not give start_frame. The same problem exists in the ActivityNetGCN class.

Third, in the paper you say you use 4 window widths of [8, 16, 32, 64] for TACoS, but in your code, why do you use [6, 18, 32]? And where are the features for your sliding windows? Can you provide them, in particular for the ActivityNet dataset?
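On the second question, one possible reading (an assumption based only on this discussion, not on the repository's actual code): if fps is defined as 1/interval, i.e. sampled frames per second, then timestamp * fps has units of seconds × frames/second = frames, which would be the index of the sampled frame at the start time. A minimal sketch with a hypothetical helper:

```python
def to_frame_index(timestamp, interval):
    """Map a time in seconds to a sampled-frame index, assuming one
    frame is kept every `interval` seconds (so fps = 1 / interval).
    Hypothetical helper, not the repository's actual function."""
    fps = 1.0 / interval
    return int(timestamp * fps)

# With one frame sampled every 0.5 s, t = 12.3 s falls in sampled frame 24.
print(to_frame_index(12.3, 0.5))
```

Under this reading the multiplication does yield a frame index, but only at the sampled frame rate, not the raw video frame rate, which may be the source of the confusion.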

@ChenyunWu

ChenyunWu commented Oct 6, 2020

Following up on the issue of IoU: your evaluation is done on the rescaled 0–200 frame range, not on the original video length. I find the calculated IoU can be very different, especially when the video is very short. This is because you don't linearly rescale the ground-truth start/end times to 0–200, but instead sample frames and mark whether each one falls inside the target segment, which introduces a large error when there are few frames. This makes me skeptical of the result numbers reported in the paper.
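The concern can be illustrated with a toy comparison (a hypothetical sketch with made-up numbers, not the repository's code): continuous_iou computes the exact IoU of two time intervals, while sampled_iou mimics marking evenly spaced samples as inside or outside each segment.

```python
def continuous_iou(a, b):
    """Exact IoU of two real-valued intervals (start, end) in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def sampled_iou(a, b, duration, n_frames):
    """IoU after marking which of n_frames evenly spaced samples fall
    inside each interval (hypothetical approximation of frame marking)."""
    step = duration / n_frames
    sa = {i for i in range(n_frames) if a[0] <= i * step < a[1]}
    sb = {i for i in range(n_frames) if b[0] <= i * step < b[1]}
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

gt, pred = (1.0, 3.0), (1.4, 3.4)
print(continuous_iou(gt, pred))         # exact: 1.6 / 2.4 ~ 0.667
print(sampled_iou(gt, pred, 4.0, 8))    # coarse grid of 8 samples: 0.6
print(sampled_iou(gt, pred, 4.0, 800))  # fine grid: close to exact
```

With only a few samples the marked-frame IoU visibly deviates from the exact value, which matches the point about short videos having few frames.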

@onlyonewater
Author

onlyonewater commented Oct 6, 2020

Hi @ChenyunWu, thank you for your reply. Yeah, I am also skeptical about the results. I found two follow-up papers, "Jointly Cross- and Self-Modal Graph Attention Network for Query-Based Moment Localization" and "Fine-grained Iterative Attention Network for Temporal Language Localization in Videos", both accepted by MM 2020. This paper makes me very confused.

@JeRainXiong

As for the IoU, I think the plus 1 is in the formula to handle the case where start == end: such a segment should be counted as 1 frame rather than 0.
Moreover, a frame range such as [0, 199] should be counted as 199 - 0 + 1 = 200 frames.

That's just my reading; it may not be entirely correct.
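That reading can be sketched as a frame-level IoU over inclusive index ranges (a hypothetical illustration of the +1 convention, not the repository's actual code):

```python
def frame_iou(pred, gt):
    """IoU of two inclusive frame ranges (start, end).

    The +1 counts frames inclusively, so a range where start == end
    still covers one frame instead of zero, and [0, 199] covers
    199 - 0 + 1 = 200 frames.
    """
    p_start, p_end = pred
    g_start, g_end = gt
    inter = max(0, min(p_end, g_end) - max(p_start, g_start) + 1)
    union = (p_end - p_start + 1) + (g_end - g_start + 1) - inter
    return inter / union

# A degenerate one-frame prediction still gets full credit:
print(frame_iou((5, 5), (5, 5)))        # 1.0
# [0, 199] vs [100, 199]: 100 shared frames out of 200 total:
print(frame_iou((0, 199), (100, 199)))  # 0.5
```

Without the +1, the one-frame case would give 0/0, so the convention also avoids a division by zero for degenerate segments.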
