
Ask for reproducing #40

Open
HYOJINPARK opened this issue Jul 10, 2024 · 6 comments

Comments

@HYOJINPARK

Hi, thanks for the great code and amazing work.

I'm doing my best to reach similar performance, but I've run into some trouble.
Could I get some of your advice?

  1. Download dataset
    Unfortunately, some of the datasets are impossible to obtain; this is what I was able to download so far.

TimeIT

total number of files: 104403, success: 101377, lost: 3026
yttemporal180m : total num : 31627 but valid 31197
DiDeMo : total num : 33002 but valid 32954
ActivityNet_asr_denseCap : total num : 10009 but valid 9057
vitt : total num : 5141 but valid 5057
COIN : total num : 9029 but valid 7760
QuerYD : total num : 14602 but valid 14429
HiREST : total num : 918 but valid 874
TVSum : total num : 50 but valid 49
SumMe : total num : 25 but valid 0

Valley

total number of files: 72303, success: 14906, lost: 57397
vatex : total num : 36710 but valid 14906
jukin : total num : 35593 but valid 0

  2. ActivityNet
    I downloaded the ActivityNet dataset from their private Google Drive.
    I merged V1_2, V1_3, and missing_files (train and val, excluding test) and ran util/compress_video_data.py.
    However, the number of valid videos is only 9057, not 10009.
    When I check the "Anet_videos_15fps_short256" folder, all 10009 videos are there.

  3. Preprocessing
    I followed Data.md for utils/process_hirest.py and utils/process_valley.py.
    I ran util/compress_video_data.py only for ActivityNet.

  4. Reproducing results

(screenshot: results table)

result1 is without ActivityNet
result2 is with ActivityNet but without ActivityNet preprocessing
result3 is with ActivityNet and with ActivityNet preprocessing

  5. I used 8 GPUs and followed the same training config file (stage2_finetune_time104k_valley72k.yaml).
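As an aside, "total vs. valid" counts like those in step 1 can be reproduced with a quick script. The sketch below is my own illustration, not code from the repo: it only uses a cheap MP4-header heuristic (a stronger check would fully decode each file with OpenCV or decord), and the directory path is a made-up example.

```python
# Hypothetical helper to reproduce "total num vs. valid" statistics.
# Heuristic only: checks for the MP4 "ftyp" box in the file header.
import glob
import os

def looks_like_mp4(path: str) -> bool:
    """True if the file starts with a plausible MP4 'ftyp' box."""
    try:
        with open(path, "rb") as f:
            header = f.read(12)
    except OSError:
        return False
    return len(header) >= 8 and header[4:8] == b"ftyp"

def count_valid(video_dir: str) -> tuple[int, int]:
    """Return (total files, files passing the header check)."""
    paths = glob.glob(os.path.join(video_dir, "*.mp4"))
    valid = sum(looks_like_mp4(p) for p in paths)
    return len(paths), valid

if __name__ == "__main__":
    total, valid = count_valid("dataset/vatex/videos")  # example path
    print(f"total num : {total} but valid {valid}")
```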

Question

  1. Do you have results without using the Valley dataset?
  2. Should I run utils/compress_video_data.py for all the other datasets?
  3. I sometimes get this warning in the log; does it matter?
    Failed to load examples with video: ....dataset/vatex/videos/--SOz3xjWfA_000037_000047.mp4. Will randomly sample an example as a replacement.
  4. How did you download the jukin dataset?
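For context on question 2: judging by the output folder name "Anet_videos_15fps_short256", util/compress_video_data.py presumably re-encodes each video to 15 fps with the short side scaled to 256 px. The sketch below is my inference, not the repo's code; the ffmpeg flags and helper names are assumptions.

```python
# Hypothetical sketch of the compression step (assumed, not from the repo).
import os
import subprocess

def build_ffmpeg_cmd(src: str, dst: str, fps: int = 15,
                     short_side: int = 256) -> list[str]:
    # Scale so the SHORT side becomes `short_side`; -2 lets ffmpeg pick an
    # even value for the other dimension, preserving aspect ratio.
    vf = (f"scale='if(gt(iw,ih),-2,{short_side})'"
          f":'if(gt(iw,ih),{short_side},-2)',fps={fps}")
    return ["ffmpeg", "-y", "-i", src, "-vf", vf, dst]

def compress_dir(src_dir: str, dst_dir: str) -> None:
    """Re-encode every .mp4 in src_dir into dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in sorted(os.listdir(src_dir)):
        if name.endswith(".mp4"):
            subprocess.run(
                build_ffmpeg_cmd(os.path.join(src_dir, name),
                                 os.path.join(dst_dir, name)),
                check=True)
```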

Thanks for reading

@RenShuhuai-Andy
Owner

Hi, thanks for your interest.

  1. Unfortunately, we don't have results w/o Valley, but I believe it mainly contributes to general video tasks (e.g., QA, captioning) rather than time-sensitive tasks (e.g., temporal grounding). If you focus on the latter type of tasks, it's OK to only use TimeIT.

  2. Using utils/compress_video_data.py helps accelerate data loading and processing, so you can run it if you want :) For ActivityNet, I remember it prints a lot of warning messages if you don't use utils/compress_video_data.py.

  3. This message means the target video is broken or missing, so the program samples another video as a replacement. Generally, it doesn't matter if broken/missing videos are rare. Otherwise, model performance may be affected, since too many training samples are missing.

  4. Please refer to https://huggingface.co/datasets/luoruipu1/Valley-Instruct-65k
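The replacement behavior described in point 3 can be sketched as follows. This is a minimal illustration, not the repo's actual dataset class; the class name, loader signature, and retry limit are my assumptions.

```python
# Minimal sketch (assumed, not the repo's code) of the fallback behind:
# "Failed to load examples with video: ... Will randomly sample an
#  example as a replacement."
import random

class ResilientVideoDataset:
    def __init__(self, samples, loader, max_retries=10):
        self.samples = samples        # list of (video_path, annotation)
        self.loader = loader          # callable: path -> decoded frames
        self.max_retries = max_retries

    def __getitem__(self, index):
        for _ in range(self.max_retries):
            path, anno = self.samples[index]
            try:
                return self.loader(path), anno
            except Exception:
                # Broken/missing video: warn, then retry with a random index.
                print(f"Failed to load examples with video: {path}. "
                      "Will randomly sample an example as a replacement.")
                index = random.randrange(len(self.samples))
        raise RuntimeError("Too many consecutive broken videos.")
```

If many videos are broken (as with the lost VATEX/Jukin files above), this resampling silently shrinks the effective training set, which matches the performance caveat in point 3.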

@HYOJINPARK
Author

HYOJINPARK commented Jul 10, 2024

Hi @RenShuhuai-Andy

Thanks for your reply.
Actually, the link https://huggingface.co/datasets/luoruipu1/Valley-Instruct-65k no longer works for the Jukin dataset.
I guess they blocked dataset downloads.

Do you use compress_video_data.py for every video dataset?
Also, which ActivityNet version is used?
Actually, I was surprised that the accuracy increased after applying preprocessing to ActivityNet (36.9 -> 39.0).
Even so, the accuracy is still low.

@RenShuhuai-Andy
Owner

I guess they blocked dataset downloads.

Sorry to hear that; maybe you can check whether there is another way to download the dataset.

Do you use compress_video_data.py for every video dataset?

No, we only use compress_video_data.py for YouCook2 and ActivityNet.

which ActivityNet version is used?

Release 1.3 (the latest release), if I remember correctly.

Actually, I was surprised that the accuracy increased after applying preprocessing to ActivityNet (36.9 -> 39.0).

Yes, that can happen.

Even so, the accuracy is still low.

Actually, I'm confused by the table you posted.

What's the eval dataset? Charades-STA?

What's the training dataset? Only Charades for result 1, and Charades+ActivityNet for results 2 and 3?

@HYOJINPARK
Author

Yes, it is Charades-STA.
It is zero-shot, so I used TimeIT and Valley (only part of VATEX), following stage2_finetune_time104k_valley72k.yaml.

Result1 is TimeIT (without ActivityNet) + VATEX
Result2 and Result3 are TimeIT (with ActivityNet) + VATEX

I did not use Charades-STA for training, and the zero-shot performance is 32.2 (IoU=0.5) and 13.4 (IoU=0.7), following Table 2.

@RenShuhuai-Andy
Owner

Hi, sorry for the late reply.

The reproduced performance is indeed much lower. According to our ablation study in Table 7, we can achieve 34.9 R@1 (IoU=0.5) with only DVC and TVG data.

Can you post your training config? What about increasing the training steps (e.g., doubling them)?

(two screenshots attached)

@GroundMoRe

I used the official timechat_7b.pth but only obtained 44.22, 27.20, 11.69 on Charades-STA.

I just followed the configs in eval.sh:
```shell
TASK='tvg'
ANNO_DIR='/code/data/TimeIT/data/temporal_video_grounding/charades/charades_annotation'
VIDEO_DIR='data/Charades/videos/'
DATASET='charades'
SPLIT='test'
PROMPT_FILE="prompts/${TASK}_description_zeroshot.txt"
GT_FILE="${ANNO_DIR}/${SPLIT}.caption_coco_format.json"
ASR_DIR='data/Charades/whisper_outputs_with_time/tiny.en.cleaned/'
```
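For reference, R@1 numbers at IoU thresholds like those quoted in this thread (e.g., 44.22 / 27.20 / 11.69) are conventionally computed as below. This is a generic sketch of temporal-grounding evaluation, not the repo's eval code.

```python
# Generic R@1-at-IoU sketch for temporal video grounding (not repo code).
def temporal_iou(pred, gt):
    """IoU of two (start, end) segments in seconds. The union is computed
    as the hull span, which equals the true union for overlapping segments
    (disjoint segments have zero intersection, hence IoU 0 regardless)."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_1(preds, gts, thresholds=(0.3, 0.5, 0.7)):
    """preds/gts: parallel lists of top-1 predicted and ground-truth
    (start, end) segments. Returns {threshold: R@1 in percent}."""
    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    return {t: 100.0 * sum(iou >= t for iou in ious) / len(ious)
            for t in thresholds}
```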
