data_clean.py how to use? #62

aa12356jm · 2022-11-03T12:03:33Z

How do I use the file "data_clean.py" to process the "Something-Something V2" dataset? Thanks.

BeautyMess · 2022-11-17T02:11:52Z

About data_clean.py

Actually, "data_clean.py" was added to the repository by mistake. It is not part of this project.
#33

How to resize

The real preprocessing file is "resize_videos.py" in MMAction2.
#7
The file is at https://github.com/open-mmlab/mmaction2/blob/master/tools/data/resize_videos.py
I run it like this:

python resize_videos.py ~/dataset/Video_dataset/something-something/20bn-something-something-v2 ~/dataset/Video_dataset/something-something/20bn-something-something-v2_320p --level 1 --to-mp4 --scale 320 --ext webm

It does two things:
a. resizes the short edge to 320 and scales the long edge to preserve the aspect ratio.
b. converts the videos from webm format to mp4 format.
After that, you have the dataset at 320p in mp4 format.
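
To make step (a) concrete, here is a minimal sketch of the short-edge arithmetic. It is not MMAction2's implementation: `scaled_size` and `ffmpeg_resize_args` are hypothetical helpers, rounding the long edge to an even number is an assumption (H.264 encoders commonly require even dimensions), and the 427x240 frame is just an example value. The ffmpeg command is only built, never executed.

```python
def scaled_size(width, height, short_side=320):
    """Return (new_width, new_height) with the short edge set to short_side."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    def to_even(value):
        # Assumption: round to the nearest even integer for encoder compatibility.
        return max(2, 2 * round(value / 2))

    if width <= height:
        # Portrait or square: width is the short edge.
        return short_side, to_even(height * short_side / width)
    # Landscape: height is the short edge.
    return to_even(width * short_side / height), short_side


def ffmpeg_resize_args(src, dst, width, height, short_side=320):
    """Build (but do not run) an ffmpeg command that resizes src into dst."""
    new_w, new_h = scaled_size(width, height, short_side)
    return ["ffmpeg", "-i", src, "-vf", f"scale={new_w}:{new_h}", dst]


if __name__ == "__main__":
    print(scaled_size(427, 240))  # (570, 320)
    print(ffmpeg_resize_args("74225.webm", "74225.mp4", 427, 240))
```

The long edge follows from the ratio: 427 * 320 / 240 is about 569.3, which rounds to 570 under the even-number assumption.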

How to generate csv label

As mentioned in Data Preparation, you need label files in csv format instead of the original json format, so the conversion is required.
The final label file looks like this:

~/dataset/Video_dataset/something-something/20bn-something-something-v2_320p/74225.mp4 140
~/dataset/Video_dataset/something-something/20bn-something-something-v2_320p/116154.mp4 127
~/dataset/Video_dataset/something-something/20bn-something-something-v2_320p/198186.mp4 173

The original label file looks like this:

{"id":"78687","label":"holding potato next to vicks vaporub bottle","template":"Holding [something] next to [something]","placeholders":["potato","vicks vaporub bottle"]},

Use the "template" item, with "[" and "]" removed, as the label value rather than the "label" item.
My implementation looks like this (not official):

# generate csv annotation for something-something-V2
import json
import os


def json_to_csv(label_dict, json_file, out_file, video_prefix):
    with open(json_file) as f:
        jf = json.load(f)
    with open(out_file, "w") as csv_file:
        for item in jf:
            # The class name is the template with the brackets removed, e.g.
            # "Holding [something] next to [something]"
            # -> "Holding something next to something".
            template = item["template"].replace("[", "").replace("]", "")
            item_label = str(label_dict[template])
            csv_file.write(video_prefix + item["id"] + ".mp4 " + item_label + "\n")


# video_dir is written into the csv as-is; label_dir is opened from Python,
# where "~" is not expanded automatically, so expand it explicitly.
video_dir = "~/dataset/Video_dataset/something-something/20bn-something-something-v2_320p/"
label_dir = os.path.expanduser("~/dataset/Video_dataset/something-something/labels/")
label_name = "labels.json"
train_json = "train.json"
test_json = "test.json"
val_json = "validation.json"
csv_dir = os.path.join(label_dir, "label_csv")
os.makedirs(csv_dir, exist_ok=True)
with open(os.path.join(label_dir, label_name)) as f:
    label_dict = json.load(f)
json_to_csv(label_dict,
            os.path.join(label_dir, val_json),
            os.path.join(csv_dir, "val.csv"),
            video_prefix=video_dir)
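
Before training, it can help to sanity-check the generated file. The sketch below is not part of the repository: `check_annotation_file` is a hypothetical helper, it assumes the space-separated "path label" layout shown earlier, and it defaults to the 174 classes of Something-Something V2 (pass a different `num_classes` if your label set differs).

```python
import os


def check_annotation_file(csv_path, num_classes=174, check_files=False):
    """Return a list of (line_number, problem) tuples for a 'path label' file."""
    problems = []
    with open(csv_path) as f:
        for line_number, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line:
                continue
            # Split on the last space so paths containing spaces still parse.
            path, sep, label = line.rpartition(" ")
            if not sep or not path:
                problems.append((line_number, "expected '<path> <label>'"))
                continue
            # Labels must be non-negative integers below num_classes.
            is_int = label.isascii() and label.isdigit()
            if not is_int or int(label) >= num_classes:
                problems.append((line_number, f"bad label {label!r}"))
            # Optionally confirm that the video file exists on disk.
            if check_files and not os.path.isfile(os.path.expanduser(path)):
                problems.append((line_number, f"missing video {path}"))
    return problems
```

An empty list means every line parsed and every label is in range. Setting `check_files=True` also reports videos that are missing after the resize step.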

Finally

If anyone can contact the author, please remind him to remove this stupid mistake and provide a complete guide on how to reproduce the experimental results.

congee524 · 2022-11-21T02:11:28Z

Hello, the file has been removed.

sinaazar · 2023-01-22T22:26:50Z

Hi, should SSV2 be resized to 320, the same as Kinetics, or to 240? The data readme specifies 240.

congee524 · 2023-02-02T06:26:05Z

It does not matter. Regardless of the input size, the crop is finally resized to a uniform size of 224.
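
As a rough illustration of why the stored resolution does not change the network input size, the sketch below center-crops the largest square from a frame and reports the resize factor needed to reach 224. It is a simplification under stated assumptions: `crop_then_resize` is a hypothetical helper, the frame sizes are example values, and the repository's actual augmentation (how it samples and interpolates crops) is not reproduced here.

```python
def crop_then_resize(width, height, output_size=224):
    """Center-crop the largest square, then describe the resize to output_size.

    Returns (crop_box, resize_factor, final_size), where crop_box is
    (left, top, right, bottom) in pixels of the stored frame.
    """
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    crop_box = (left, top, left + side, top + side)
    resize_factor = output_size / side
    return crop_box, resize_factor, (output_size, output_size)


if __name__ == "__main__":
    for width, height in [(426, 240), (568, 320)]:
        box, factor, final = crop_then_resize(width, height)
        print(f"{width}x{height}: crop {box}, scale x{factor:.3f}, final {final}")
```

Both example frames end at 224x224; only the resize factor differs between the 240 and 320 versions.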

congee524 closed this as completed Nov 21, 2022

ShoufaChen mentioned this issue Feb 3, 2023

[ Preprocessing SSv2 ] ShoufaChen/AdaptFormer#20

Open
