data_clean.py how to use? #62

Closed
aa12356jm opened this issue Nov 3, 2022 · 4 comments

Comments

@aa12356jm

How do I use the file "data_clean.py" to process the "Something-Something V2" dataset? Thanks.

@BeautyMess

About data_clean.py

Actually, "data_clean.py" is the wrong file and was included by mistake. It is not for this project.
#33

How to resize

The real preprocessing file is "resize_videos.py" in MMAction2.
#7
The file is at https://github.com/open-mmlab/mmaction2/blob/master/tools/data/resize_videos.py
I run it like this:

python resize_videos.py ~/dataset/Video_dataset/something-something/20bn-something-something-v2 ~/dataset/Video_dataset/something-something/20bn-something-something-v2_320p --level 1 --to-mp4 --scale 320 --ext webm

It does two things:
a. resizes the short edge to 320 and scales the long edge to keep the aspect ratio.
b. converts the videos from webm format to mp4 format.
After that, you have the dataset at 320p in mp4 format.
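
As a rough illustration of step (a), the sketch below computes the output resolution when the short edge is set to 320 and the aspect ratio is kept. The function name `scaled_size` and the rounding to an even number of pixels are assumptions made for this example; the actual script delegates resizing to ffmpeg, whose rounding may differ.

```python
def scaled_size(width, height, short_edge=320):
    """Return (new_width, new_height) with the short edge set to `short_edge`.

    Hypothetical helper for illustration only; not part of resize_videos.py.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = short_edge / min(width, height)

    def to_even(value):
        # Assumption: round the long edge to the nearest even number of pixels.
        return max(2, int(round(value / 2)) * 2)

    if width >= height:
        # Landscape or square: height is the short edge.
        return to_even(width * scale), short_edge
    # Portrait: width is the short edge.
    return short_edge, to_even(height * scale)


# A 427x240 landscape clip and its portrait counterpart.
print(scaled_size(427, 240))
print(scaled_size(240, 427))
```

The short edge always ends up at 320; only the long edge depends on the source aspect ratio.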

How to generate csv label

As mentioned in Data Preparation, the label files must be in csv format instead of the original json format, so the conversion is required.
The final label file looks like this:

~/dataset/Video_dataset/something-something/20bn-something-something-v2_320p/74225.mp4 140
~/dataset/Video_dataset/something-something/20bn-something-something-v2_320p/116154.mp4 127
~/dataset/Video_dataset/something-something/20bn-something-something-v2_320p/198186.mp4 173
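
To make the format concrete, here is a minimal sketch that parses such lines back into (path, class index) pairs. It assumes each line is a video path and an integer class index separated by a single space, as in the sample above; `parse_label_line` and `load_annotations` are hypothetical helper names, not functions from the project.

```python
def parse_label_line(line):
    """Split one '<video path> <class index>' line into (path, int index)."""
    # Split from the right so a path that contains spaces still parses.
    path, index = line.strip().rsplit(" ", 1)
    return path, int(index)


def load_annotations(lines):
    """Parse an iterable of label lines, skipping blank ones."""
    return [parse_label_line(line) for line in lines if line.strip()]


sample_lines = [
    "~/dataset/Video_dataset/something-something/20bn-something-something-v2_320p/74225.mp4 140\n",
    "~/dataset/Video_dataset/something-something/20bn-something-something-v2_320p/116154.mp4 127\n",
    "\n",
]
annotations = load_annotations(sample_lines)
print(annotations)
```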

An entry in the original label file looks like this:

{"id":"78687","label":"holding potato next to vicks vaporub bottle","template":"Holding [something] next to [something]","placeholders":["potato","vicks vaporub bottle"]},

The label value should come from the "template" item with "[" and "]" removed, not from the "label" item.
My implementation (not official) looks like this:

# Generate csv annotations for something-something-V2 (not official)
import json
import os


def json_to_csv(label_dict, json_file, out_file, video_prefix):
    """Write one "<video path> <class index>" line per annotated video."""
    with open(json_file) as f:
        items = json.load(f)
    with open(out_file, "w") as csv_file:
        for item in items:
            # The class name is the template with "[" and "]" removed,
            # not the free-form "label" field.
            class_name = item["template"].replace("[", "").replace("]", "")
            item_label = str(label_dict[class_name])
            csv_file.write(video_prefix + item["id"] + ".mp4 " + item_label + "\n")


# open() does not expand "~", so expand it explicitly;
# the csv then contains absolute video paths.
video_dir = os.path.expanduser(
    "~/dataset/Video_dataset/something-something/20bn-something-something-v2_320p/")
label_dir = os.path.expanduser(
    "~/dataset/Video_dataset/something-something/labels/")
label_name = "labels.json"
train_json = "train.json"
test_json = "test.json"
val_json = "validation.json"
csv_dir = os.path.join(label_dir, "label_csv")
os.makedirs(csv_dir, exist_ok=True)
with open(os.path.join(label_dir, label_name)) as f:
    label_dict = json.load(f)
json_to_csv(label_dict,
            os.path.join(label_dir, val_json),
            os.path.join(csv_dir, "val.csv"),
            video_prefix=video_dir)
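
To show the template-to-label step in isolation, the sketch below applies it to the sample record quoted earlier. The lookup table `toy_label_dict`, its index "0", and the prefix "/data/ssv2_320p/" are placeholders invented for this example; the real indices come from labels.json.

```python
import json


def template_to_key(template):
    """Strip the square brackets so the template matches the class name."""
    return template.replace("[", "").replace("]", "")


record = json.loads(
    '{"id":"78687","label":"holding potato next to vicks vaporub bottle",'
    '"template":"Holding [something] next to [something]",'
    '"placeholders":["potato","vicks vaporub bottle"]}'
)

key = template_to_key(record["template"])

# Hypothetical one-entry lookup table; the index "0" is made up.
toy_label_dict = {"Holding something next to something": "0"}
# Hypothetical video directory prefix.
video_prefix = "/data/ssv2_320p/"

csv_line = video_prefix + record["id"] + ".mp4 " + str(toy_label_dict[key])
print(key)
print(csv_line)
```

Looking up the "label" field instead would raise a KeyError here, since "holding potato next to vicks vaporub bottle" is an instance description rather than a class name.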

Finally

If someone can contact the author, please remind him to remove the stupid mistake and provide a complete guide on how to reproduce the experimental results.

@congee524
Collaborator

Hello, the file has been removed.

@sinaazar

Hi, so should SSV2 be resized to the same size as Kinetics (320), or to 240? The data readme specifies 240.

@congee524
Collaborator

It does not have to be. In fact, regardless of the input size, the crop is ultimately resized to a uniform size of 224.
