Code and data for the CVPR 2020 paper 'Action Modifiers: Learning from Adverbs in Instructional Videos'.
The files containing the adverb annotations can be found in train.csv
and test.csv
. The files contain the following columns:
Column Name | Type | Example | Description |
---|---|---|---|
id | int | 955 | Unique id for this adverb-action annotation |
vid_id | string | S7wF6S5ywo4 | YouTube id for the video the annotation is for |
weak_timestamp | float | 19.435 | Value in seconds of the action-adverb in the narration |
clustered_adverb | string | quickly | Annotated adverb |
clustered_action | string | cut | Annotated action |
task_num | int | 105259 | The id for the task in the HowTo100M dataset |
adverb | string | fast | The original adverb from the narration |
action | string | slice | The original action from the narration |
The features can be downloaded here: https://drive.google.com/open?id=12POBotvtWimAv-PtRswCbUWYucUJ8Aic
This contains two files per entry in train.csv
or test.csv
, one for RGB features, one for flow features.
Files are named <annotation_id>_<modality>.npz
.
The videos can be downloaded using: python utils/download_videos.py <train.csv|test.csv> <download_dir> --trim 20
The --trim 20
argument extracts 20 seconds around the weak timestamp as used to extract features.
antonym.csv
lists each adverb and its antonym
adverb_clusters.csv
lists the clusters of adverbs with the following columns:
Column Name | Type | Example | Description |
---|---|---|---|
adverb_id | int | 0 | ID of this adverb |
cluster_key | string | coarsely | Main adverb representing the cluster |
adverbs | list of strings | ['coarsley', 'coarse', 'thickly', 'not finely', 'not fine'] | Narrated adverbs in this cluster |
action_clusters.csv
is defined similarly
To train the model run:
python train.py --feature-dir <path_to_directory_containing_features> --checkpoint-dir <path_to_save_checkpoints_to>
To train the model without first training the action embedding run
python train.py --no-pretrain-action --temporal-agg <sdp|average|single> --feature_dir <path_to_directory_containing_features> --checkpoint-dir <path_to_save_checkpoints_to>
To test a model run:
python test.py --laod <checkpoint_path> --temporal-agg <sdp|average|single> --feature-dir <path_to_features>
Models corresponding to results in the paper can be found under models/
they are:
- full_model.ckpt - the final result in the paper
- sdp.ckpt - the proposed model without the first stage of only training the action embedding
- average.ckpt - action modifiers without the temporal attention
- single.ckpt - action modifiers with only the second around the weak timestamp
- action.ckpt - a pretrained action embedding with scaled dot-product attention without action modifiers
To parse subtitles for action-adverb pairs you first need to download the subtitles and punctuated texts. Alternatively you can punctuate your own subtitles with this tool
Then run:
python get_action_adverb_pairs.py <path_to_subtitles> <path_to_punctuated texts> output.csv --adverb-file data/adverbs.csv --action-file data/actions.csv --task-list data/tasks.csv
--adverb-file
, --action-file
and --task-list
are optional arguments use to filter the search space.