
When do you plan to release the code?

It's there!!!

We haven't released the training code yet, but we will do so once we run a functional test.

C3D for all the frames, are you kidding me?

Our program generate_proposals.py explicitly says that it needs an HDF5-file with C3D features for all the frames in the video to run successfully. It sounds like an overkill requirement, but take a look at our visual encoder interface before getting sad/mad 😄.

The visual encoder interface was designed to deal with different striding and sampling schemes. Therefore, we have two solutions for you:

  1. Are you an easy-going person? Here you go (see the sketch after this list):
  • Get C3D features every eight frames and use NaNs for the other frames.

  • Save the HDF5-file enabling compression flags and chunked storage so you won't waste space.

  • Run our demo program as it is.

Take a look at the c3d_features dataset of the video_test_0000541 group in the sample HDF5-file. You will find a lot of NaN values, but they do not break the code 😉.

  2. Do you like principled solutions?
  • Change the f_stride of the visual encoder according to your desired sampling scheme.

  • Make sure that the duration used to request features is set up according to your sampling scheme.
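
If you go the easy-going route, a minimal sketch of the NaN padding with h5py could look like the following. The group/dataset layout (one group per video holding a c3d_features dataset) mirrors the sample HDF5-file; the number of frames, the feature dimension (4096), and the stride (8) are assumptions you should match to your own extraction setup.

import h5py
import numpy as np

# Assumed sizes; adapt them to your video and C3D setup.
num_frames, feat_dim, stride = 5000, 4096, 8

# Placeholder for your real C3D features: one row per sampled frame.
sampled_features = np.random.rand(len(range(0, num_frames, stride)), feat_dim)

# Pad with NaNs so there is one row per frame, as generate_proposals.py expects.
features = np.full((num_frames, feat_dim), np.nan, dtype=np.float32)
features[::stride] = sampled_features

with h5py.File('c3d_features.hdf5', 'w') as fid:
    grp = fid.create_group('video_test_0000541')  # one group per video
    # gzip + chunked storage keep the NaN padding from wasting disk space.
    grp.create_dataset('c3d_features', data=features,
                       chunks=True, compression='gzip', compression_opts=4)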

Weird, unexpected differences with different Lasagne/Theano versions

I tried your best model with the stable version of Lasagne (0.1) using Theano 0.8.2, and I got different results.

Ensure that you are using the same environment as our software stack and that you followed installation steps similar to those described here, especially the steps involving pip.

You can find an explanation for this behavior here.
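
If you want to double-check your environment before digging further, a quick sanity check (run inside the same virtualenv you use for DAPs) is to print the installed versions and compare them against the ones pinned by the installation steps:

import lasagne
import theano

# Compare these against the versions pinned by the installation steps.
print('Theano:', theano.__version__)
print('Lasagne:', lasagne.__version__)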

Could you share with us the evaluation script used to obtain the Average Recall curve and the Recall vs. tIoU curve?

You can find scripts to compute both curves in this IPython Notebook.
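
If you need a quick number while you fetch the notebook, here is a rough sketch of how these metrics are commonly computed; it is not our exact script. Recall at a given tIoU threshold is the fraction of ground-truth segments matched by at least one proposal, and Average Recall averages it over a range of thresholds. Segments as [t-init, t-end] rows and the 0.5-1.0 threshold range are assumptions.

import numpy as np

def segment_tiou(target, candidates):
    """Temporal IoU between one [t-init, t-end] segment and an array of them."""
    tt1 = np.maximum(target[0], candidates[:, 0])
    tt2 = np.minimum(target[1], candidates[:, 1])
    intersection = np.clip(tt2 - tt1, 0, None)
    union = ((target[1] - target[0])
             + (candidates[:, 1] - candidates[:, 0]) - intersection)
    return intersection / union

def recall_at_tiou(ground_truth, proposals, tiou_thr):
    """Fraction of ground-truth segments matched by at least one proposal."""
    matched = [segment_tiou(gt, proposals).max() >= tiou_thr
               for gt in ground_truth]
    return float(np.mean(matched))

# Toy example: two ground-truth instances, three proposals.
ground_truth = np.array([[10.0, 25.0], [40.0, 55.0]])
proposals = np.array([[9.0, 24.0], [30.0, 50.0], [41.0, 56.0]])
thresholds = np.linspace(0.5, 1.0, 11)
average_recall = np.mean([recall_at_tiou(ground_truth, proposals, t)
                          for t in thresholds])
print('Average Recall:', average_recall)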

Can you share the DAPs proposal results?

  • DAPs proposal results on the THUMOS14 test set with annotations here.

  • DAPs proposal results on the ActivityNet v1.2 validation set here.

Could we have access to the numbers reported in Figure 4?

Please download the numbers from the following URLs: Figure 4a, Figure 4b. Additionally, we shared an IPython Notebook where you can find hints on how to read and plot the data.
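
In case you want a quick look before opening the notebook, a minimal reading-and-plotting sketch with pandas and matplotlib follows. The filename figure_4a.csv is a placeholder, and we assume the download is a plain text table; the shared notebook documents the actual layout and column names.

import matplotlib.pyplot as plt
import pandas as pd

# 'figure_4a.csv' is a placeholder name for the downloaded file; the shared
# notebook documents the real layout and column names.
df = pd.read_csv('figure_4a.csv')
x_col, y_col = df.columns[0], df.columns[1]
plt.plot(df[x_col], df[y_col])
plt.xlabel(x_col)
plt.ylabel(y_col)
plt.show()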

Could we have access to the numbers reported in Figure 5?

Sure, download it from here. If you want to keep the same colors and style, check out this notebook.

Can you give me more details about the subsets in Figure 5?

Figure 5 compares the Average Recall on two different datasets: ActivityNet version 1.2 and THUMOS-14.

  • THUMOS-14 (green line) corresponds to the results of our model on the videos from the test set of THUMOS-14.

  • ActivityNet (purple line) corresponds to the results of our model on the videos from the validation set of ActivityNet.

  • ActivityNet ∩ THUMOS-14 (yellow line) corresponds to the results of our model on the videos from the validation set of ActivityNet that contain instances whose action label matches any of the action labels in THUMOS-14.

  • ActivityNet <= 1024 frames (pink line) corresponds to the results of our model on the videos from the validation set of ActivityNet whose annotations span up to 1024 frames. Note that the videos themselves can be longer than 1024 frames.

Notes

  1. The last two subsets are disjoint, i.e., action labels associated with instances appearing in videos from ActivityNet <= 1024 frames do not match any action label in THUMOS-14.

  2. Figure 5 shows the performance of a model trained using the videos in the validation set of THUMOS-14.

The following Python script saves the subsets of ActivityNet on disk as CSV files. Make sure you have pandas and requests installed before executing it.

import io
import pandas as pd
import requests

# Retrieve ActivityNet v1.2 annotations := ground-truth
ground_truth_url = ('https://gist.githubusercontent.com/escorciav/'
                    'f21e798a9bab759b583864c8994ec63f/raw/'
                    '45682887f395dcf6c80fd6608404fc78ce12b75b/'
                    'activitynet_v1-2_val_groundtruth.csv')
s = requests.get(ground_truth_url).content
ground_truth = pd.read_csv(io.StringIO(s.decode('utf-8')), sep=' ')

# Define a couple of constants: the ActivityNet label indices that overlap
# with THUMOS14 action classes, and the maximum annotation span in frames
ANET_OVERLAP_THUMOS14 = [159, 82, 233, 224, 195, 116, 80, 106, 169]
MAX_DURATION = 1024

# Get subsets
idx_similar_length = (ground_truth['f-end'] - ground_truth['f-init']) <= MAX_DURATION
idx_overlapped = ground_truth['label-idx'].isin(ANET_OVERLAP_THUMOS14)
df_similar_length = ground_truth.loc[idx_similar_length & (~idx_overlapped), :]
df_overlap_thumos14 = ground_truth.loc[idx_similar_length & idx_overlapped, :]

# Dump each subset to disk as a CSV file
for df_i, filename in [(df_similar_length, 'subset_similar_length.csv'),
                       (df_overlap_thumos14, 'subset_overlap_thumos14.csv')]:
    df_i.to_csv(filename, index=False, sep=' ')
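
Running the script leaves subset_similar_length.csv and subset_overlap_thumos14.csv in your working directory, both using the same space-separated format as the ground-truth file, so you can read them back with pandas using sep=' '.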


Cherry pick for lazy bash users

Do you want all the proposals for a bunch of videos?

Create a CSV-file with the video names (one per line, keeping a header row, since the loop below starts at line 2) and do something like:

for i in {2..215}; do  # lines 2..215 skip the header row; adjust 215 to your file length
  video_name=$(sed -n "${i}p" [your CSV file])
  generate_proposals.py -iv "$video_name" -ic3d [HDF5 file with C3D] -imd [NPZ file with DAPs model]
done

Do you want to use the IPython Notebook above, but you have a bunch of CSV-files?

touch all_results
for i in *.csv; do
  # append everything except the header row of each file
  tail -n +2 "$i" >> all_results
done
# note: the resulting file ends up without a header row of its own
mv all_results all_results.csv