This is the official repository for *Low-power Continuous Remote Behavioral Localization with Event Cameras*, accepted at CVPR 2024, by Friedhelm Hamann, Suman Ghosh, Ignacio Juarez Martinez, Tom Hart, Alex Kacelnik, and Guillermo Gallego.
If you use this work in your research, please consider citing:
```bibtex
@InProceedings{Hamann24cvpr,
  author    = {Hamann, Friedhelm and Ghosh, Suman and Martinez, Ignacio Juarez and Hart, Tom and Kacelnik, Alex and Gallego, Guillermo},
  title     = {Low-power Continuous Remote Behavioral Localization with Event Cameras},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2024},
  pages     = {18612-18621}
}
```
You can use Miniconda to set up an environment:
```bash
conda create --name eventpenguins python=3.8
conda activate eventpenguins
```
Install PyTorch by choosing a command that matches your CUDA version. You can find the compatible commands on the PyTorch official website (tested with PyTorch 2.2.2), e.g.:
```bash
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia
```
Install other required packages:
```bash
pip install -r requirements.txt
```
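To sanity-check the installation, you can verify that PyTorch sees your GPU (a minimal snippet, not part of the repository's scripts):

```python
import torch

print(torch.__version__)           # e.g., 2.2.2
print(torch.cuda.is_available())   # True if the CUDA build matches your driver
```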
- Create a folder for the data:

  ```bash
  cd <project-root>
  mkdir data
  ```

- Download the data and save it in `<project-root>/data`.

- Create the pre-processed dataset with the following command:

  ```bash
  python scripts/preprocess.py --data_root data/EventPenguins --output_dir data --recording_info_path config/annotations/recording_info.csv
  ```
This crops the events to the pre-annotated nest ROIs and stores the recordings according to the split specified in the paper.
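Conceptually, the cropping step works as sketched below (a simplified illustration, not the actual implementation in `scripts/preprocess.py`; the event values and ROI coordinates are hypothetical):

```python
import numpy as np

# Hypothetical example: events as an (N, 4) array of [x, y, t, p] rows,
# and one ROI given as a bounding box in pixel coordinates.
events = np.array([[120, 85, 1000, 1], [300, 200, 2000, 0]])  # [x, y, t(us), p]
x0, y0, w, h = 100, 60, 64, 48  # assumed ROI (left, top, width, height)

# Keep only events inside the ROI and shift them to ROI-local coordinates.
inside = (
    (events[:, 0] >= x0) & (events[:, 0] < x0 + w)
    & (events[:, 1] >= y0) & (events[:, 1] < y0 + h)
)
cropped = events[inside].copy()
cropped[:, 0] -= x0
cropped[:, 1] -= y0
print(cropped)  # [[20, 25, 1000, 1]]
```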
- Create a folder for models:

  ```bash
  mkdir models
  ```

- Download the pre-trained model weights from here and save them in the `models` folder.

- Run inference with the following command:

  ```bash
  python scripts/inference.py --config config/exp/inference.yaml --verbose
  ```
The EventPenguins dataset contains 24 ten-minute recordings, with 16 annotated nests.
An overview of the data can be found in `config/annotations/recording_info.csv`.
Each recording has a `roi_group_id`, which links to the locations of the 16 pre-annotated regions of interest in `config/annotations/rois` (a new set of ROIs was annotated whenever the camera was moved).
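For a quick look at the recording overview (a minimal sketch; the exact column names in `recording_info.csv` may differ from the `roi_group_id` column assumed here):

```python
import pandas as pd

info = pd.read_csv("config/annotations/recording_info.csv")
print(info.head())  # inspect the actual column names

# 'roi_group_id' is assumed to be a column, per the description above.
print(info["roi_group_id"].value_counts())
```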
The dataset is structured as follows:
```
EventPenguins/
├── <yy-mm-dd>_<hh-mm-ss>/       # (these folders are referred to as "recordings")
│   ├── frames/
│   │   ├── 000000000000.png
│   │   ├── 000000000001.png
│   │   └── ...
│   ├── events.h5
│   ├── frame_timestamps.txt     # [us]
│   └── metadata.yaml
└── ...
```
Please note that we do not use the grayscale frames in our method but provide them for completeness.
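To walk the raw dataset, you can use something like the following (a minimal sketch assuming only the layout above, with NumPy and PyYAML available; the internal layout of `events.h5` is not covered here):

```python
from pathlib import Path

import numpy as np
import yaml  # PyYAML

data_root = Path("data/EventPenguins")
for recording in sorted(p for p in data_root.iterdir() if p.is_dir()):
    # Frame timestamps are given in microseconds, one per line.
    ts_us = np.loadtxt(recording / "frame_timestamps.txt")
    with open(recording / "metadata.yaml") as f:
        metadata = yaml.safe_load(f)
    print(recording.name, len(ts_us), "frames")
```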
The processed data is stored in a single HDF5 file named `preprocessed.h5`. The file structure is organized as follows:

- Each ten-minute recording is stored in a group labeled by its timestamp (e.g., `22-01-12_17-26-00`).
- Each group (timestamp) contains multiple subgroups, each corresponding to a specific ROI (nest) identified by an ID (e.g., `N01`).
- Each ROI subgroup contains:
  - An `events` dataset, where each event is represented as a row `[x, y, t, p]` indicating the event's x-position, y-position, timestamp (us), and polarity, respectively.
  - Attributes `height` and `width` indicating the dimensions of the ROI.
Each subgroup (ROI) has the following attributes:

- `height`: The height of the ROI in pixels.
- `width`: The width of the ROI in pixels.
Each main group (recording timestamp) has the following attribute:

- `split`: Indicates the data split (`train`, `test`, or `validate`) that the recording belongs to.
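The file can be read with `h5py`, for example (a minimal sketch based on the structure above; the file path assumes `--output_dir data` from the preprocessing step):

```python
import h5py

with h5py.File("data/preprocessed.h5", "r") as f:
    for rec_name, rec in f.items():            # e.g., "22-01-12_17-26-00"
        split = rec.attrs["split"]             # "train", "test", or "validate"
        for roi_name, roi in rec.items():      # e.g., "N01"
            events = roi["events"][()]         # rows of [x, y, t(us), p]
            h, w = roi.attrs["height"], roi.attrs["width"]
            print(rec_name, split, roi_name, events.shape, (h, w))
```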
The annotations are in `config/annotations/annotations.json`. The structure is very similar to ActivityNet, with an additional layer to account for the different nests:
```json
{
    "version": "VERSION 0.0",
    "database": {
        "<yy-mm-dd>_<hh-mm-ss>": {
            "annotations": {
                "<roi_id>": [
                    {
                        "label": <label>,
                        "segment": [
                            <t_start>,
                            <t_end>
                        ]
                    },
                    ...
                ]
            }
        }
    }
}
```
- `<yy-mm-dd>_<hh-mm-ss>` is the identifier of a ten-minute recording.
- `roi_id` is an integer number encoding the nest.
- `t_start` and `t_end` are the start and end times of an action in seconds.
- `label` is one of `["ed", "adult_flap", "chick_flap"]`.

`"adult_flap"` and `"chick_flap"` are other types of wing flapping that are easily confused with the ecstatic display (`"ed"`). We provide these labels for completeness, but they are not considered in our method.
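To extract, say, all ecstatic-display segments, you can iterate over the file as sketched below (a minimal example based on the structure above):

```python
import json

with open("config/annotations/annotations.json") as f:
    database = json.load(f)["database"]

# Collect all ecstatic-display ("ed") segments per recording and nest.
for rec_id, rec in database.items():
    for roi_id, actions in rec["annotations"].items():
        ed_segments = [a["segment"] for a in actions if a["label"] == "ed"]
        if ed_segments:
            print(rec_id, roi_id, ed_segments)  # [t_start, t_end] in seconds
```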
The evaluation for activity detection is largely inspired by ActivityNet. We thank the authors for their excellent work.
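At its core, ActivityNet-style detection evaluation matches predicted segments to ground truth by temporal intersection-over-union (tIoU); a minimal illustration of that metric (not the repository's evaluation code):

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two [t_start, t_end] segments in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

print(temporal_iou([10.0, 20.0], [15.0, 30.0]))  # 0.25
```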