Feature extraction for hateful meme challenge

This code is mainly based on https://github.com/MILVLG/bottom-up-attention.pytorch. See the Requirements and Installation sections below for the installation requirements. You need to install apex and the specific Detectron2 version specified in that repository.

Additionally, you need to download the pre-trained model. The one used for the meme challenge is https://awma1-my.sharepoint.com/:u:/g/personal/yuz_l0_tn/EaXvCC3WjtlLvvEfLr3oa8UBLA21tcLh4L8YLbYXl6jgjg?e=SFMoeu. Place it in the same directory as this README.

Once everything is set up, run the following two commands to extract the features:

python extract_features.py --mode caffe --config-file configs/bua-caffe/extract-bua-caffe-r101-bbox-only.yaml --image-dir ../../data/img/ --out-dir ../../data/own_features_bbox/ --resume
python extract_features.py --mode caffe --config-file configs/bua-caffe/extract-bua-caffe-r101-gt-bbox.yaml --image-dir ../../data/img/ --gt-bbox-dir ../../data/own_features_bbox/ --out-dir ../../data/own_features_FasterRCNN/ --resume

The directory ../../data/own_features_bbox/ will contain the bounding boxes extracted by the FasterRCNN model, and the directory ../../data/own_features_FasterRCNN/ will contain the features extracted by the FasterRCNN model. Both directories have to be created before running the corresponding job.
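
Since the output directories must exist before the jobs start, they can be created from a small script. The following is a minimal sketch and not part of the repository: the helper name `prepare_output_dirs` is hypothetical, and the paths in the comment are the ones used in the commands above.

```python
from pathlib import Path


def prepare_output_dirs(*dirs):
    """Create each directory (including parents) if it does not exist yet.

    Hypothetical helper, not part of the repository.
    Returns the list of directories that were newly created.
    """
    created = []
    for d in dirs:
        path = Path(d)
        if not path.is_dir():
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created


# Example call with the paths from the commands above:
# prepare_output_dirs("../../data/own_features_bbox/",
#                     "../../data/own_features_FasterRCNN/")
```

Calling the helper a second time is harmless: directories that already exist are skipped and an empty list is returned.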

Finally, after having extracted the features, you need to run the file convert_feature_export.py in the data folder of the meme challenge on the output directory:

python convert_feature_export.py --input_dir ../../data/own_features_FasterRCNN/ --output_dir ../../data/own_features/

This script converts the features extracted by the FasterRCNN model into the same format as the MMF features. Use these final features for training.

Original README of: bottom-up-attention.pytorch

This repository contains a PyTorch reimplementation of the bottom-up-attention project based on Caffe.

We use Detectron2 as the backend to provide complete functionality, including training, testing, and feature extraction. Furthermore, we migrated the pre-trained Caffe-based model from the original repository; the migrated model extracts the same visual features as the original model (with deviation < 0.01).

Some example object and attribute predictions for salient image regions are illustrated below. The script to obtain the following visualizations can be found here.

example-image

Table of Contents

  1. Prerequisites
  2. Training
  3. Testing
  4. Feature Extraction
  5. Pre-trained models

Prerequisites

Requirements

Note that most of the requirements above are needed for Detectron2.

Installation

  1. Clone the project including the required version of Detectron2

    # clone the repository including Detectron2 (@5e2a6f6)
    $ git clone --recursive https://github.com/MILVLG/bottom-up-attention.pytorch

  2. Install Detectron2

    $ cd detectron2
    $ pip install -e .

Note that the latest version of Detectron2 is incompatible with our project and may cause runtime errors. Please use the recommended version of Detectron2 (@5e2a6f6), which is downloaded in step 1.

  3. Compile the remaining tools using the following script:

    # install apex
    $ git clone https://github.com/NVIDIA/apex.git
    $ cd apex
    $ python setup.py install
    $ cd ..
    # install the remaining modules
    $ python setup.py build develop

Setup

If you want to train or test the model, you need to download the images and annotation files of the Visual Genome (VG) dataset. If you only need to extract visual features using the pre-trained model, you can skip this part.

The original VG images (part1 and part2) must be downloaded and unzipped to the datasets folder.

The annotation files generated in the original repository need to be converted to the COCO data format required by Detectron2. The preprocessed annotation files can be downloaded here and unzipped to the datasets folder.
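
To sanity-check a downloaded annotation file, one can count its top-level entries. The sketch below assumes only the standard COCO top-level keys `images`, `annotations`, and `categories`; the VG annotation files may contain additional fields that are ignored here, and the helper name `summarize_coco` is hypothetical.

```python
import json
from collections import Counter


def summarize_coco(path):
    """Return basic counts from a COCO-style annotation file.

    Hypothetical helper; reads only the standard COCO top-level keys.
    """
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    images = data.get("images", [])
    annotations = data.get("annotations", [])
    categories = data.get("categories", [])
    # Count how many annotations (boxes) reference each image.
    boxes_per_image = Counter(a["image_id"] for a in annotations)
    return {
        "num_images": len(images),
        "num_annotations": len(annotations),
        "num_categories": len(categories),
        "max_boxes_per_image": max(boxes_per_image.values(), default=0),
    }


# Example call, once the files are in place:
# print(summarize_coco("datasets/vg/annotations/val.json"))
```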

Finally, the datasets folder will have the following structure:

|-- datasets
   |-- vg
   |  |-- image
   |  |  |-- VG_100K
   |  |  |  |-- 2.jpg
   |  |  |  |-- ...
   |  |  |-- VG_100K_2
   |  |  |  |-- 1.jpg
   |  |  |  |-- ...
   |  |-- annotations
   |  |  |-- train.json
   |  |  |-- val.json
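
The layout above can be checked before starting a training or testing run. This is a minimal sketch under the assumption that the folder is laid out exactly as in the tree; the helper name `missing_entries` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED = [
    "vg/image/VG_100K",
    "vg/image/VG_100K_2",
    "vg/annotations/train.json",
    "vg/annotations/val.json",
]


def missing_entries(datasets_root):
    """Return the expected paths that do not exist under datasets_root."""
    root = Path(datasets_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


# Example call:
# print(missing_entries("datasets"))
```

An empty list means that all expected folders and annotation files are present; it does not verify that the image folders are complete.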

Training

The following script will train a bottom-up-attention model on the train split of VG. We are still working on this part to reproduce the same results as the Caffe version.

$ python3 train_net.py --mode detectron2 \
         --config-file configs/bua-caffe/train-bua-caffe-r101.yaml \
         --resume

  1. mode = {'caffe', 'detectron2'} selects the mode. Training is supported only in detectron2 mode, since we think it is unnecessary to train a new model in caffe mode.

  2. config-file refers to all the configurations of the model.

  3. resume is a flag for resuming training from a checkpoint.

Testing

Given the trained model, the following script will test the performance on the val split of VG:

$ python3 train_net.py --mode caffe \
         --config-file configs/bua-caffe/test-bua-caffe-r101.yaml \
         --eval-only --resume

  1. mode = {'caffe', 'detectron2'} selects the mode. Use the caffe mode for the model converted from Caffe, and the detectron2 mode for other models trained with Detectron2.

  2. config-file refers to all the configurations of the model, which also include the path of the model weights.

  3. eval-only is a flag that runs the testing phase only.

  4. resume is a flag that loads the pre-trained model.

Feature Extraction

Similar to the testing stage, the following script will extract the bottom-up-attention visual features with the provided hyper-parameters:

$ python3 extract_features.py --mode caffe \
         --config-file configs/bua-caffe/extract-bua-caffe-r101.yaml \
         --image-dir <image_dir> --gt-bbox-dir <bbox_dir> --out-dir <out_dir>  --resume

  1. mode = {'caffe', 'detectron2'} selects the mode. Use the caffe mode for the model converted from Caffe, and the detectron2 mode for other models trained with Detectron2.

  2. config-file refers to all the configurations of the model, which also include the path of the model weights.

  3. image-dir is the input image directory.

  4. gt-bbox-dir is the ground-truth bbox directory.

  5. out-dir is the output feature directory.

  6. resume is a flag that loads the pre-trained model.

Moreover, using the same pre-trained model, we provide a two-stage strategy for extracting visual features, which results in (slightly) more accurate visual features:

# extract bboxes only:
$ python3 extract_features.py --mode caffe \
         --config-file configs/bua-caffe/extract-bua-caffe-r101-bbox-only.yaml \
         --image-dir <image_dir> --out-dir <out_dir>  --resume

# extract visual features with the pre-extracted bboxes:
$ python3 extract_features.py --mode caffe \
         --config-file configs/bua-caffe/extract-bua-caffe-r101-gt-bbox.yaml \
         --image-dir <image_dir> --gt-bbox-dir <bbox_dir> --out-dir <out_dir>  --resume
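
Before starting the second stage, it can be useful to confirm that the first stage produced a bbox file for every image. The sketch below rests on an assumption that is not stated in this README: that the first stage writes one output file per image, named after the image's base name (the output extension is not assumed). The helper name `images_without_output` and the set of image suffixes are hypothetical.

```python
from pathlib import Path

# Assumed set of image extensions; extend it if your data uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def images_without_output(image_dir, out_dir):
    """List image file names that have no output file with the same base name.

    Assumes one output file per image, named after the image's base name.
    """
    done = {p.stem for p in Path(out_dir).iterdir() if p.is_file()}
    return sorted(
        p.name
        for p in Path(image_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES and p.stem not in done
    )


# Example call with placeholder directories:
# print(images_without_output("<image_dir>", "<bbox_dir>"))
```

If the returned list is non-empty, rerunning the first command for the missing images is likely cheaper than debugging failures in the second stage.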

Pre-trained models

We provide pre-trained models here. The evaluation metrics are exactly the same as those in the original Caffe project. More models will be added over time.

| Model | Mode | Backbone | Objects mAP@0.5 | Objects weighted mAP@0.5 | Download |
|---|---|---|---|---|---|
| Faster R-CNN | Caffe, K=36 | ResNet-101 | 9.3% | 14.0% | model |
| Faster R-CNN | Caffe, K=[10,100] | ResNet-101 | 10.2% | 15.1% | model |
| Faster R-CNN | Caffe, K=100 | ResNet-152 | 11.1% | 15.7% | model |

License

This project is released under the Apache 2.0 license.

Contact

This repo is currently maintained by Jing Li (@J1mL3e_) and Zhou Yu (@yuzcccc).