SNARE dataset and code for MATCH and LaGOR models.
Language Grounding with 3D Objects
```
@article{snare,
  title={Language Grounding with {3D} Objects},
  author={Jesse Thomason and Mohit Shridhar and Yonatan Bisk and Chris Paxton and Luke Zettlemoyer},
  journal={arXiv},
  year={2021},
  url={https://arxiv.org/abs/2107.12514}
}
```
```
$ git clone https://github.com/snaredataset/snare.git
$ virtualenv -p $(which python3) --system-site-packages snare_env  # or whichever package manager you prefer
$ source snare_env/bin/activate
$ pip install --upgrade pip
$ pip install -r requirements.txt
```
Edit `root_dir` in `cfgs/train.yaml` to reflect your working directory.
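A quick way to confirm the edit took effect is to load the config and print the value. This is only a sanity-check sketch and assumes `root_dir` is a top-level key in `cfgs/train.yaml`:

```python
import yaml  # PyYAML

# Print the root directory the training code will use (assumes a top-level `root_dir` key).
with open("cfgs/train.yaml") as f:
    cfg = yaml.safe_load(f)
print(cfg.get("root_dir"))  # should print your working directory
```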
Download the pre-extracted image features, language features, and pre-trained checkpoints from here and put them in the `data/` folder.
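To sanity-check the download, note that both feature files are gzipped JSON. Below is a minimal inspection sketch; the key structure inside each file is an assumption, so look at a few entries first:

```python
import gzip
import json

# Pre-extracted CLIP ViT-B/32 image features.
with gzip.open("data/shapenet-clipViT32-frames.json.gz", "rt") as f:
    img_feats = json.load(f)

# Pre-extracted CLIP language features.
with gzip.open("data/langfeat-512-clipViT32.json.gz", "rt") as f:
    lang_feats = json.load(f)

# Peek at how entries are keyed (object/annotation ids -- an assumption, not documented here).
print(list(img_feats)[:3])
print(list(lang_feats)[:3])
```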
```
$ python train.py train.model=zero_shot_cls train.aggregator.type=maxpool
$ python train.py train.model=single_cls train.aggregator.type=maxpool
$ python train.py train.model=rotator train.aggregator.type=two_random_index train.lr=5e-5 train.rotator.pretrained_cls=<path_to_pretrained_single_cls_ckpt>
```
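The rotator run above expects a path to a trained `single_cls` checkpoint via `train.rotator.pretrained_cls`. If you want to confirm a checkpoint file is loadable before passing it in, here is a minimal sketch, assuming a standard PyTorch Lightning checkpoint layout (the path below is an illustrative assumption):

```python
import torch

ckpt_path = "checkpoints/single_cls.ckpt"  # hypothetical path; point at your trained checkpoint

# On newer PyTorch versions you may need torch.load(..., weights_only=False).
ckpt = torch.load(ckpt_path, map_location="cpu")
print(sorted(ckpt.keys()))  # Lightning checkpoints typically include 'state_dict', 'epoch', ...
print(len(ckpt["state_dict"]), "parameter tensors")
```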
Run `scripts/train_classifiers.sh` and `scripts/train_rotators.sh` to reproduce the results from the paper.

To train the rotators, edit `scripts/train_rotators.sh` and replace `PRETRAINED_CLS` with the path to the checkpoint you wish to use to train the rotator:

```
PRETRAINED_CLS="<root_path>/clip-single_cls-random_index/checkpoints/<ckpt_name>.ckpt"
```
If you want to extract CLIP vision and language features from raw images:

- Download `models-screenshot.zip` from ShapeNetSem and extract it inside `./data/`.
- Edit and run `python scripts/extract_clip_features.py` to save `shapenet-clipViT32-frames.json.gz` and `langfeat-512-clipViT32.json.gz` (see the sketch below for the underlying CLIP calls).
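The repository script handles batching over all ShapeNetSem screenshots and SNARE annotations; the sketch below only illustrates the underlying CLIP calls. The model name `ViT-B/32` matches the `clipViT32` file names, but the screenshot path and example sentence are illustrative assumptions:

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Encode one rendered ShapeNetSem screenshot (path is an illustrative assumption).
image = preprocess(Image.open("data/screenshots/example_view.png")).unsqueeze(0).to(device)

# Encode one referring expression (example text, not taken from the dataset).
tokens = clip.tokenize(["classic armchair with wooden legs"]).to(device)

with torch.no_grad():
    image_feat = model.encode_image(image)  # shape (1, 512) for ViT-B/32
    lang_feat = model.encode_text(tokens)   # shape (1, 512)
```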
Please send your ...test.json prediction results to Mohit Shridhar. We will get back to you as soon as possible.
Instructions:
- Include a name for your model, your team name, and affiliation (if not anonymous).
- Submissions are limited to a maximum of one per week. Please do not create fake email accounts and send multiple submissions.
Rankings:
| Rank | Model | All | Visual | Blind |
|---|---|---|---|---|
| 1 | DA4LG (Anonymous) 5 Feb 2024 | 81.9 | 88.5 | 75.0 |
| 2 | MAGiC (Mitra et al.) 8 Jun 2023 | 81.7 | 87.7 | 75.4 |
| 3 | DA4LG (Anonymous) 27 Jan 2024 | 80.9 | 87.7 | 73.7 |
| 4 | VLG (Corona et al.) 15 Mar 2022 | 79.0 | 86.0 | 71.7 |
| 5 | LOCKET (Anonymous) 14 Oct 2022 | 79.0 | 86.1 | 71.5 |
| 6 | VLG (Corona et al.) 13 Nov 2021 | 78.7 | 85.8 | 71.3 |
| 7 | LOCKET (Anonymous) 23 Oct 2022 | 77.7 | 85.5 | 69.5 |
| 8 | LaGOR (Thomason et al.) 15 Sep 2021 | 77.0 | 84.3 | 69.4 |
| 9 | MATCH (Thomason et al.) 15 Sep 2021 | 76.4 | 83.7 | 68.7 |