GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation
This repository contains the official PyTorch implementation of GPViT, a high-resolution, non-hierarchical vision transformer architecture designed for high-performance visual recognition, introduced in our paper:
Our code base is built upon the MM-series toolkits. Specifically, classification is based on MMClassification, object detection on MMDetection, and semantic segmentation on MMSegmentation. Users can follow the official sites of those toolkits to set up their environments. We also provide a sample setup script as follows:
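Once an environment is set up, it can help to confirm which of the toolkits are actually importable before launching an experiment. The snippet below is not part of this repository: it is a small, hypothetical helper (`check_toolkits` and the `TOOLKITS` table are illustrative names) that assumes the standard module names `mmcv`, `mmcls`, `mmdet`, and `mmseg` and reports each toolkit's `__version__` or the reason it could not be imported.

```python
import importlib
import importlib.util

# Hypothetical table: display name -> importable module name.
# mmcv is the shared backend that the MM-series toolkits depend on.
TOOLKITS = {
    "MMCV": "mmcv",
    "MMClassification": "mmcls",
    "MMDetection": "mmdet",
    "MMSegmentation": "mmseg",
}


def check_toolkits(modules=TOOLKITS):
    """Return {name: version string, or None if the module is not installed}."""
    report = {}
    for name, module in modules.items():
        if importlib.util.find_spec(module) is None:
            report[name] = None  # not installed in this environment
            continue
        try:
            mod = importlib.import_module(module)
            report[name] = getattr(mod, "__version__", "unknown")
        except Exception as exc:  # e.g. an incompatible mmcv version raises at import time
            report[name] = f"import failed: {exc}"
    return report


if __name__ == "__main__":
    for name, version in check_toolkits().items():
        print(f"{name:18s} {version or 'not installed'}")
```

Checking with `find_spec` first separates "not installed" from "installed but broken", which is the more common failure when toolkit and mmcv versions do not match.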
Please follow the MMClassification, MMDetection, and MMSegmentation instructions to set up the ImageNet, COCO, and ADE20K datasets, respectively. For ImageNet experiments, we convert the dataset to LMDB format to accelerate training and testing. For example, you can convert your own dataset by running:
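To illustrate what an LMDB conversion involves, here is a generic sketch. It is not the repository's converter and its output will not be readable by GPViT's data loader. It assumes an ImageFolder-style tree (`root/<class_name>/<image>`) and the third-party `lmdb` package. The helper names (`collect_samples`, `write_lmdb`) and the key/value layout (relative path mapped to a pickled `(encoded_bytes, label)` tuple, plus `__keys__` and `__classes__` index entries) are hypothetical. Use the repository's own command referenced above for actual experiments.

```python
import pickle
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def collect_samples(root):
    """Scan an ImageFolder-style tree (root/<class_name>/<image>).

    Returns (samples, classes): samples is a list of (relative_path, label)
    pairs in sorted order; classes is the sorted list of class directory names.
    """
    root = Path(root)
    classes = sorted(d.name for d in root.iterdir() if d.is_dir())
    samples = []
    for label, cls in enumerate(classes):
        for p in sorted((root / cls).iterdir()):
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((str(p.relative_to(root)), label))
    return samples, classes


def write_lmdb(root, lmdb_path, map_size=1 << 40, commit_every=5000):
    """Write each image's encoded bytes and label into an LMDB database.

    Hypothetical layout: key = relative path, value = pickle((bytes, label)).
    """
    import lmdb  # third-party package: pip install lmdb

    samples, classes = collect_samples(root)
    env = lmdb.open(str(lmdb_path), map_size=map_size)
    txn = env.begin(write=True)
    for i, (rel_path, label) in enumerate(samples, start=1):
        # Keep the original JPEG/PNG encoding; decoding happens at load time.
        img_bytes = (Path(root) / rel_path).read_bytes()
        txn.put(rel_path.encode(), pickle.dumps((img_bytes, label)))
        if i % commit_every == 0:  # commit in chunks to bound transaction size
            txn.commit()
            txn = env.begin(write=True)
    # Store an index so a reader can enumerate samples without scanning keys.
    txn.put(b"__keys__", pickle.dumps([path for path, _ in samples]))
    txn.put(b"__classes__", pickle.dumps(classes))
    txn.commit()
    env.close()
    return len(samples)
```

A call such as `write_lmdb("data/imagenet/train", "data/imagenet/train.lmdb")` (paths hypothetical) packs the training split into one memory-mapped file. The speedup comes from replacing a huge number of small-file reads with key lookups in a single file.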
@InProceedings{yang2023gpvit,
title={{GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation}},
author={Chenhongyi Yang and Jiarui Xu and Shalini De Mello and Elliot J. Crowley and Xiaolong Wang},
booktitle={ICLR},
year={2023},
}