Hierarchical Transformer for Efficient Image Super-Resolution

Xiang Zhang¹ · Yulun Zhang² · Fisher Yu¹

¹ETH Zürich ²MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University

ECCV 2024 - Oral

[Paper] | [Supp] | [Video] | [🤗Hugging Face] | [Visual Results] | [Models]

Abstract: Transformers have exhibited promising performance in computer vision tasks including image super-resolution (SR). However, popular transformer-based SR methods often employ window self-attention whose computational complexity is quadratic in the window size, resulting in fixed small windows with limited receptive fields. In this paper, we present a general strategy to convert transformer-based SR networks to hierarchical transformers (HiT-SR), boosting SR performance with multi-scale features while maintaining an efficient design. Specifically, we first replace the commonly used fixed small windows with expanding hierarchical windows to aggregate features at different scales and establish long-range dependencies. Considering the intensive computation required for large windows, we further design a spatial-channel correlation method whose complexity is linear in the window size, efficiently gathering spatial and channel information from hierarchical windows. Extensive experiments verify the effectiveness and efficiency of our HiT-SR, and our improved versions of SwinIR-Light, SwinIR-NG, and SRFormer-Light yield state-of-the-art SR results with fewer parameters, fewer FLOPs, and faster speeds (~7x).

🔥 News

2024-09: 🤗 HiT-SR is available on 🤗 Hugging Face. Thanks, Niels!
2024-08: 🧑‍💻 HiT-SRF is available in neosr. Thanks, muslll!
2024-07: 🎉 HiT-SR was accepted at ECCV 2024! This repo is released.

🛠️ Setup

Python 3.8
PyTorch 1.8.0 + Torchvision 0.9.0
NVIDIA GPU + CUDA

git clone https://github.com/XiangZ-0/HiT-SR.git
cd HiT-SR
conda create -n HiTSR python=3.8
conda activate HiTSR
pip install -r requirements.txt
python setup.py develop

💿 Datasets

Training and testing sets can be downloaded as follows:

Training set: DIV2K (800 training images, 100 validation images) [organized training dataset DIV2K: OneDrive]
Testing sets: Set5 + Set14 + BSD100 + Urban100 + Manga109 [complete testing dataset: OneDrive]
Visual results: OneDrive

Download the training and testing datasets and put them into the corresponding folders under datasets/. See datasets/ for details of the directory structure.

🚀 Models

Method	#Param. (K)	FLOPs (G)	Dataset	PSNR (dB)	SSIM	Model Zoo	Visual Results
HiT-SIR	792	53.8	Urban100 (x4)	26.71	0.8045	One Drive	One Drive
HiT-SNG	1032	57.7	Urban100 (x4)	26.75	0.8053	One Drive	One Drive
HiT-SRF	866	58.0	Urban100 (x4)	26.80	0.8069	One Drive	One Drive

The output size is set to 1280x720 to compute FLOPs.
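Because the FLOPs are measured for a fixed 1280x720 output, the low-resolution input gets smaller as the scale factor grows. The sketch below works out that input size. `lr_input_size` is a hypothetical helper, and the floor division used when the scale does not divide the output evenly (as with x3) is an assumption rather than something this repository states.

```python
def lr_input_size(out_w, out_h, scale):
    """Return (width, height) of the LR input for a target SR output size.

    Assumption: floor division when `scale` does not divide the output
    size evenly; the repository may round differently.
    """
    if scale < 1:
        raise ValueError("scale must be a positive integer")
    return out_w // scale, out_h // scale


for scale in (2, 3, 4):
    w, h = lr_input_size(1280, 720, scale)
    print(f"x{scale}: LR input {w}x{h} -> SR output {w * scale}x{h * scale}")
```

For x4, the setting used in the table above, the network therefore processes a 320x180 input.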

🏋 Training

Download the training (DIV2K, already processed) and testing (Set5, Set14, BSD100, Urban100, Manga109, already processed) datasets and place them in datasets/.

Run the following scripts. The training configuration is in options/Train/.

# HiT-SIR, input=64x64, 4 GPUs
python -m torch.distributed.launch --nproc_per_node=4 --master_port=1234 basicsr/train.py -opt options/Train/train_HiT_SIR_x2.yml --launcher pytorch
python -m torch.distributed.launch --nproc_per_node=4 --master_port=1234 basicsr/train.py -opt options/Train/train_HiT_SIR_x3.yml --launcher pytorch
python -m torch.distributed.launch --nproc_per_node=4 --master_port=1234 basicsr/train.py -opt options/Train/train_HiT_SIR_x4.yml --launcher pytorch

# HiT-SNG, input=64x64, 4 GPUs
python -m torch.distributed.launch --nproc_per_node=4 --master_port=4321 basicsr/train.py -opt options/Train/train_HiT_SNG_x2.yml --launcher pytorch
python -m torch.distributed.launch --nproc_per_node=4 --master_port=4321 basicsr/train.py -opt options/Train/train_HiT_SNG_x3.yml --launcher pytorch
python -m torch.distributed.launch --nproc_per_node=4 --master_port=4321 basicsr/train.py -opt options/Train/train_HiT_SNG_x4.yml --launcher pytorch

# HiT-SRF, input=64x64, 4 GPUs
python -m torch.distributed.launch --nproc_per_node=4 --master_port=1234 basicsr/train.py -opt options/Train/train_HiT_SRF_x2.yml --launcher pytorch
python -m torch.distributed.launch --nproc_per_node=4 --master_port=1234 basicsr/train.py -opt options/Train/train_HiT_SRF_x3.yml --launcher pytorch
python -m torch.distributed.launch --nproc_per_node=4 --master_port=1234 basicsr/train.py -opt options/Train/train_HiT_SRF_x4.yml --launcher pytorch

The training experiments will be stored in experiments/.
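The nine launch commands above differ only in the model name, the scale, and the master port. As a convenience, they can be generated from a small template. The following is a minimal sketch: `train_command` and its parameters are hypothetical names introduced here, not part of the repository.

```python
MODELS = ("SIR", "SNG", "SRF")
SCALES = (2, 3, 4)


def train_command(model, scale, gpus=4, port=1234):
    """Build the argument list for one distributed training run."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if scale not in SCALES:
        raise ValueError(f"unsupported scale: {scale}")
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={gpus}",
        f"--master_port={port}",
        "basicsr/train.py",
        "-opt", f"options/Train/train_HiT_{model}_x{scale}.yml",
        "--launcher", "pytorch",
    ]


# Print the three HiT-SIR commands exactly as listed above.
for scale in SCALES:
    print(" ".join(train_command("SIR", scale)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Using a different `port` per job is what allows several runs to share one machine.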

🧪 Testing

Test with ground-truth images

Download the pre-trained models and place them in experiments/pretrained_models/.

We provide pre-trained models for efficient image SR: HiT-SIR, HiT-SNG, and HiT-SRF (x2, x3, x4).
Download the testing datasets (Set5, Set14, BSD100, Urban100, Manga109) and place them in datasets/.

Run the following scripts. The testing configuration is in options/Test/ (e.g., test_HiT_SIR_x2.yml).

Note: You can set use_chop: True (default: False) in the YML file to chop the image for testing.

# No self-ensemble
# HiT-SIR, reproduces results in Table 2 of the main paper
python basicsr/test.py -opt options/Test/test_HiT_SIR_x2.yml
python basicsr/test.py -opt options/Test/test_HiT_SIR_x3.yml
python basicsr/test.py -opt options/Test/test_HiT_SIR_x4.yml

# HiT-SNG, reproduces results in Table 2 of the main paper
python basicsr/test.py -opt options/Test/test_HiT_SNG_x2.yml
python basicsr/test.py -opt options/Test/test_HiT_SNG_x3.yml
python basicsr/test.py -opt options/Test/test_HiT_SNG_x4.yml

# HiT-SRF, reproduces results in Table 2 of the main paper
python basicsr/test.py -opt options/Test/test_HiT_SRF_x2.yml
python basicsr/test.py -opt options/Test/test_HiT_SRF_x3.yml
python basicsr/test.py -opt options/Test/test_HiT_SRF_x4.yml

The output is stored in results/. All visual results of our pre-trained models can be accessed via OneDrive.
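To illustrate what chopping an image for testing means in general, the sketch below upscales an image tile by tile and stitches the outputs together. It is not the repository's use_chop implementation: `nearest_upscale` is a stand-in for a real SR model, `chopped_upscale` and `tile` are hypothetical names, and images are plain nested lists to keep the example dependency-free.

```python
def nearest_upscale(img, scale):
    """Stand-in for an SR model: nearest-neighbour upscaling of a 2D list."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(scale)]
        out.extend(list(wide) for _ in range(scale))
    return out


def chopped_upscale(img, scale, tile):
    """Upscale `img` one tile at a time and stitch the results together."""
    h, w = len(img), len(img[0])
    out = [[None] * (w * scale) for _ in range(h * scale)]
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            # Cut one tile; tiles at the border may be smaller than `tile`.
            patch = [row[left:left + tile] for row in img[top:top + tile]]
            sr = nearest_upscale(patch, scale)
            # Paste the upscaled tile at the matching output position.
            for dy, sr_row in enumerate(sr):
                y = top * scale + dy
                x = left * scale
                out[y][x:x + len(sr_row)] = sr_row
    return out


img = [[y * 5 + x for x in range(5)] for y in range(3)]
assert chopped_upscale(img, 2, tile=2) == nearest_upscale(img, 2)
```

Tiling keeps the largest tensor passed through the model small. With a real network the tiles usually need to overlap, since pixels near a tile border otherwise lose context. The stand-in above is purely local, which is why the tiled and whole-image outputs match exactly.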

Test without ground-truth images

Download the pre-trained models and place them in experiments/pretrained_models/.

We provide pre-trained models for efficient image SR: HiT-SIR, HiT-SNG, and HiT-SRF (x2, x3, x4).
Put your dataset (single LR images) in datasets/single. Some example images are in this folder.
Run the following scripts. The testing configuration is in options/Test/ (e.g., test_single_x2.yml).

Note 1: The default model is HiT-SRF. You can use other models like HiT-SIR by modifying the YML.

Note 2: You can set use_chop: True (default: False) in YML to chop the image for testing.
```
# Test on your dataset without ground-truth images
python basicsr/test.py -opt options/Test/test_single_x2.yml
python basicsr/test.py -opt options/Test/test_single_x3.yml
python basicsr/test.py -opt options/Test/test_single_x4.yml
```
The output is stored in results/.

📊 Results

We apply our HiT-SR approach to improve SwinIR-Light, SwinIR-NG and SRFormer-Light, corresponding to our HiT-SIR, HiT-SNG, and HiT-SRF. Compared with the original structure, our improved models achieve better SR performance while reducing computational burdens.
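SR performance here is reported as PSNR (dB) and SSIM. For reference, PSNR is defined as 10·log10(MAX² / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error between the reference and the reconstructed image. The sketch below implements only this textbook definition. SR evaluations commonly compute it on the luminance channel after cropping image borders, and how this repository does so is not covered here. `psnr` and its arguments are illustrative names.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """PSNR in dB between two images given as flat sequences of pixel values."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every pixel off by one grey level on an 8-bit scale gives MSE = 1.
print(round(psnr([10, 20, 30, 40], [11, 21, 31, 41]), 2))
```

Because the scale is logarithmic, halving the MSE raises PSNR by about 3 dB, so gains of a few hundredths of a dB correspond to small changes in mean squared error.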

Performance improvements of HiT-SR (SIR, SNG, and SRF indicate SwinIR-Light, SwinIR-NG, and SRFormer-Light, respectively).

Efficiency improvements of HiT-SR (SIR, SNG, and SRF indicate SwinIR-Light, SwinIR-NG, and SRFormer-Light, respectively). The complexity metrics are calculated under x2 upscaling on an A100 GPU, with the output size set to 1280x720.

Overall improvements of HiT-SR

Convergence improvements of HiT-SR

More detailed results can be found in the paper. All visual results can be downloaded here.

More results

Quantitative comparison

Local attribution map (LAM) comparison (more marked pixels indicate better information aggregation ability)

Qualitative comparison on challenging scenes

📎 Citation

If you find the code helpful in your research or work, please consider citing the following paper.

@inproceedings{zhang2024hitsr,
    title={HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution},
    author={Zhang, Xiang and Zhang, Yulun and Yu, Fisher},
    booktitle={ECCV},
    year={2024}
}

🏅 Acknowledgements

This project is built on DAT, SwinIR, NGramSwin, SRFormer, and BasicSR. Special thanks to the authors for their excellent work!
