Below are the network architectures and, in general, the training strategies supported in this repository. Note that it is possible to combine ideas from all of them, from mixing training strategies to modifying networks with components taken from other networks. They can also serve as baselines for your own experiments.
- SRGAN (2017). Originally uses the SRResNet network and introduced the idea of using a generative adversarial network (GAN) for image super-resolution (SR). It was the first framework capable of inferring photo-realistic natural images at 4× upscaling, with a loss function consisting of an adversarial loss (GAN), a feature loss (using a pretrained VGG classification network) and a content (pixel) loss.
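  A minimal sketch of how these three terms can be combined (the L1 criterion, the VGG layer cut-off and the weights below are illustrative choices, not the paper's exact settings):

  ```python
  import torch
  import torch.nn as nn
  from torchvision.models import vgg19

  # Frozen feature extractor for the VGG loss (here: up to conv5_4)
  vgg = vgg19(pretrained=True).features[:36].eval()
  for p in vgg.parameters():
      p.requires_grad = False

  l1 = nn.L1Loss()
  bce = nn.BCEWithLogitsLoss()

  def generator_loss(sr, hr, d_sr_logits, w_pix=1e-2, w_feat=1.0, w_adv=5e-3):
      l_pix = l1(sr, hr)                                      # content (pixel) loss
      l_feat = l1(vgg(sr), vgg(hr))                           # VGG feature loss
      l_adv = bce(d_sr_logits, torch.ones_like(d_sr_logits))  # adversarial loss
      return w_pix * l_pix + w_feat * l_feat + w_adv * l_adv
  ```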
- Enhanced SRGAN (2018). ESRGAN achieves consistently better visual quality, with more realistic and natural textures, than SRGAN, and won first place in the PIRM2018-SR Challenge. It originally uses the RRDB network. ESRGAN remains to this day (2021) the base for many projects and research papers that continue building upon it. For more details, please refer to the ESRGAN repo.
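  For reference, a condensed sketch of RRDB's residual-in-residual structure (simplified: the actual dense blocks have five convolutions, and the full network stacks many RRDBs):

  ```python
  import torch
  import torch.nn as nn

  class DenseBlock(nn.Module):
      # simplified dense block; ESRGAN's actual version has 5 conv layers
      def __init__(self, nf=64, gc=32):
          super().__init__()
          self.conv1 = nn.Conv2d(nf, gc, 3, padding=1)
          self.conv2 = nn.Conv2d(nf + gc, gc, 3, padding=1)
          self.conv3 = nn.Conv2d(nf + 2 * gc, nf, 3, padding=1)
          self.act = nn.LeakyReLU(0.2, inplace=True)

      def forward(self, x):
          c1 = self.act(self.conv1(x))
          c2 = self.act(self.conv2(torch.cat([x, c1], 1)))
          out = self.conv3(torch.cat([x, c1, c2], 1))
          return x + 0.2 * out  # residual scaling, as in ESRGAN

  class RRDB(nn.Module):
      # residual-in-residual: dense blocks wrapped in an outer residual
      def __init__(self, nf=64):
          super().__init__()
          self.blocks = nn.Sequential(*[DenseBlock(nf) for _ in range(3)])

      def forward(self, x):
          return x + 0.2 * self.blocks(x)
  ```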
- ESRGAN+ (2020). Repo. A follow-up paper that introduced two main changes to ESRGAN's RRDB network, which can be enabled with the network options `plus` and `gaussian`.
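  As a rough illustration of the `gaussian` idea, noise injection into a residual branch (this wrapper and its learned scale parameter are hypothetical, not the repo's actual implementation):

  ```python
  import torch
  import torch.nn as nn

  class NoisyResidual(nn.Module):
      """Hypothetical sketch: add scaled Gaussian noise inside a residual
      branch, in the spirit of ESRGAN+'s noise inputs."""
      def __init__(self, block):
          super().__init__()
          self.block = block
          self.noise_scale = nn.Parameter(torch.zeros(1))  # learned scale

      def forward(self, x):
          out = self.block(x)
          if self.training:
              out = out + self.noise_scale * torch.randn_like(out)
          return x + out
  ```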
- SFTGAN (2018). Adopts Spatial Feature Transform (SFT) to incorporate other conditions/priors, such as a semantic prior for image SR represented by segmentation probability maps. For more details, please refer to the SFTGAN repo.
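  The SFT layer itself is a per-pixel affine modulation of the features; a minimal sketch, assuming the condition maps have already been projected to `cond_nf` channels:

  ```python
  import torch.nn as nn

  class SFTLayer(nn.Module):
      # Spatial Feature Transform: modulate features with a per-pixel
      # scale (gamma) and shift (beta) predicted from the condition maps
      def __init__(self, nf=64, cond_nf=32):
          super().__init__()
          self.gamma = nn.Conv2d(cond_nf, nf, 1)
          self.beta = nn.Conv2d(cond_nf, nf, 1)

      def forward(self, features, condition):
          return features * self.gamma(condition) + self.beta(condition)
  ```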
- PPON (2019). The model and training strategy for "Progressive Perception-Oriented Network for Single Image Super-Resolution", which the authors compare favorably against ESRGAN. Training is done progressively, by freezing and unfreezing layers in phases: Content Reconstruction, Structure Reconstruction and Perceptual Reconstruction. For more details, please refer to the PPON repo.
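  A minimal sketch of the phase-wise freezing idea (the branch attribute names are hypothetical placeholders for the actual PPON module names):

  ```python
  def set_ppon_phase(model, phase):
      """Freeze all parameters, then unfreeze only the branch trained in
      the current phase (1: content, 2: structure, 3: perceptual)."""
      for p in model.parameters():
          p.requires_grad = False
      branch = {1: model.content, 2: model.structure, 3: model.perceptual}[phase]
      for p in branch.parameters():
          p.requires_grad = True
  ```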
- PAN (2020). Pixel Attention Network for Efficient Image Super-Resolution. Aims at designing a lightweight network for image super-resolution (SR) that can potentially be used in real time. More details in the PAN repo.
- The Consistency Enforcing Module (CEM) from Explorable Super-Resolution (2020). Can be used to wrap any network (during training or testing) in a module that has no trainable parameters, but enforces the results to be consistent with the LR images, instead of only the HR images as is the common case. More information on CEM here. Note that the rest of the explorable SR framework is TBD, but is available in the ESR repo.
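  CEM derives its projection analytically from the degradation model; purely as intuition, a back-projection-style correction enforces a similar constraint (this is a simplification, not the actual CEM math):

  ```python
  import torch.nn.functional as F

  def enforce_consistency(sr, lr, scale=4):
      """Rough sketch: correct the SR output so that downsampling it
      reproduces the LR input (a simplified stand-in for CEM)."""
      down = F.interpolate(sr, scale_factor=1 / scale, mode='bicubic',
                           align_corners=False)
      residual = lr - down
      return sr + F.interpolate(residual, scale_factor=scale, mode='bicubic',
                                align_corners=False)
  ```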
- SRFlow (2020). Repo. Aims at fixing one common pitfall of other frameworks: that their results are deterministic. SRFlow proposes using a normalizing flow (based on GLOW), which allows the network to learn the conditional distribution of the output given the low-resolution input. It doesn't require the GAN formulation and can be trained using only the Negative Log-Likelihood (NLL). In this repo, it has also been modified to use any of the regular losses on the deterministic version of the super-resolved image. Check how to train for more details.
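  A sketch of such an NLL objective for a normalizing flow, assuming the flow returns a latent `z` under a standard Gaussian prior and the accumulated log-determinant of its transformations:

  ```python
  import math
  import torch

  def flow_nll(z, logdet, num_pixels):
      """Negative log-likelihood in bits per dimension for a flow whose
      latent prior is a standard Gaussian; z is (N, C, H, W)."""
      log_pz = -0.5 * (z ** 2 + math.log(2 * math.pi)).sum(dim=[1, 2, 3])
      log_px = log_pz + logdet
      return -(log_px / (math.log(2.0) * num_pixels)).mean()
  ```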
In addition, since they are based on ESRGAN and don't modify the general training strategy or the network architecture, but only the data used for training, Real-SR (2020), BSRGAN (2021) and Real-ESRGAN (2021) are supported: Real-SR by means of realistic kernels and noise injection extracted from image patches, and BSRGAN and Real-ESRGAN through the on-the-fly augmentations pipeline. More information in the augmentations document. These strategies can be combined with any of the networks above.
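A toy version of such an on-the-fly degradation (the actual pipeline randomizes many more steps, e.g. anisotropic kernels, JPEG compression and resize order; the kernel and noise level here are arbitrary):

```python
import random
import torch
import torch.nn.functional as F

def degrade(hr, scale=4):
    """Toy on-the-fly degradation: blur -> downsample -> noise."""
    c = hr.shape[1]
    k = torch.ones(1, 1, 3, 3) / 9.0          # simple box blur kernel
    k = k.repeat(c, 1, 1, 1)                  # depthwise: one kernel per channel
    blurred = F.conv2d(hr, k, padding=1, groups=c)
    lr = F.interpolate(blurred, scale_factor=1 / scale, mode='bicubic',
                       align_corners=False)
    lr = lr + random.uniform(0.0, 0.05) * torch.randn_like(lr)
    return lr.clamp(0, 1)
```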
- pix2pix (2017). Image-to-Image Translation with Conditional Adversarial Networks. Uses the conditional GAN formulation as a general-purpose solution to image-to-image translation problems when paired images are available, in a way that doesn't require hand-engineered mapping functions or losses. More information in how to train, the pix2pix PyTorch repo and the project page.
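  The "conditional" part amounts to the discriminator judging the input and output together; a minimal sketch:

  ```python
  import torch

  def pix2pix_d_inputs(input_a, fake_b, real_b):
      """In a conditional GAN the discriminator sees (input, output)
      pairs, so it learns whether the translation matches the input,
      not just whether the output looks real in isolation."""
      fake_pair = torch.cat([input_a, fake_b], dim=1)  # D(A, G(A))
      real_pair = torch.cat([input_a, real_b], dim=1)  # D(A, B)
      return fake_pair, real_pair
  ```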
- CycleGAN (2017). Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. Unlike previous approaches, CycleGAN was one of the first works to learn to translate an image from a source domain A to a target domain B in the absence of paired examples. More information in how to train, the CycleGAN PyTorch repo and the project page.
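  The key addition is the cycle-consistency term, sketched here (the λ weight is illustrative):

  ```python
  import torch.nn as nn

  l1 = nn.L1Loss()

  def cycle_loss(g_ab, g_ba, real_a, real_b, lam=10.0):
      """Cycle consistency: translating A -> B -> A (and B -> A -> B)
      should recover the original image, which substitutes for the
      missing paired supervision."""
      rec_a = g_ba(g_ab(real_a))  # A -> B -> A
      rec_b = g_ab(g_ba(real_b))  # B -> A -> B
      return lam * (l1(rec_a, real_a) + l1(rec_b, real_b))
  ```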
- WBC (2020). Learning to Cartoonize Using White-box Cartoon Representations. Unlike the black-box strategies that Pix2pix and CycleGAN use, white-box cartoonization (WBC) is designed to use domain knowledge about how cartoons (anime) are made, and decomposes the training task into image representations that correspond to the cartoon workflow, each with a different objective. In general, the representations are: smooth surfaces (`surface`), sparse color blocks (`structure`) and contours and fine textures (`texture`). Like CycleGAN, it uses unpaired images, and by tuning the scale of each representation, as well as the scale of the guided filter, different results can be obtained (a weighting sketch follows the dataset list below). More information in how to train. You can build your own datasets, but for reference the ones used by WBC are:
  - landscape photos: the photos from the style transfer CycleGAN dataset (6227).
  - landscape cartoon: frames extracted and cropped from Miyazaki Hayao (3617), Hosoda Mamoru (5107) and Shinkai Makoto (5891) films.
  - face photos: FFHQ photos (#00000-10000).
  - face cartoon: faces extracted from works by PA Works (5000) and Kyoto Animation (5000).
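  How the representation scales might combine into a total objective, as a hedged sketch (the individual losses and weights are placeholders, not WBC's exact formulation):

  ```python
  def wbc_total_loss(l_surface, l_structure, l_texture, l_content,
                     w_surface=1.0, w_structure=1.0, w_texture=1.0):
      """Sketch of WBC-style weighting: each representation contributes
      its own objective, and tuning the weights trades off smoothness,
      color blocks and texture detail in the result."""
      return (w_surface * l_surface + w_structure * l_structure +
              w_texture * l_texture + l_content)
  ```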
Important: Video network training can be considered fully functional, but experimental, with an overhaul of the pipeline pending (help welcome).
- SOFVSR (2020). Deep Video Super-Resolution using HR Optical Flow Estimation. Instead of the usual strategy of estimating optical flow for temporal consistency in the low-resolution domain, SOFVSR does so at the high-resolution level, to prevent inconsistencies between low-resolution flows and high-resolution frames. This network has been modified in this repo to also work with an ESRGAN network in the super-resolution step, as well as to use 3-channel images as input, but requires more testing. More information in the SOFVSR repo.
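  For context, a generic sketch of warping a frame with a dense optical flow field, the basic alignment step in flow-based video SR:

  ```python
  import torch
  import torch.nn.functional as F

  def warp(frame, flow):
      """Warp a (N, C, H, W) frame towards a reference using a dense
      optical flow field of shape (N, 2, H, W)."""
      n, _, h, w = frame.shape
      ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
      grid = torch.stack((xs, ys), dim=0).float().to(frame.device)  # (2, H, W)
      coords = grid.unsqueeze(0) + flow                             # add the flow
      # normalize coordinates to [-1, 1] for grid_sample
      coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
      coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
      grid_norm = torch.stack((coords_x, coords_y), dim=-1)         # (N, H, W, 2)
      return F.grid_sample(frame, grid_norm, align_corners=True)
  ```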
- EVSRGAN and SR3D. Video ESRGAN and SR3D networks, inspired by the paper 3DSRnet: "Video Super-resolution using 3D Convolutional Neural Networks". EVSRGAN uses the regular ESRGAN network as backbone, but modifies it with 3D convolutions to account for the time dimension, while SR3D more closely resembles the network proposed in 3DSRnet. Both require more testing.
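  A minimal sketch of the idea: residual blocks built from `Conv3d`, so kernels also span the time axis of a `(N, C, T, H, W)` clip:

  ```python
  import torch.nn as nn

  class ResBlock3D(nn.Module):
      """Sketch of a residual block using 3D convolutions, so the kernel
      covers neighboring frames as well as spatial neighborhoods."""
      def __init__(self, nf=64):
          super().__init__()
          self.body = nn.Sequential(
              nn.Conv3d(nf, nf, kernel_size=3, padding=1),
              nn.LeakyReLU(0.2, inplace=True),
              nn.Conv3d(nf, nf, kernel_size=3, padding=1),
          )

      def forward(self, x):
          return x + self.body(x)
  ```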
- EDVR (2019). Video Restoration with Enhanced Deformable Convolutional Networks. Uses deformable convolutions to align frames at the feature level, instead of explicitly estimating optical flow. More information on the project page.
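  A simplified sketch of deformable-convolution alignment using `torchvision.ops.DeformConv2d` (EDVR's actual module adds a pyramid, cascading refinement and temporal-spatial attention on top):

  ```python
  import torch
  import torch.nn as nn
  from torchvision.ops import DeformConv2d

  class DeformAlign(nn.Module):
      """Align a neighbor frame's features to the reference frame by
      predicting per-position sampling offsets for a deformable conv."""
      def __init__(self, nf=64, k=3):
          super().__init__()
          # offsets (2 values per kernel tap) predicted from both features
          self.offset_conv = nn.Conv2d(2 * nf, 2 * k * k, 3, padding=1)
          self.deform = DeformConv2d(nf, nf, k, padding=k // 2)

      def forward(self, neighbor_feat, ref_feat):
          offset = self.offset_conv(torch.cat([neighbor_feat, ref_feat], 1))
          return self.deform(neighbor_feat, offset)
  ```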
- DVD (2017). Real-time Deep Video Deinterlacing, implemented for the specific case of efficient video de-interlacing.
- Initial integration of RIFE (2020), combining all 3 separate model files into a single structure. RIFE repo. (Training not yet available, pending the video pipeline overhaul.)
@misc{traiNNer,
author = {victorca25},
title = {traiNNer},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/victorca25/traiNNer}}
}
@InProceedings{wang2018esrgan,
author = {Wang, Xintao and Yu, Ke and Wu, Shixiang and Gu, Jinjin and Liu, Yihao and Dong, Chao and Qiao, Yu and Loy, Chen Change},
title = {ESRGAN: Enhanced super-resolution generative adversarial networks},
booktitle = {The European Conference on Computer Vision Workshops (ECCVW)},
month = {September},
year = {2018}
}
@InProceedings{wang2018sftgan,
author = {Wang, Xintao and Yu, Ke and Dong, Chao and Loy, Chen Change},
title = {Recovering realistic texture in image super-resolution by deep spatial feature transform},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}
@article{Hui-PPON-2019,
title={Progressive Perception-Oriented Network for Single Image Super-Resolution},
author={Hui, Zheng and Li, Jie and Gao, Xinbo and Wang, Xiumei},
journal={arXiv preprint arXiv:1907.10399},
year={2019}
}
@InProceedings{Liu2019abpn,
author = {Liu, Zhi-Song and Wang, Li-Wen and Li, Chu-Tak and Siu, Wan-Chi},
title = {Image Super-Resolution via Attention based Back Projection Networks},
booktitle = {IEEE International Conference on Computer Vision Workshop (ICCVW)},
month = {October},
year = {2019}
}
@inproceedings{bahat2020explorable,
title={Explorable Super Resolution},
author={Bahat, Yuval and Michaeli, Tomer},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={2716--2725},
year={2020}
}
@inproceedings{lugmayr2020srflow,
title={SRFlow: Learning the Super-Resolution Space with Normalizing Flow},
author={Lugmayr, Andreas and Danelljan, Martin and Van Gool, Luc and Timofte, Radu},
booktitle={ECCV},
year={2020}
}
@article{zhang2021designing,
title={Designing a Practical Degradation Model for Deep Blind Image Super-Resolution},
author={Zhang, Kai and Liang, Jingyun and Van Gool, Luc and Timofte, Radu},
journal={arXiv preprint arXiv:2103.14006},
year={2021}
}
@Article{wang2021realesrgan,
title={Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data},
author={Xintao Wang and Liangbin Xie and Chao Dong and Ying Shan},
journal={arXiv:2107.10833},
year={2021}
}
@inproceedings{CycleGAN2017,
title={Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks},
author={Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A},
booktitle={Computer Vision (ICCV), 2017 IEEE International Conference on},
year={2017}
}
@inproceedings{isola2017image,
title={Image-to-Image Translation with Conditional Adversarial Networks},
author={Isola, Phillip and Zhu, Jun-Yan and Zhou, Tinghui and Efros, Alexei A},
booktitle={Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on},
year={2017}
}
@InProceedings{Wang_2020_CVPR,
author = {Wang, Xinrui and Yu, Jinze},
title = {Learning to Cartoonize Using White-Box Cartoon Representations},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
@Article{Wang2020tip,
author = {Longguang Wang and Yulan Guo and Li Liu and Zaiping Lin and Xinpu Deng and Wei An},
title = {Deep Video Super-Resolution using {HR} Optical Flow Estimation},
journal = {{IEEE} Transactions on Image Processing},
year = {2020},
}
@InProceedings{wang2019edvr,
author = {Wang, Xintao and Chan, Kelvin C.K. and Yu, Ke and Dong, Chao and Loy, Chen Change},
title = {EDVR: Video Restoration with Enhanced Deformable Convolutional Networks},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2019}
}
@article{zhu2017real,
title={Real-time Deep Video Deinterlacing},
author={Zhu, Haichao and Liu, Xueting and Mao, Xiangyu and Wong, Tien-Tsin},
journal={arXiv preprint arXiv:1708.00187},
year={2017}
}
@article{huang2020rife,
title={RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation},
author={Huang, Zhewei and Zhang, Tianyuan and Heng, Wen and Shi, Boxin and Zhou, Shuchang},
journal={arXiv preprint arXiv:2011.06294},
year={2020}
}