A Keras-based implementation of
- Wide Activation for Efficient and Accurate Image Super-Resolution (WDSR), winner of the NTIRE 2018 super-resolution challenge.
- Enhanced Deep Residual Networks for Single Image Super-Resolution (EDSR), winner of the NTIRE 2017 super-resolution challenge.
- Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network (SRGAN).
This project also supports fine-tuning of EDSR models as generators in SRGAN-like networks.
On branch wip-tf2, the EDSR, WDSR and SRGAN models have been migrated to Tensorflow 2.0. Jupyter notebooks are provided to demonstrate how to train these models. Pre-trained models are available for SRGAN. A bug in the SRGAN discriminator has been fixed, which significantly improves SRGAN results.
Weight normalization is now available as a layer wrapper in Tensorflow Addons, which has been used to implement the WDSR models. Also, DIV2K images no longer need to be downloaded manually; they are downloaded automatically by a DIV2K data provider.
Although branch wip-tf2 is a work in progress, the model implementations are complete and can be trained as described in the corresponding papers with the provided Jupyter notebooks. Model and training code has been dramatically simplified, and not only because of new features available in Tensorflow 2.0 and Tensorflow Addons.
- Environment setup
- Getting started
- Dataset
- Training
- Evaluation
- Pre-trained models
- JPEG compression
- Weight normalization
- Other implementations
- Limitations
On a system with a GPU create a new conda environment with *)
conda env create -f environment-gpu.yml
On a system without a GPU create an environment with
conda env create -f environment-cpu.yml
Activate the environment with
source activate sisr
*) It is assumed that appropriate CUDA and cuDNN versions for the current tensorflow-gpu version are already installed on your system. These libraries are not automatically installed when using environment-gpu.yml.
This section uses pre-trained models to super-resolve images with factor x4.
Here, the pre-trained WDSR-A model wdsr-a-32-x4 is used. Click on the link to download the model. It is an experimental model not described in the WDSR paper that was trained with a pixel-wise loss function (mean absolute error). Assuming that the path to the downloaded model is ~/Downloads/wdsr-a-32-x4-psnr-29.1736.h5, the following command super-resolves images in directory ./demo with factor x4 and writes the results to directory ./output:
python demo.py -i ./demo -o ./output --model ~/Downloads/wdsr-a-32-x4-psnr-29.1736.h5
The output directory only contains the super-resolved images. Below are figures that additionally compare the super-resolution (SR) results with the corresponding low-resolution (LR) and high-resolution (HR) images and an x4 resize with bicubic interpolation (code for generating these figures not included yet).
A problem with pixel-wise loss functions is that they fail to recover high-frequency details. Super-resolution results are typically overly smooth with lower perceptual quality, especially at scale x4. A perceptual loss as described in the SRGAN paper (a combination of a VGG-based content loss and an adversarial loss) is able to generate more realistic textures with higher perceptual quality but at the cost of lower PSNR values.
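To make the trade-off above concrete, the following minimal sketch computes the two pixel-wise losses mentioned in this README (mean absolute error and mean squared error) and the PSNR derived from the mean squared error. It uses plain Python on flat lists of 8-bit pixel values. The function names and sample values are illustrative assumptions and are not taken from this repository.

```python
import math


def mean_absolute_error(sr, hr):
    """Mean absolute error (L1) between two equally sized pixel sequences."""
    return sum(abs(a - b) for a, b in zip(sr, hr)) / len(hr)


def mean_squared_error(sr, hr):
    """Mean squared error (L2) between two equally sized pixel sequences."""
    return sum((a - b) ** 2 for a, b in zip(sr, hr)) / len(hr)


def psnr(sr, hr, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    mse = mean_squared_error(sr, hr)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical pixel values, for illustration only.
hr = [52, 55, 61, 59, 79, 61, 76, 61]
sr = [54, 55, 60, 62, 75, 61, 77, 60]

if __name__ == "__main__":
    print("MAE :", mean_absolute_error(sr, hr))
    print("MSE :", mean_squared_error(sr, hr))
    print("PSNR: %.2f dB" % psnr(sr, hr))
```

Because PSNR is a monotonic function of the mean squared error, a model optimized for a pixel-wise loss tends to score well on PSNR even when its output looks overly smooth.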
An EDSR baseline model that has been fine-tuned as generator in an SRGAN-like network can be downloaded from here. Please note that support for SRGAN training is still work in progress. Assuming that the path to the downloaded model is ~/Downloads/edsr-16-x4-gen-epoch-088.h5, the following command super-resolves the image in directory ./demo/gan with factor x4 and writes the result to directory ./output:
python demo.py -i ./demo/gan -o ./output --model ~/Downloads/edsr-16-x4-gen-epoch-088.h5
The output directory only contains the super-resolved image. The following figure additionally compares the result with that obtained from an EDSR model that has been trained with a pixel-wise loss only (mean squared error). One can clearly see how training with a perceptual loss in a GAN improves recovery of high-frequency content.
If you want to train and evaluate models, you need to download the DIV2K dataset and extract the downloaded archives to a directory of your choice (DIV2K in the following example). The resulting directory structure should look like:
DIV2K
  DIV2K_train_HR
  DIV2K_train_LR_bicubic
    X2
    X3
    X4
  DIV2K_train_LR_unknown
    X2
    X3
    X4
  DIV2K_valid_HR
  DIV2K_valid_LR_bicubic
    ...
  DIV2K_valid_LR_unknown
    ...
You only need to download DIV2K archives for those downgrade operators (unknown, bicubic) and super-resolution scales (x2, x3, x4) that you'll actually use for training.
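As a rough aid, the directory layout above can be checked before conversion with a short script. The sketch below lists which sub-directories are missing for one downgrade operator and one scale. It assumes that the validation directories mirror the X2/X3/X4 layout of the training directories (the listing above elides them), and the helper names are hypothetical rather than part of this project.

```python
from pathlib import Path


def required_div2k_dirs(downgrade, scale):
    """Return the sub-directories needed for one downgrade operator and scale."""
    dirs = []
    for subset in ("train", "valid"):
        dirs.append(f"DIV2K_{subset}_HR")
        # Assumption: validation LR directories use the same X<scale> layout.
        dirs.append(f"DIV2K_{subset}_LR_{downgrade}/X{scale}")
    return dirs


def missing_div2k_dirs(root, downgrade="bicubic", scale=2):
    """Return the required sub-directories that do not exist under root."""
    root = Path(root)
    return [d for d in required_div2k_dirs(downgrade, scale)
            if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_div2k_dirs("./DIV2K", downgrade="bicubic", scale=4)
    print("missing:", missing if missing else "nothing")
```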
Before the DIV2K images can be used they must be converted to numpy arrays and stored in a separate location. Conversion to numpy arrays dramatically reduces image pre-processing times. Conversion can be done with the convert.py script:
python convert.py -i ./DIV2K -o ./DIV2K_BIN numpy
In this example, converted images are written to the DIV2K_BIN directory. By default, training and evaluation scripts read from this directory, which can be overridden with the --dataset command line option.
WDSR, EDSR and SRResNet *) models can be trained with a pixel-wise loss function with train.py. The default loss for WDSR and EDSR is mean absolute error; for SRResNet it is mean squared error. For example, a WDSR-A baseline model with 8 residual blocks can be trained for scale x2 with
python train.py --dataset ./DIV2K_BIN --outdir ./output --profile wdsr-a-8 --scale 2
The --dataset option sets the location of the DIV2K dataset and the --outdir option sets the output directory (defaults to ./output). Each training run creates a timestamped sub-directory in the specified output directory which contains saved models, all command line options (default and user-defined) in an args.txt file, as well as TensorBoard logs. The super-resolution factor is set with the --scale option. The downgrade operator can be set with the --downgrade option. It defaults to bicubic and can be changed to unknown, bicubic_jpeg_75 or bicubic_jpeg_90 (see also section JPEG compression).
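The run directory layout described above can be sketched as follows. This is a minimal illustration, not the code in train.py: the parser covers only a hypothetical subset of options, the default for --scale and the key=value format of args.txt are assumptions, and the timestamp pattern is inferred from directory names such as 20181016-063620 used elsewhere in this README.

```python
import argparse
import os
import tempfile
from datetime import datetime


def build_parser():
    """Hypothetical subset of the training options described above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--outdir", default="./output")
    # Assumption: the real default for --scale may differ.
    parser.add_argument("--scale", type=int, default=2, choices=[2, 3, 4])
    parser.add_argument("--downgrade", default="bicubic",
                        choices=["bicubic", "unknown",
                                 "bicubic_jpeg_75", "bicubic_jpeg_90"])
    return parser


def create_run_dir(outdir, args, now=None):
    """Create <outdir>/<timestamp>/models and write all options to args.txt."""
    now = now or datetime.now()
    run_dir = os.path.join(outdir, now.strftime("%Y%m%d-%H%M%S"))
    os.makedirs(os.path.join(run_dir, "models"), exist_ok=True)
    with open(os.path.join(run_dir, "args.txt"), "w") as f:
        # Default and user-defined options are written alike.
        for name, value in sorted(vars(args).items()):
            f.write(f"{name}={value}\n")
    return run_dir


if __name__ == "__main__":
    args = build_parser().parse_args(["--scale", "4"])
    # A temporary directory keeps the demo free of side effects.
    print(create_run_dir(tempfile.mkdtemp(), args))
```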
By default, the model is validated against randomly cropped images from the DIV2K validation set. If you would rather evaluate the model against full-sized DIV2K validation images after each epoch, you need to set the --benchmark command line option. This, however, slows down training significantly and only makes sense for smaller models. Alternatively, you can evaluate saved models later with evaluate.py as described in the section Evaluation.
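For readers unfamiliar with random cropping in super-resolution, the key point is that the LR crop and the HR crop must cover the same image region. The sketch below computes such a pair of crop boxes. The function name and the default LR patch size of 48 pixels are assumptions made for illustration and may not match what this project uses.

```python
import random


def random_crop_boxes(lr_width, lr_height, scale, lr_patch=48, rng=random):
    """Return aligned (left, top, right, bottom) boxes for an LR/HR image pair.

    The HR box is the LR box multiplied by the super-resolution scale, so
    both crops show the same image content at different resolutions.
    """
    if lr_patch > lr_width or lr_patch > lr_height:
        raise ValueError("patch is larger than the LR image")
    left = rng.randint(0, lr_width - lr_patch)   # randint is inclusive
    top = rng.randint(0, lr_height - lr_patch)
    lr_box = (left, top, left + lr_patch, top + lr_patch)
    hr_box = tuple(v * scale for v in lr_box)
    return lr_box, hr_box


if __name__ == "__main__":
    lr_box, hr_box = random_crop_boxes(510, 339, scale=4)
    print("LR box:", lr_box)
    print("HR box:", hr_box)
```

Validating on such small crops is cheap, which is why full-sized evaluation is optional; the scores from both approaches are not directly comparable.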
To train models for higher scales (x3 or x4) it is possible to re-use the weights of models pre-trained for a smaller scale (x2). This can be done with the --pretrained-model option. For example,
python train.py --dataset ./DIV2K_BIN --outdir ./output --profile wdsr-a-8 --scale 4 \
--pretrained-model ./output/20181016-063620/models/epoch-294-psnr-34.5394.h5
trains a WDSR-A baseline model with 8 residual blocks for scale x4 re-using the weights of model epoch-294-psnr-34.5394.h5, a WDSR-A baseline model with the same number of residual blocks trained for scale x2.
For a more detailed overview of available command line options and profiles take a look at train.py or run python train.py -h. Section Pre-trained models also shows the training command for each available pre-trained model.
*) SRResNet is the super-resolution model used in the SRGAN paper.
Training with a perceptual loss as described in the SRGAN paper requires a model that has been pre-trained with a pixel-wise loss. At the moment, only SRResNet and EDSR models at scale x4 are supported for SRGAN training. For example, SRResNet can be pre-trained with
python train.py --dataset ./DIV2K_BIN --profile sr-resnet
An EDSR baseline model that can be used as generator in an SRGAN-like network can be pre-trained with
python train.py --dataset ./DIV2K_BIN --profile edsr-gen --scale 4 --num-res-blocks 16
Selected models from pre-training can then be used as starting point for SRGAN training. For example,
python train_gan.py --dataset ./DIV2K_BIN --generator sr-resnet --label-noise 0.0 \
--pretrained-model <path-to-pretrained-model>
starts SRGAN training as described in the SRGAN paper using a VGG54 content loss and SRResNet as generator whereas
python train_gan.py --dataset ./DIV2K_BIN --generator edsr-gen --scale 4 --num-res-blocks 16 \
--pretrained-model <path-to-pretrained-model>
uses an EDSR baseline model with 16 residual blocks as generator. SRGAN training is still work in progress.
An alternative to the --benchmark training option is to evaluate saved models with evaluate.py and then select the model with the highest PSNR. For example,
python evaluate.py --dataset ./DIV2K_BIN -i ./output/20181016-063620/models -o eval.json
evaluates all models in directory ./output/20181016-063620/models and writes the results to eval.json. This JSON file maps model filenames to PSNR values. The evaluate.py script also writes the model with the best PSNR to stdout at the end of evaluation:
Best PSNR = 34.5394 for model ./output/20181016-063620/models/epoch-294-psnr-37.4630.h5
The higher PSNR value in the model filename must not be confused with the value generated by evaluate.py. The PSNR value in the filename was generated during training by validating against smaller, randomly cropped images, which tends to yield higher PSNR values.
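Since the JSON file maps model filenames to PSNR values, the best model can also be picked from it after the fact. The following sketch assumes a flat JSON object of that form; the helper names and the sample entries are hypothetical.

```python
import json


def load_results(path):
    """Load a JSON file that maps model filenames to PSNR values."""
    with open(path) as f:
        return json.load(f)


def best_model(results):
    """Return (filename, psnr) of the entry with the highest PSNR."""
    name = max(results, key=results.get)
    return name, results[name]


if __name__ == "__main__":
    # Hypothetical content of an eval.json file, for illustration only.
    sample = json.loads('{"epoch-100.h5": 34.21, "epoch-200.h5": 34.48}')
    name, value = best_model(sample)
    print("Best PSNR = %.4f for model %s" % (value, name))
```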
The following list contains available pre-trained models. They were trained with images 1-800 from the DIV2K training set using the specified downgrade operator. Random crops and transformations were made as described in the EDSR paper. Model performance is measured in dB PSNR on the DIV2K validation set (images 801-900, RGB channels, without self-ensemble).
Model | Scale | Residual blocks | Downgrade | Parameters | PSNR | Training |
---|---|---|---|---|---|---|
edsr-16-x2 1) | x2 | 16 | bicubic | 1.37M | 34.64 dB | python train.py --profile edsr-16 \ |
edsr-16-x4-gen-pre 2) | x4 | 16 | bicubic | 1.52M | 28.89 dB | python train.py --profile edsr-gen \ |
edsr-16-x4-gen 3) | x4 | 16 | bicubic | 1.52M | - | python train_gan.py --generator edsr-gen \ |
wdsr-a-16-x2 4) | x2 | 16 | bicubic | 1.19M | 34.68 dB | python train.py --profile wdsr-a-16 \ |
wdsr-a-32-x2 5) | x2 | 32 | bicubic | 3.55M | 34.80 dB | python train.py --profile wdsr-a-32 \ |
wdsr-a-32-x4 5) | x4 | 32 | bicubic | 3.56M | 29.17 dB | python train.py --profile wdsr-a-32 \ |
wdsr-b-32-x2 6) | x2 | 32 | bicubic | 0.59M | 34.63 dB | python train.py --profile wdsr-b-32 \ |
1) EDSR baseline, see also EDSR project page.
2) EDSR baseline pre-trained for usage as generator in an SRGAN-like network.
3) EDSR baseline fine-tuned as generator in an SRGAN-like network.
4) WDSR baseline, see also WDSR project page.
5) Experimental WDSR-A models trained with an expansion ratio of 6 (default is 4).
6) Experimental WDSR-B model.
There is experimental support for adding JPEG compression artifacts to LR images and training with compressed images. The following commands convert bicubic downscaled DIV2K training and validation images to JPEG images with quality 90:
python convert.py -i ./DIV2K/DIV2K_train_LR_bicubic \
-o ./DIV2K/DIV2K_train_LR_bicubic_jpeg_90 \
--jpeg-quality 90 jpeg
python convert.py -i ./DIV2K/DIV2K_valid_LR_bicubic \
-o ./DIV2K/DIV2K_valid_LR_bicubic_jpeg_90 \
--jpeg-quality 90 jpeg
After having converted these JPEG images to numpy arrays, as described in section Dataset, models can be trained with the --downgrade bicubic_jpeg_90 option to additionally learn to recover from JPEG compression artifacts. Two models trained in that manner are available as pre-trained models:
Model | Scale | Residual blocks | Downgrade | Parameters | PSNR | Training |
---|---|---|---|---|---|---|
wdsr-a-32-x2-q90 | x2 | 32 | bicubic + JPEG | 3.55M | 32.12 dB | python train.py --profile wdsr-a-32 \ |
wdsr-a-32-x4-q90 | x4 | 32 | bicubic + JPEG | 3.56M | 27.63 dB | python train.py --profile wdsr-a-32 \ |
WDSR models are trained with weight normalization. This branch uses a modified Adam optimizer for that purpose. The now outdated branch wip-conv2d-weight-norm instead uses a specialized Conv2DWeightNorm layer and a default Adam optimizer (experimental work inspired by the official WDSR Tensorflow port). The current plan is to replace this layer with a default Conv2D layer and a Tensorflow weight normalization wrapper when the wrapper becomes officially available.
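As background, weight normalization re-parameterizes a weight vector w as w = g * v / ||v||, so that the length g and the direction v / ||v|| are learned separately. The sketch below shows only this re-parameterization on a plain Python list. It is not the modified optimizer or the Conv2DWeightNorm layer of this project; in a convolution layer the norm is typically computed per output channel, and the function name is an assumption.

```python
import math


def weight_norm(v, g):
    """Return w = g * v / ||v|| for a direction vector v and a scalar gain g."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("direction vector must be non-zero")
    return [g * x / norm for x in v]


if __name__ == "__main__":
    w = weight_norm([3.0, 4.0], g=2.0)
    # The Euclidean length of w equals |g|, whatever the length of v is.
    print(w, math.sqrt(sum(x * x for x in w)))
```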
- Official PyTorch implementation
- Official Torch implementation
- Tensorflow implementation by Josh Miller.
Code in this project requires the Keras Tensorflow backend.