An implementation of fast-neural-style in PyTorch! Style transfer learns the aesthetic style of a style image, usually an artwork, and applies it to another content image. This repository contains code that can be used for:
- fast image-to-image aesthetic style transfer,
- image-to-video aesthetic style transfer, and
- training a style-learning transformation network
This implementation follows the style transfer approach outlined in the paper Perceptual Losses for Real-Time Style Transfer and Super-Resolution by Justin Johnson, Alexandre Alahi, and Fei-Fei Li, along with the supplementary paper detailing the exact model architecture. The idea is to train a separate feed-forward neural network (called the Transformation Network) to transform/stylize an image, using backpropagation to learn its parameters, instead of directly manipulating the pixels of the generated image as discussed in the paper A Neural Algorithm of Artistic Style (aka neural-style) by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. The feed-forward transformation network allows for fast stylization of images, around 1000x faster than neural-style.
This implementation makes some modifications to Johnson et al.'s proposed architecture, particularly:
- The use of reflection padding in every convolutional layer, instead of a single large reflection padding before the first convolutional layer
- Ditching of the Tanh output. The generated images are the raw outputs of the convolutional layer. While the Tanh model produces visually pleasing results, it fails to transfer the vibrant and loud colors of the style image (i.e. generated images are usually darker). This, however, makes for a good retro style effect.
- Use of Instance Normalization, instead of Batch Normalization, after convolutional and deconvolutional layers, as discussed in the paper Instance Normalization: The Missing Ingredient for Fast Stylization by Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky.
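To make two of the modifications above concrete, here is a minimal pure-Python sketch of what reflection padding and instance normalization compute. It is not the code used in `transformer.py`; the function names and the nested-list data layout are assumptions made for illustration only.

```python
import math


def reflect_pad_1d(row, pad):
    """Mirror `pad` values around each edge without repeating the edge value.

    Assumes 0 <= pad < len(row), as reflection padding requires.
    """
    left = row[1:pad + 1][::-1]
    right = row[-pad - 1:-1][::-1]
    return left + row + right


def instance_norm(batch, eps=1e-5):
    """Normalize every channel of every sample with its own mean and variance.

    `batch` is a nested list indexed as [sample][channel][pixel] (pixels
    flattened). Batch normalization would instead pool the statistics of a
    channel across all samples in the batch.
    """
    normalized = []
    for sample in batch:
        normalized_sample = []
        for channel in sample:
            mean = sum(channel) / len(channel)
            var = sum((v - mean) ** 2 for v in channel) / len(channel)
            scale = math.sqrt(var + eps)
            normalized_sample.append([(v - mean) / scale for v in channel])
        normalized.append(normalized_sample)
    return normalized


print(reflect_pad_1d([1, 2, 3, 4], 2))  # [3, 2, 1, 2, 3, 4, 3, 2]

# Two samples with very different contrast end up on (almost) the same scale,
# because each one is normalized independently of the other.
print(instance_norm([[[1.0, 2.0, 3.0]], [[10.0, 20.0, 30.0]]]))
```

Because the statistics are computed per image, the contrast of one training image does not leak into the normalization of another.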
The original Caffe pretrained weights of VGG16 were used for this implementation, instead of the pretrained VGG16 weights in PyTorch's model zoo.
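The choice of weights matters for input preprocessing. The sketch below assumes the convention commonly used with the original Caffe VGG models: BGR channel order, pixel values in [0, 255], and subtraction of a per-channel mean. Whether this repository applies exactly these steps is an assumption, and the function name is hypothetical.

```python
# Mean pixel values in (B, G, R) order commonly used with Caffe VGG models.
# Treat these as an assumption about this repository's preprocessing.
CAFFE_BGR_MEAN = (103.939, 116.779, 123.68)


def to_caffe_vgg_input(rgb_pixel):
    """Convert one (R, G, B) pixel in [0, 255] to mean-subtracted (B, G, R)."""
    r, g, b = rgb_pixel
    bgr = (b, g, r)
    return tuple(value - mean for value, mean in zip(bgr, CAFFE_BGR_MEAN))


print(to_caffe_vgg_input((255, 128, 0)))
```

By contrast, the VGG models in PyTorch's model zoo expect RGB input scaled to [0, 1] and normalized with a mean and standard deviation, so the two sets of weights are not interchangeable without changing the preprocessing.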
It took about 1.5 seconds for a GTX 1060 to stylize University of the Philippines Diliman - Oblation (1400×936) by LeAnne Jazul/Rappler. From Top to Right: Udnie Style, Mosaic Style, Tokyo Ghoul Style, Original Picture, Udnie Style with Original Color Preservation
It took 6 minutes and 43 seconds to stylize a 2:11-minute, 24 fps, 1280x720 video on a GTX 1080 Ti.
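As a rough sanity check of the figure above, the numbers can be turned into a frames-per-second throughput. The function name is hypothetical, and the calculation treats the whole 6:43 as processing time, which is an assumption (it may also include frame extraction and video encoding).

```python
def video_throughput(duration_s, fps, processing_s):
    """Return (total frames, stylized frames per second of processing time)."""
    frames = duration_s * fps
    return frames, frames / processing_s


# 2:11 of video at 24 fps, processed in 6 minutes 43 seconds.
frames, fps_out = video_throughput(2 * 60 + 11, 24, 6 * 60 + 43)
print(frames, round(fps_out, 1))  # 3144 7.8
```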
More videos are in this YouTube playlist. Unfortunately, YouTube's compression isn't friendly to style transfer videos, possibly because each frame is shaky with respect to its adjacent frames, hence the obvious loss in video quality. Raw and lossless output videos can be downloaded from my Dropbox folder or Gdrive folder.
webcam.py can output 1280x720 videos at a rate of at least 4-5 frames per second on a GTX 1060.
Most of the code here assumes that the user has access to a CUDA-capable GPU, at least a GTX 1050 Ti or a GTX 1060.
- Pre-trained VGG16 network weights - put them in the `models/` directory
- MS-COCO Train Images (2014) - 13GB - put the `train2014` directory in the `dataset/` directory
- torchvision - `torchvision.models` contains the VGG16 and VGG19 model skeleton
- PyTorch
- opencv2
- NumPy
- FFmpeg (Optional) - Installation Instruction here
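Before running any of the scripts, it can help to confirm that the requirements above are importable. The following is a minimal sketch using only the standard library; the import names (for example `cv2` for OpenCV) are assumptions about how the packages are installed, and the helper names are hypothetical.

```python
import importlib.util
import shutil

# Import names are assumptions: OpenCV is usually imported as "cv2".
REQUIRED_MODULES = ("torch", "torchvision", "cv2", "numpy")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def has_ffmpeg():
    """FFmpeg is optional; report whether an `ffmpeg` executable is on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    print("missing modules:", missing_modules() or "none")
    print("ffmpeg found:", has_ffmpeg())
```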
All arguments, parameters, and options are hardcoded inside these 5 Python files. Before using the code, please arrange your files and folders as defined below.
`train.py`: trains the transformation network that learns the style of the style image. Each model in the `transforms` folder was trained for roughly 23 minutes, with a single pass (1 epoch) over 40,000 training images and a batch size of 4, on a GTX 1080 Ti.
python train.py
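The training figures quoted above imply a rough cost per iteration. This is only back-of-the-envelope arithmetic, assuming the 23 minutes are spent entirely on training iterations; the helper name is hypothetical.

```python
def iterations_per_epoch(num_images, batch_size):
    """One iteration processes one batch; a trailing partial batch still counts."""
    return -(-num_images // batch_size)  # ceiling division


iterations = iterations_per_epoch(40_000, 4)
seconds_per_iteration = 23 * 60 / iterations

print(iterations)                       # 10000
print(round(seconds_per_iteration, 3))  # 0.138
```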
Options
- `TRAIN_IMAGE_SIZE`: sets the dimension (height and width) of training images. More GPU memory is needed to train with larger images. Default is `256` px.
- `DATASET_PATH`: folder containing the MS-COCO `train2014` images. Default is `"dataset"`
- `NUM_EPOCHS`: number of training epochs. Default is `1` with 40,000 training images
- `STYLE_IMAGE_PATH`: path of the style image
- `BATCH_SIZE`: training batch size. Default is 4
- `CONTENT_WEIGHT`: multiplier weight of the loss between content representations and the generated image. Default is `8`
- `STYLE_WEIGHT`: multiplier weight of the loss between style representations and the generated image. Default is `50`
- `ADAM_LR`: learning rate of the Adam optimizer. Default is `0.001`
- `SAVE_MODEL_PATH`: path of pretrained-model weights and transformation network checkpoint files. Default is `"models/"`
- `SAVE_IMAGE_PATH`: save path of sample transformed training images. Default is `"images/out/"`
- `SAVE_MODEL_EVERY`: how often, in iterations, checkpoints and sample transformed images are saved. 1 iteration is defined as 1 batch pass. Default is `500` with a batch size of `4`, that is, every 2,000 images
- `SEED`: random seed, used to keep the variation between training runs as small as possible
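To show how `CONTENT_WEIGHT` and `STYLE_WEIGHT` enter the objective, here is a minimal pure-Python sketch of a weighted perceptual loss in the spirit of Johnson et al. It assumes the style representation is a Gram matrix of feature maps and that both terms are mean squared errors; the helper names, the Gram normalization, and the toy feature maps are illustrative assumptions, not the code in `train.py`.

```python
CONTENT_WEIGHT = 8   # default listed above
STYLE_WEIGHT = 50    # default listed above


def flatten(nested):
    return [value for row in nested for value in row]


def mse(xs, ys):
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gram_matrix(features):
    """Channel-by-channel correlations of a [channel][pixel] feature map."""
    channels, pixels = len(features), len(features[0])
    norm = channels * pixels  # assumed normalization
    return [[sum(a * b for a, b in zip(fi, fj)) / norm for fj in features]
            for fi in features]


def total_loss(generated, content, style):
    """Weighted sum of a content term and a Gram-matrix style term."""
    content_loss = mse(flatten(generated), flatten(content))
    style_loss = mse(flatten(gram_matrix(generated)),
                     flatten(gram_matrix(style)))
    return CONTENT_WEIGHT * content_loss + STYLE_WEIGHT * style_loss


# Toy 2-channel, 2-pixel "feature maps" (hypothetical values).
features = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(total_loss(features, zeros, features))  # 4.0 (content term only)
```

In practice the feature maps would come from layers of the pretrained VGG16 network rather than from hand-written lists, and raising `STYLE_WEIGHT` relative to `CONTENT_WEIGHT` pushes the result toward the style image.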
`transformer.py`: contains the architecture definition of the transformation network. It includes 2 models, `TransformerNetwork()` and `TransformerNetworkTanh()`. `TransformerNetwork` doesn't have an extra output layer, while `TransformerNetworkTanh`, as the name implies, has a Tanh layer as its output and a default output multiplier of `150`. `TransformerNetwork` faithfully copies the style and colorization of the style image, while the Tanh model produces images with darker colors, which gives a retro style effect.
Options
- `norm`: sets the normalization layer to either Instance Normalization `"instance"` or Batch Normalization `"batch"`. Default is `"instance"`
- `tanh_multiplier`: output multiplier of the Tanh model. The bigger the number, the brighter the image. Default is `150`
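The effect of `tanh_multiplier` can be sketched per output value. This assumes the Tanh model simply scales the Tanh activation by the multiplier, which is an interpretation of the description above rather than a copy of `TransformerNetworkTanh`; the function name is hypothetical.

```python
import math


def tanh_output(raw_value, tanh_multiplier=150):
    """Squash a raw network output into (-multiplier, +multiplier)."""
    return math.tanh(raw_value) * tanh_multiplier


for raw in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(raw, round(tanh_output(raw), 2))
```

Under this assumption the output can never exceed the multiplier in magnitude, which is one plausible reason a larger multiplier gives a brighter image, whereas the raw outputs of `TransformerNetwork` are unbounded.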
`experimental.py`: contains the model definitions of the experimental transformer network architectures. These experimental transformer networks largely borrow ideas from the papers Aggregated Residual Transformations for Deep Neural Networks, more commonly known as ResNeXt, and Densely Connected Convolutional Networks, more commonly known as DenseNet. These experimental networks are designed to be lightweight, with the goal of minimizing the compute and memory needed for better real-time performance.
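One reason ResNeXt-style blocks can be lightweight is that grouped convolutions need fewer weights than ordinary ones. The sketch below counts the parameters of a 2D convolution with and without groups; the channel counts and group size are made-up illustrative values, not the ones used in `experimental.py`.

```python
def conv2d_params(in_channels, out_channels, kernel_size, groups=1, bias=True):
    """Number of learnable parameters in a square-kernel 2D convolution."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channels must be divisible by groups")
    weights = (in_channels // groups) * out_channels * kernel_size ** 2
    return weights + (out_channels if bias else 0)


# Hypothetical layer: 128 -> 128 channels with a 3x3 kernel.
print(conv2d_params(128, 128, 3))             # 147584
print(conv2d_params(128, 128, 3, groups=32))  # 4736
```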
See table below for the comparison of different transformer networks.
See transforms folder for some pretrained weights. For more pretrained weights, see my Gdrive or Dropbox.
`stylize.py`: loads pre-trained transformer network weights and applies the style (1) to a content image or (2) to the images inside a folder
python stylize.py
Options
- `STYLE_TRANSFORM_PATH`: path of the pre-trained weights of the transformation network. Sample pre-trained weights are available in the `transforms` folder, including their implementation parameters.
- `PRESERVER_COLOR`: set to `True` if you want to preserve the original image's color after applying style transfer. Default value is `False`
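One common way to preserve the original colors is luminance-only transfer: keep the brightness of the stylized image and the chroma of the content image. The sketch below illustrates that idea per pixel in pure Python with approximate BT.601 coefficients. Whether `stylize.py` implements color preservation this way is an assumption, and the function names are hypothetical.

```python
def rgb_to_ycbcr(rgb):
    """Approximate BT.601 conversion for one (R, G, B) pixel in [0, 255]."""
    r, g, b = rgb
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 + 0.564 * (b - y)
    cr = 128 + 0.713 * (r - y)
    return y, cb, cr


def ycbcr_to_rgb(ycbcr):
    """Inverse of rgb_to_ycbcr, clipped to the valid [0, 255] range."""
    y, cb, cr = ycbcr
    r = y + 1.403 * (cr - 128)
    g = y - 0.344 * (cb - 128) - 0.714 * (cr - 128)
    b = y + 1.773 * (cb - 128)
    return tuple(min(255.0, max(0.0, c)) for c in (r, g, b))


def preserve_color(content_rgb, stylized_rgb):
    """Combine stylized luminance with the content pixel's chroma."""
    _, cb, cr = rgb_to_ycbcr(content_rgb)
    y, _, _ = rgb_to_ycbcr(stylized_rgb)
    return ycbcr_to_rgb((y, cb, cr))


# A gray content pixel stays gray, but takes on the stylized brightness.
print(preserve_color((100, 100, 100), (200, 50, 50)))
```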
`video.py`: extracts all frames of a video, applies fast style transfer to each frame, and combines the styled frames into an output video. The output video doesn't retain the original audio. Optionally, you may use FFmpeg to merge the output video and the original video's audio.
python video.py
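Since the frames are stylized independently, they can be processed several at a time. The following sketch shows one way to split a list of extracted frames into fixed-size batches, in the spirit of the `BATCH_SIZE` option of `video.py`; the helper name and the frame file names are hypothetical.

```python
def batches(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical frame names; the real naming depends on the script's settings.
frame_names = [f"frame{i}.jpg" for i in range(1, 8)]
for batch in batches(frame_names, 3):
    print(batch)
```

The last batch may be smaller than the others, so any code that stacks a batch into a tensor should not assume a fixed batch dimension.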
Options
- `VIDEO_NAME`: path of the original video
- `FRAME_SAVE_PATH`: parent folder of the save path of the extracted original video frames. Default is `"frames/"`
- `FRAME_CONTENT_FOLDER`: folder of the save path of the extracted original video frames. Default is `"content_folder/"`
- `FRAME_BASE_FILE_NAME`: base file name of the extracted original video frames. Default is `"frame"`
- `FRAME_BASE_FILE_TYPE`: image file type of the saved frames, `".jpg"`
- `STYLE_FRAME_SAVE_PATH`: path of the styled frames. Default is `"style_frames/"`
- `STYLE_VIDEO_NAME`: name (or save path) of the output styled video. Default is `"helloworld.mp4"`
- `STYLE_PATH`: pretrained weights of the transformation network to use for video style transfer. Default is `"transforms/aggressive.pth"`
- `BATCH_SIZE`: batch size for stylizing the extracted original video frames. A 1080 Ti (11GB) can handle a batch size of 20 for 720p videos and 80 for 480p videos. Default is `1`
- `USE_FFMPEG` (Optional): set to `True` if you want to use FFmpeg for extracting the original video's audio and encoding the styled video with the original audio.
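For readers who prefer to merge the audio by hand, the sketch below builds an FFmpeg command that takes the video stream from the styled file and the audio stream from the original file. It is not the command used by `video.py`; the function names and file names are hypothetical, and stream copying assumes the original audio codec is compatible with the output container.

```python
import subprocess


def build_mux_command(styled_video, original_video, output_video):
    """Return an ffmpeg argument list that muxes styled video + original audio."""
    return [
        "ffmpeg", "-y",
        "-i", styled_video,    # input 0: styled frames, no audio
        "-i", original_video,  # input 1: source of the audio track
        "-map", "0:v:0",       # first video stream of input 0
        "-map", "1:a:0",       # first audio stream of input 1
        "-c", "copy",          # copy streams without re-encoding
        "-shortest",           # stop at the end of the shorter stream
        output_video,
    ]


def mux(styled_video, original_video, output_video):
    """Run the command; requires an `ffmpeg` executable on PATH."""
    subprocess.run(
        build_mux_command(styled_video, original_video, output_video),
        check=True,
    )


print(" ".join(build_mux_command("helloworld.mp4", "original.mp4", "with_audio.mp4")))
```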
`webcam.py`: captures and saves a webcam output image, performs style transfer, and saves the styled image. It then reads the styled image and shows it in a window.
python webcam.py
Options
- `STYLE_TRANSFORM_PATH`: pretrained weights of the transformation network to use for video style transfer. Default is `"transforms/aggressive.pth"`
- `WIDTH`: width of the webcam output window. Default is `1280`
- `HEIGHT`: height of the webcam output window. Default is `720`
master_folder
 ~ dataset
    ~ train2014
        coco*.jpg
        ...
 ~ frames
    ~ content_folder
        frame*.jpg
        ...
 ~ images
    ~ out
        *.jpg
    *.jpg
 ~ models
    *.pth
 ~ style_frames
    frames*.jpg
 ~ transforms
    *.pth
 *.py
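The folder layout above can be created in one go. This is a small convenience sketch using only the standard library; the helper name is hypothetical, and the demo runs in a temporary directory so it does not touch your working folder.

```python
import tempfile
from pathlib import Path

# Sub-folders taken from the layout above, relative to the master folder.
FOLDERS = (
    "dataset/train2014",
    "frames/content_folder",
    "images/out",
    "models",
    "style_frames",
    "transforms",
)


def create_layout(root):
    """Create the expected folders under `root` and return them, sorted."""
    root = Path(root)
    for folder in FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    return sorted(p.relative_to(root).as_posix()
                  for p in root.rglob("*") if p.is_dir())


# Demo in a throwaway directory; pass your own master folder path instead.
with tempfile.TemporaryDirectory() as tmp:
    created = create_layout(tmp)
print(created)
```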
Network | Size (KB) | No. of parameters | Final loss (million) |
---|---|---|---|
transformer/TransformerNetwork | 6,573 | 1,679,235 | 9.88 |
experimental/TransformerNetworkDenseNet | 1,064 | 269,731 | 11.37 |
experimental/TransformerNetworkUNetDenseNetResNet | 1,062 | 269,536 | 12.32 |
experimental/TransformerNetworkV2 | 6,573 | 1,679,235 | 10.05 |
experimental/TransformerResNextNetwork | 1,857 | 470,915 | 10.31 |
experimental/TransformerResNextNetwork_Pruned(0.3) | 44 | 8,229 | 19.29 |
experimental/TransformerResNextNetwork_Pruned(1.0) | 260 | 63,459 | 12.72 |
`TransformerResNextNetwork` and `TransformerResNextNetwork_Pruned(1.0)` provide the best tradeoff between compute, memory size, and performance.
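The sizes in the table are roughly what one would expect from the parameter counts. The sketch below estimates the weight-file size assuming 32-bit floats and 1 KB = 1024 bytes; both are assumptions, and the remaining gap to the listed size is presumably file-format overhead.

```python
BYTES_PER_FLOAT32 = 4

# name: (listed size in KB, number of parameters), copied from the table above
TABLE = {
    "TransformerNetwork": (6573, 1_679_235),
    "TransformerNetworkDenseNet": (1064, 269_731),
    "TransformerNetworkUNetDenseNetResNet": (1062, 269_536),
    "TransformerResNextNetwork": (1857, 470_915),
    "TransformerResNextNetwork_Pruned(0.3)": (44, 8_229),
    "TransformerResNextNetwork_Pruned(1.0)": (260, 63_459),
}


def float32_size_kb(num_parameters):
    """Estimated size of the raw weights, ignoring any file overhead."""
    return num_parameters * BYTES_PER_FLOAT32 / 1024


for name, (listed_kb, params) in TABLE.items():
    estimate = float32_size_kb(params)
    print(f"{name}: ~{estimate:.0f} KB estimated vs {listed_kb} KB listed")
```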
- FFmpeg support for encoding videos with video style transfer
- Color-preserving Real-time Style Transfer
- Webcam demo of fast-neural-style
- Web-app deployment of fast-neural-style (ONNX)
@misc{rusty2018faststyletransfer,
author = {Rusty Mina},
title = {fast-neural-style: Fast Style Transfer in Pytorch!},
year = {2018},
howpublished = {\url{https://github.com/iamRusty/fast-neural-style-pytorch}},
note = {commit xxxxxxx}
}
This implementation borrowed some implementation details from:
- Justin Johnson's fast-neural-style in Torch, and
- the PyTorch Team's PyTorch Examples: fast-neural-style
- This repository also borrows some markdown formatting, as well as license description from Logan Engstrom's fast-style-transfer in Tensorflow
- Neural Style in PyTorch - PyTorch implementation of the original A Neural Algorithm of Artistic Style (aka neural-style) paper by Gatys et al.
Copyright (c) 2018 Rusty Mina. Free for academic or research use, as long as proper attribution is given and this copyright notice is retained.