
How to understand TVLoss? #302

Open
EthanZhangYi opened this issue Aug 6, 2016 · 4 comments

Comments


EthanZhangYi commented Aug 6, 2016

This is the first time that I have used Torch and Lua. I read the CVPR paper Image Style Transfer Using Convolutional Neural Networks and the code neural_style.lua in this repository, but I cannot understand the TVLoss module in the code. What is it used for? I did not find any description or discussion of the TVLoss in the CVPR paper; only Content Loss and Style Loss are proposed there. Could anyone give me some help?

@EthanZhangYi
Author

@jcjohnson Could you please give me some help?

@jcjohnson
Owner

The total variation (TV) loss encourages spatial smoothness in the generated image. It was not used by Gatys et al. in their CVPR paper, but it can sometimes improve the results; for more details and explanation see Mahendran and Vedaldi, "Understanding Deep Image Representations by Inverting Them", CVPR 2015.
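For reference, a common form of the TV penalty sums squared differences between neighboring pixels. Here is a minimal NumPy sketch of that idea, not the repository's Lua implementation; the function name and `tv_weight` parameter are illustrative (the repo exposes the weight via its `-tv_weight` option):

```python
import numpy as np

def tv_loss(img, tv_weight=1.0):
    """Total variation loss for an image of shape (C, H, W).

    Penalizes differences between adjacent pixels, so constant regions
    cost nothing and high-frequency noise costs a lot. This is the
    squared (L2) variant; some formulations use absolute differences.
    """
    # Differences between vertically and horizontally adjacent pixels
    dh = img[:, 1:, :] - img[:, :-1, :]
    dw = img[:, :, 1:] - img[:, :, :-1]
    return tv_weight * (np.sum(dh ** 2) + np.sum(dw ** 2))

# A constant image has zero TV loss; adding noise raises it,
# which is why minimizing TV loss smooths the output.
flat = np.ones((3, 8, 8))
noisy = flat + 0.1 * np.random.randn(3, 8, 8)
print(tv_loss(flat))   # 0.0
print(tv_loss(noisy))  # some positive value
```

During optimization this term is added to the content and style losses, so its gradient pushes each pixel toward the average of its neighbors.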

@EthanZhangYi
Author

Thank you, @jcjohnson! Your answer is very helpful.

@bmaltais

bmaltais commented Aug 10, 2016

I ran some tests where I produced the same style transfer using various TV weights. I recorded the total style loss and noticed that the smaller the style loss, the better the resulting image looked. Here are the results I got:

tv-value,  iter 50, iter 100, iter 150, iter 200, iter 250
----------------------------------------------------------
0.000085, 29815,8702,3179,1257,552
0.0000850051,28868,8288,3101,1330,563
0.00008505,31854,8479,3080,1432,603
0.0000851,31620,8698,2954,1168,554
0.0000851035,31432,8940,3100,1308,566
0.0000855,31894,8308,3174,1295,584
0.000086, 33001,9533,3212,1334,607
0.0000875,27660,8669,3362,1370,614
0.00009,  29824,9244,3373,1456,603
0.00010,  30223,8553,3381,1361,634
0.0002000,29600,10388,3968,1844,868
0.00100,  48000,22791,14244,10523,7780

So based on my testing, the best TV weight was 0.000085.

Results may vary based on the source and destination images... but the nice thing is that you only need something like 250 iterations to pick the winner... so try 0.000085, 0.0001, or 0.0002 and see which is best for you.
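A sweep like the one above can be scripted; a sketch, assuming the usual neural_style.lua options (`-content_image`, `-style_image`, `-tv_weight`, `-num_iterations`, `-output_image`) and hypothetical file names. The `echo` prints each command for a dry run; remove it to actually launch the jobs:

```shell
# Try a few TV weights and render 250 iterations of each for comparison.
# content.jpg / style.jpg are placeholder file names.
for tv in 0.000085 0.0001 0.0002; do
  echo th neural_style.lua \
    -content_image content.jpg -style_image style.jpg \
    -tv_weight "$tv" -num_iterations 250 \
    -output_image "out_tv_${tv}.png"
done
```

Then compare the three outputs (and their reported style losses) and continue the full run with the winning weight.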
