Need help with Keras neural doodle - preliminary results for improvements #3705
Comments
Looks very cool. I don't have much in the way of practical suggestions, but maybe modulating the contributions of the three losses with a weighted average could help achieve different results? As for the differences with the Torch implementation: the results look pretty similar to me. Maybe simply changing the range or distribution of the random inputs you start with would fix it.
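For reference, a minimal sketch of what weighting the three loss terms could look like with the Keras backend; the weight values below are placeholders, not values taken from this thread:

```python
from keras import backend as K

# Placeholder scalar tensors standing in for the three loss terms that the
# doodle script computes (content loss, style loss, total variation loss).
content_loss = K.variable(0.)
style_loss = K.variable(0.)
tv_loss = K.variable(0.)

# Hypothetical weights; changing their ratio changes how strongly each term
# pulls on the generated image.
content_weight = 0.025
style_weight = 1.0
tv_weight = 1.0

total_loss = (content_weight * content_loss
              + style_weight * style_loss
              + tv_weight * tv_loss)
```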
@dolaameng Really cool implementation! I made a few modifications to your code (borrowing a few improvements from the paper Improving the Neural Algorithm of Artistic Style) and changed a few things here and there. The results seem slightly better, although more iterations will definitely produce better results. A few of the improvements are:
- The generated images are a tad sharper. Using such a large TV regularization weight is also fine, since you can apply a sharpening filter (e.g. imfilter) afterwards if needed. The image below is without a sharpening filter, after 100 epochs.
- When using guided style transfer on the Renoir, initialize the image with the content image itself rather than with noise (see the sketch after this list). The output is far sharper, although the content weight must be adjusted for this initialization (I went with 0.1 for the content weight). Note that the image shown is after 60 iterations, and the loss was still improving by roughly 3-4% every epoch; more iterations would produce better images.
- I found that a high TV regularization weight is detrimental to the final output when a content image is provided. Going in the opposite direction and using a TV weight of 8.5e-5 produced the best result, both in terms of how visually pleasing the output is and in the absolute loss value. This TV weight was found via cross-validation on the original neural style script, but it works here as well.
- Link to the discussion
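A rough sketch of the content-image initialization mentioned above; the preprocess_image helper mimics the one in the neural_style_transfer example, 'th' dim ordering is assumed, and the file name and target size are made up for illustration:

```python
import numpy as np
from keras.preprocessing.image import load_img, img_to_array

img_nrows, img_ncols = 256, 256  # placeholder target resolution

def preprocess_image(path):
    # Minimal stand-in for the helper in the neural_style_transfer example:
    # load, resize and mean-centre the image ('th' ordering assumed).
    img = img_to_array(load_img(path, target_size=(img_nrows, img_ncols)))
    img = np.expand_dims(img, axis=0)
    img -= 120.  # rough mean subtraction; the real script uses per-channel VGG means
    return img

# Default: start the optimization from random noise.
x = np.random.uniform(0, 255, (1, 3, img_nrows, img_ncols)) - 128.

# Suggested alternative: start from the content image itself (the content
# weight then needs re-tuning, e.g. ~0.1). The file name is hypothetical.
x = preprocess_image('renoir_content.jpg')
```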
@titu1994: Thanks for your comments! (But I couldn't find the link to your modified code - I'd appreciate it if you could re-share it.) I have tried some of your suggestions:
@fchollet: I don't think I have more ideas to try for this example for now, unless I get more comments later. Do you think we can create a PR based on this version? I will tidy up the code and documentation.
@dolaameng As to point 4, I believe most implementations of the style loss use MSE (squared l2 norm). All of the papers I have read on style transfer use the MSE between the Gram matrix of the style features and the Gram matrix of the output of the j-th layer, so I feel MSE should be used. As to points 1 and 3, using more layers always comes at the cost of additional time, sadly; I too saw a 1.4x increase in running time. Using conv5_2 is not useful without a stronger content weight, as you noticed, which is why I went with 4x the normal content weight (0.1). As to the TV loss comment: when I tried other TV weights (in the range 1e-3 to 1e-8), I found that 8.5e-5 produced the smallest absolute loss value after 100 epochs, similar to the discussion. This was, however, done on the original neural style transfer script and later reproduced on my modified version of that script. I detailed some of these test results in my guide for the script here (see the "Tips for Total Variation Regularization" section). The results were tested on various images using a large variety of styles, so I think they should still apply to this script.
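For concreteness, a minimal sketch of the Gram-matrix style loss being discussed, essentially along the lines of the Keras neural_style_transfer example ('th' ordering assumed; function names here are illustrative):

```python
from keras import backend as K

def gram_matrix(x):
    # x: feature maps of shape (channels, rows, cols); flatten each channel
    # and take the outer product of the flattened features.
    features = K.batch_flatten(x)
    return K.dot(features, K.transpose(features))

def style_loss(style_features, combination_features, channels, size):
    # MSE-style (squared l2) distance between Gram matrices, with the usual
    # normalization from Gatys et al.; size = rows * cols of the feature map.
    S = gram_matrix(style_features)
    C = gram_matrix(combination_features)
    return K.sum(K.square(S - C)) / (4. * (channels ** 2) * (size ** 2))
```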
Thanks @titu1994, will read up on it!
Made the code compatible with both 'th' and 'tf' image_dim_ordering. PR #3724 created.
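The compatibility change mostly amounts to branching on the backend's ordering wherever shapes or channel axes are touched. A minimal sketch of the idea, assuming the Keras 1.x-era backend API that exposes K.image_dim_ordering() (the resolution values are placeholders):

```python
from keras import backend as K

img_nrows, img_ncols = 256, 256  # placeholder resolution

# Feature maps are (channels, rows, cols) under 'th' and
# (rows, cols, channels) under 'tf', so shape handling must branch.
if K.image_dim_ordering() == 'th':
    img_shape = (3, img_nrows, img_ncols)
else:
    img_shape = (img_nrows, img_ncols, 3)
```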
@titu1994 @dolaameng Regarding my original TV loss observations: reducing the TV weight to something really small (like 0.00001) will usually increase the style loss compared to 0.000085, so reducing the TV weight does not always result in a lower style loss - but that is not always the case either. I noticed that when attempting "super resolution" (using the content image as the style image, applied over the same content image at twice the resolution), a TV weight of 0 produced the best results, better than the typical 0.000085. Food for thought.
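For reference, the total variation loss being tuned here penalizes differences between neighbouring pixels; a sketch along the lines of the neural_style_transfer example, with 'th' ordering assumed:

```python
from keras import backend as K

def total_variation_loss(x, img_nrows, img_ncols):
    # x: generated image tensor of shape (1, channels, rows, cols).
    # Penalizes differences between horizontally and vertically adjacent
    # pixels; the weight on this term trades smoothness against fidelity.
    a = K.square(x[:, :, :img_nrows - 1, :img_ncols - 1] -
                 x[:, :, 1:, :img_ncols - 1])
    b = K.square(x[:, :, :img_nrows - 1, :img_ncols - 1] -
                 x[:, :, :img_nrows - 1, 1:])
    return K.sum(K.pow(a + b, 1.25))
```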
I was trying to modify the neural_style_transfer example into a neural doodle implementation.
There are original Torch implementations here and here. My Keras implementation can be found here.
Some results on the Monet and Renoir examples can be found below.
The top row shows the inputs to the algorithm, and the bottom row shows the doodle results from the original Torch implementation vs. the Keras implementation (60 iterations). There are some differences, e.g., the surface of the water? This is probably because I used a very heavy weight for total_variation_loss (5000!); otherwise there tends to be some salt-and-pepper noise in the generated image.
Slight differences between the original Torch and Keras implementations are also observed in the Renoir example, where a content image is used to help generation, in addition to the style image and doodle (target_mask). But both doodle results are better than the results from the neural_style_transfer example, e.g., in drawing a clean sky.
My implementation uses VGG19 for images and a series of AveragePooling for masks. The image is generated by minimizing a combination of content_loss, style_loss, and total_variation_loss. It's very similar to the Torch versions but with some differences based on my understanding - I am still struggling with reading the Torch code... I would appreciate it if someone could help read the code, explain the differences with the Torch versions, and suggest potential improvements. Thanks!
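As an illustration of the mask handling described above, here is a rough sketch of downsampling a doodle mask with a chain of AveragePooling2D layers so it matches successive VGG19 feature-map resolutions (Keras 1.x functional API; 'th' ordering, the label count, and the mask resolution are all assumptions, not details from the actual script):

```python
import numpy as np
from keras.layers import AveragePooling2D, Input
from keras.models import Model

num_labels = 4                   # hypothetical number of doodle regions
mask_rows, mask_cols = 256, 256  # placeholder mask resolution

# One channel per semantic region; pool the mask once per VGG pooling stage
# so each downsampled mask lines up with the corresponding feature maps.
mask_input = Input(shape=(num_labels, mask_rows, mask_cols))
pyramid = []
x = mask_input
for _ in range(4):
    x = AveragePooling2D((2, 2))(x)
    pyramid.append(x)

mask_model = Model(input=mask_input, output=pyramid)
doodle_mask = np.zeros((1, num_labels, mask_rows, mask_cols))  # stand-in mask batch
downsampled_masks = mask_model.predict(doodle_mask)
```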