Kindle-send breaks on svg images #28

Closed
volkerwestphal opened this issue Jul 21, 2023 · 10 comments

Comments

@volkerwestphal

Describe the bug
Kindle-send breaks the download if the webpage includes a .svg image.

To Reproduce
Steps to reproduce the behavior:
kindle-send download https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work

results in

SKIPPING https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work : Error retrieving "https://michael.stapelberg.ch/posts/2022-05-29-rsync-logical-view.svg" from source: ....
cannot get file, bad return code ...  missing data prefix

Expected behavior
In an ideal world, kindle-send would convert the svg into a supported image format and deliver the epub.
In the real world, it's sufficient to skip the image and continue the job.

Versions

  • OS: Windows 11
  • Version: 1.0.3, v2.0.0-rc-1

Nikhil and Mattias, thank you for creating kindle-send.

@przemekd
Contributor

@volkerwestphal I can see that the root cause of this error is a redirect (from https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work to https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work/; notice the slash at the end after the redirect). If you use the command kindle-send download https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work/ instead, downloading that image works.

A separate problem is that many e-book readers support only a limited set of media formats. The related issue is #27.

@nikhil1raghav do you think kindle-send should have a capability to convert svg, webp, etc. files to more common formats? Or maybe that's something that https://github.com/bmaupin/go-epub should support?

@volkerwestphal
Author

Well, redirects are daily business on the web. These are fairly easy to handle and should not disturb any robust program.

Regarding the limited media type support, I think it's unreasonable to expect a small tool like kindle-send to support a myriad of formats. The basic formats are fine and cover the majority of content. However, kindle-send should not break when it comes across unsupported media. Simply skip over it and proceed.

What's the point? While surfing (mostly HN), I often stumble upon long and interesting articles on the web. I put these links in a file and carry on. Once in a while I run kindle-send -linkfile ... to create a compilation epub of these articles. I don't care if a single article is missing some fancy artwork. But it bugs me if kindle-send stops working through the list because of a single link.

@przemekd
Contributor

> Well, redirects are daily business on the web. These are fairly easy to handle and should not disturb any robust program.

Yep, I fully agree here. But they do make some images unavailable in the final documents. I've checked the code and it seems that go-readability is to blame here. Let me create an issue on their repo.

> I don't care if a single article is missing a fancy artwork.

Sure, but would you accept a PR that adds optional conversion capabilities, so that Kindle users can see more images in the articles prepared and sent by kindle-send? ;)

@volkerwestphal
Author

> would you accept a PR to add an optional conversion capabilities ...

This is something to ask @nikhil1raghav.

Adding more image formats doesn't solve the actual problem. It means adding code that is only rarely used. You never get to support all image formats out there.

However, updating kindle-send to handle unknown media gracefully keeps it safe for years to come. You can always add important formats later if the need arises.
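
One way to handle unknown media gracefully is an allowlist check on the downloaded bytes, sketched below in Go. This is an assumption-laden illustration, not kindle-send's behaviour: isSupportedImage and supportedTypes are hypothetical names, and the three listed formats are only an example, since the thread does not state which formats a given reader accepts.

```go
package main

import (
	"fmt"
	"net/http"
)

// supportedTypes is an assumed allowlist for illustration only.
// New formats can be added here later without touching the check itself.
var supportedTypes = map[string]bool{
	"image/jpeg": true,
	"image/png":  true,
	"image/gif":  true,
}

// isSupportedImage sniffs the media type from the leading bytes of data
// and reports whether that type is on the allowlist.
func isSupportedImage(data []byte) bool {
	return supportedTypes[http.DetectContentType(data)]
}

func main() {
	png := []byte("\x89PNG\r\n\x1a\n") // PNG file signature
	svg := []byte(`<svg xmlns="http://www.w3.org/2000/svg"></svg>`)

	fmt.Println("png supported:", isSupportedImage(png))
	fmt.Println("svg supported:", isSupportedImage(svg))
}
```

Anything that is not on the list, including formats that do not exist yet, is simply reported as unsupported, so the caller can skip it instead of failing.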

@przemekd
Contributor

Sure, I meant to tag @nikhil1raghav

BTW the last release does exactly that:

> In the real world, it's sufficient to skip the image and continue the job.

In my case, the output file is produced despite the error.

kindle-send download https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work
Loaded configuration
Fetched https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work --> rsync, article 3: How does rsync work?
No title supplied, inheriting title of first readable article : rsync, article 3: How does rsync work? 
Embedding images in  rsync, article 3: How does rsync work?
Downloading Images
Couldn't add image https://michael.stapelberg.ch/posts/2022-05-29-rsync-logical-view.svg : Error retrieving "https://michael.stapelberg.ch/posts/2022-05-29-rsync-logical-view.svg" from source: Error retrieving "https://michael.stapelberg.ch/posts/2022-05-29-rsync-logical-view.svg" from source: 
 stat https://michael.stapelberg.ch/posts/2022-05-29-rsync-logical-view.svg: no such file or directory
 cannot get file, bad return code
 missing data prefix
Downloading Images
Downloaded image https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work/2022-05-29-rsync-exo1-backup_hua2f50278895cfbee4dc18c7ea60b6d4a_2093260_600x0_resize_q75_box.jpg
Setting img src from https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work/2022-05-29-rsync-exo1-backup_hua2f50278895cfbee4dc18c7ea60b6d4a_2093260_600x0_resize_q75_box.jpg to ../images/img8984686547664710865.png 
Added 1 articles
Downloaded 1 files :
1. rsync, article 3: How does rsync work?.epub

@volkerwestphal
Author

I also did a retest, still using 2.0.0-rc1 and still on Windows.
No output file is produced:

C:\Users\....\Kindlesend>kindle-send-2.0.0-rc1.exe download https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work
Loaded configuration
Fetched https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work --> rsync, article 3: How does rsync work?
No title supplied, inheriting title of first readable article : rsync, article 3: How does rsync work?
Embedding images in  rsync, article 3: How does rsync work?
Downloading Images
Downloaded image https://michael.stapelberg.ch/posts/2022-05-29-rsync-logical-view.svg
Downloading Images
Downloaded image https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work/2022-05-29-rsync-exo1-backup_hua2f50278895cfbee4dc18c7ea60b6d4a_2093260_600x0_resize_q75_box.jpg
Setting img src from https://michael.stapelberg.ch/posts/2022-05-29-rsync-logical-view.svg to ../images/img2669478670900678324.png
Setting img src from https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work/2022-05-29-rsync-exo1-backup_hua2f50278895cfbee4dc18c7ea60b6d4a_2093260_600x0_resize_q75_box.jpg to ../images/img8984686547664710865.png
Added 1 articles
SKIPPING https://michael.stapelberg.ch/posts/2022-07-02-rsync-how-does-it-work : Error retrieving "https://michael.stapelberg.ch/posts/2022-05-29-rsync-logical-view.svg" from source:
 open https://michael.stapelberg.ch/posts/2022-05-29-rsync-logical-view.svg: Die Syntax für den Dateinamen, Verzeichnisnamen oder die Datenträgerbezeichnung ist falsch.
 cannot get file, bad return code missing data prefix
Downloaded 1 files :

(The German text is the Windows error "The filename, directory name, or volume label syntax is incorrect." It basically gives the same message as the line below it.)

Together with your screenshot (most probably taken on Linux), this points the problem in another direction: under Windows, you can't have a file name with a question mark in it:

C:\Users\...\Kindlesend>echo > "How does rsync work?.epub"
Die Syntax für den Dateinamen, Verzeichnisnamen oder die Datenträgerbezeichnung ist falsch.

No file was created. There are at least nine characters that are invalid in file names under Windows, along with a couple of invalid strings.

Sources:

It looks like kindle-send uses the title of the website as a filename without sanitizing.
In this case, the embedded svg image itself is not the root cause of this problem.

@przemekd
Contributor

@volkerwestphal I've created a new issue #29 that represents what happens here. I am not really sure if @nikhil1raghav is still around maintaining this repo. I've created a fork to fix some bugs on my own. I also created a release that should fix this file naming problem. You can test it out.

@nikhil1raghav
Owner

@przemekd I will be glad to merge your fix. I'm not getting much time to fix bugs right now. PRs are always welcome. Thanks for fixing this.

@przemekd
Contributor

@nikhil1raghav Great! Let me prepare PRs to fix some of these issues I've already spotted. I'll get back to you soon. And thanks a lot for this little tool, it's very handy!

@volkerwestphal
Author

I'm closing this issue with the following insights:

  • The URL of the rsync blog post was missing a trailing slash. The server sends a redirect, which go-readability does not handle properly. As a consequence, the tag <img src="2022-05-29-rsync-logical-view.svg"> is resolved incorrectly and results in a 404. The redirect handling has been reported upstream as "Redirection breaks fixRelativeURIs" (go-shiori/go-readability#35).
  • The missing image still breaks kindle-send, even in v2.0.0-rc2. This bug is important enough to deserve its own clean issue number. I will open a fresh ticket later.
  • A second issue identified in the analysis was that the automatic generation of filenames in kindle-send resulted in filenames which included a number of invalid characters. I confirm that this has been fixed in version 2.0.0-rc2.


3 participants