deviantART broken image files #112

Closed
Elytreus opened this issue Aug 29, 2014 · 6 comments

Comments

@Elytreus

The downloader works great most of the time, but once in a while it downloads broken images. The pictures cannot be opened with any program and are in some cases only a few bytes in size. If I download these images manually, they are completely fine.

The files look like this:
(screenshot: broken images)

@Bendito999

Sometimes it seems to do this when the deviantART download link times out. If you open one of these files in Notepad, it is an HTML redirect page that takes you to the original page the image was downloaded from (which in turn contains an updated link to redownload it). The ripper is saving that HTML under the expected .png or .jpg name. Now we just need to figure out how to make the deviantART ripper detect these "failed links" and retry with updated links reparsed from the page.
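To illustrate what such a detection step could look like, here is a minimal standalone sketch (not RipMe code, and it assumes the broken files really do start with HTML markup as described above): real PNGs and JPEGs begin with binary magic numbers, while the stubs begin with a '<'.

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Standalone sketch: decide whether a supposed image file is really an HTML stub.
// Real PNGs begin with the bytes 0x89 'P' 'N' 'G' and real JPEGs with 0xFF 0xD8,
// while the broken files described above begin with markup such as "<html".
public class HtmlStubCheck {

    static boolean looksLikeHtmlStub(Path file) throws IOException {
        byte[] head = new byte[32];
        int read;
        try (InputStream in = Files.newInputStream(file)) {
            read = in.read(head);
        }
        if (read <= 0) {
            return true; // an empty file is broken either way
        }
        return new String(head, 0, read).trim().startsWith("<");
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        System.out.println(file + ": " + (looksLikeHtmlStub(file) ? "HTML stub" : "looks like a real image"));
    }
}

Run it against one of the tiny "failed files" and it should report an HTML stub.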

Though I don't know the inner workings of this program well enough to implement this logic, I will throw together a quick little script that checks for these hidden .html files and redownloads the real ones. I am still working on it, though.

Until that's done, I devised a manual way that works for me.

Edit: Simpler method:

1. In RipMe, uncheck "Overwrite existing files" and "Preserve order", and change max download threads to 1.
2. Delete all of the tiny "failed files" from the album folder (put "size:tiny" in the Explorer search bar to find them).
3. Rerun RipMe until the corrupt files start piling up again (there's a handy rerun button under "History").
4. Stop RipMe.
5. Repeat from step 2.

Edit 2:
I modified another downloading script to take all of the broken files in a directory (whether they are misnamed or not) and redownload them. It doesn't search recursively (yet) and doesn't delete the old .html files, but it is a work in progress. You will need Python 2.7 with the Mechanize library. The script is here:
http://pastebin.com/JZa1Pr2z
Though I haven't tried it, the original script that the deviantART downloading routines came from may also be promising, as its logic seems sound:
https://github.com/voyageur/dagr
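If Python isn't handy, the same kind of cleanup pass can be sketched in Java (again, just an illustration and not the script linked above; it assumes the stubs start with HTML markup): walk the album folder recursively, find .png/.jpg files that are really HTML, and rename them to .html so the real images can be grabbed again with the manual method below.

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Sketch of a cleanup pass: recursively rename every .png/.jpg that actually
// contains HTML to <name>.html, so the real images can be redownloaded later.
public class SweepBrokenImages {

    static boolean isHtmlStub(Path file) throws IOException {
        byte[] head = new byte[32];
        try (InputStream in = Files.newInputStream(file)) {
            int read = in.read(head);
            return read > 0 && new String(head, 0, read).trim().startsWith("<");
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(Files::isRegularFile)
                 .filter(p -> p.getFileName().toString().toLowerCase().matches(".*\\.(png|jpe?g)"))
                 .forEach(p -> {
                     try {
                         if (isHtmlStub(p)) {
                             Path stub = p.resolveSibling(p.getFileName() + ".html");
                             Files.move(p, stub);
                             System.out.println("Renamed stub: " + stub);
                         }
                     } catch (IOException e) {
                         System.err.println("Could not check " + p + ": " + e.getMessage());
                     }
                 });
        }
    }
}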

Complicated manual method (doesn't catch pictures that are only available as thumbnails): if you sort by size and find all of these failed files, you can rename their extensions from whatever they are to .html with a program called Bulk Rename Utility.
http://www.bulkrenameutility.co.uk/Download.php

(screenshot: Bulk Rename Utility settings)
To do the .html rename, sort by size, change the pictured setting, and select the .png and .jpg images. Then press Rename.

Because Firefox won't let you open many HTML files at once, move all of these HTML files into their own folder. Then open a command prompt and navigate to that folder.

First, prep Firefox for the abuse by installing Image Block
https://addons.mozilla.org/en-US/firefox/addon/image-block/

and Downthemall
https://addons.mozilla.org/en-US/firefox/addon/downthemall/

You should now have a little Image Block button; set it to block images.
Also, go to Firefox's options and, under the "Tabs" tab, uncheck "Don't load tabs until selected".

Open up a new window in Firefox.

Then run these two commands in the command window we opened earlier:

dir /b *.html > url_list.txt

for /F "delims=" %i in (url_list.txt) do "C:\Program Files (x86)\Mozilla Firefox\firefox.exe" -new-tab "%i"

The *.html filter keeps url_list.txt itself out of the list, and "delims=" keeps filenames containing spaces in one piece. You may have to change "Program Files (x86)" to "Program Files", depending on where Firefox is installed.

Once the tabs have loaded, use the DownThemAll Firefox extension and press DownThemAll "All Tabs". You may need to find this button by opening the "3 lines" Firefox menu and customizing your toolbar to include the DTA buttons. Once that works:
In the fast filter section, type "download", and all of the links across all tabs with "download" in the description should be selected. That should catch everything RipMe missed.

I suggest changing the DownThemAll preferences: under its Network tab, set concurrent downloads to 1, and under Advanced, set the max number of segments per download to 1/disabled. This makes the whole thing look less like a mass downloading operation to deviantART.

The bad part is that, for a huge album, we might run into this problem again while DownThemAll is running, because the links expire once more. Just run the commands above again, and tell DTA to skip any files it has already downloaded when it starts asking about overwriting.

@Elytreus
Author

Thank you for your quick and detailed response. I will try your method, but I hope the creator of the program will find a solution, as I am not familiar with programming.

@Bendito999

Yeah, sorry about the excessively complicated instructions. I'm sure there's a better way (I edited the previous post with simpler instructions that seem to work), and a cleanup script you can run in a directory shouldn't be too hard (edit: see my first post), as I run into this problem when manually downloading from deviantART as well.

Edit (a better programming fix, though it would probably require more extensive restructuring):
The "run ahead" scanning behavior is actually fine, but the links to the actual images need to be generated on the fly. The subroutine that takes a deviation "page" with the picture on it and turns it into a full-definition download link needs to be moved from the scanning phase to the download phase.
The queue would fill up with the raw page links, and the download links would be generated "just in time" for downloading (preferably one at a time) as part of the download routine.
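To make that concrete, here is a simplified sketch of the idea (these are made-up placeholder classes and method names, not RipMe's actual structure): the scanning phase only queues raw page URLs, and the download worker resolves the full-definition link immediately before fetching it, so nothing sitting in the queue can expire.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of "just in time" link resolution: the queue holds raw deviation page
// URLs, and each page is turned into a full-size image link only at the moment
// the worker is about to download it.
public class JustInTimeDownloader {

    private final BlockingQueue<String> pageQueue = new LinkedBlockingQueue<>();

    // Producer side: the scanning phase only discovers page URLs.
    public void enqueuePage(String pageUrl) throws InterruptedException {
        pageQueue.put(pageUrl);
    }

    // Consumer side: resolve and download in one step, one link at a time.
    public void runWorker() {
        try {
            while (true) {
                String pageUrl = pageQueue.take();
                String imageUrl = resolveFullSizeUrl(pageUrl); // fresh link, generated just in time
                download(imageUrl);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop cleanly when interrupted
        }
    }

    // Placeholder: in the real ripper this would be the page-to-download-link
    // parsing that currently runs during the scanning phase.
    private String resolveFullSizeUrl(String pageUrl) {
        return pageUrl + "#full-size-link";
    }

    // Placeholder for the actual HTTP download.
    private void download(String imageUrl) {
        System.out.println("Downloading " + imageUrl);
    }

    public static void main(String[] args) throws InterruptedException {
        JustInTimeDownloader ripper = new JustInTimeDownloader();
        Thread worker = new Thread(ripper::runWorker);
        worker.start();
        ripper.enqueuePage("https://www.deviantart.com/example-page-1");
        ripper.enqueuePage("https://www.deviantart.com/example-page-2");
        Thread.sleep(1000);
        worker.interrupt();
    }
}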

Old (possibly simpler, but not foolproof) delay solution:
Instead of running ahead and queuing up download links that expire, the ripper should wait after scanning one page:
1. Send that page's links to the download thread.
2. Wait for the download thread to finish the links on that page.
3. Advance to the next page to get fresher links.
4. Repeat.
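A similarly hypothetical sketch of this delay approach (again, placeholder method names rather than RipMe's real routines), processing the gallery strictly one page at a time:

import java.util.Arrays;
import java.util.List;

// Sketch of the delay approach: scan one gallery page, download everything on
// it, and only then advance, so no link waits around long enough to expire.
public class PageByPageRipper {

    public void rip(String galleryUrl) {
        String page = galleryUrl;
        while (page != null) {
            List<String> links = parseDownloadLinks(page); // scan exactly one page
            for (String link : links) {
                download(link); // finish this page before moving on
            }
            page = findNextPage(page); // only now go get fresher links
        }
    }

    // Placeholders for the real scanning and downloading routines.
    private List<String> parseDownloadLinks(String pageUrl) {
        return Arrays.asList(pageUrl + "#image1", pageUrl + "#image2");
    }

    private void download(String imageUrl) {
        System.out.println("Downloading " + imageUrl);
    }

    private String findNextPage(String pageUrl) {
        return null; // stand-in: stop after the first page in this sketch
    }

    public static void main(String[] args) {
        new PageByPageRipper().rip("https://www.deviantart.com/example-gallery");
    }
}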

@4pr0n
Owner

4pr0n commented Oct 21, 2014

I've had issues getting some download links from deviantart pages. Exhibit A & B:
https://github.com/4pr0n/ripme/blob/master/src/main/java/com/rarchives/ripme/ripper/rippers/DeviantartRipper.java#L162

@rautamiekka

I haven't ever encountered this ancient problem, and we're up to RipMe 1.5.5 nowadays, but I'm still on 1.5.2.

Window$ 7 Ultimate SP1 x64, Oracle Java SE 8 Update 144 x64. I've ripped dozens of entire galleries (which are still on the disk) and every so often I sync my favs folders, with no real problems.

@metaprime
Collaborator

metaprime commented Aug 14, 2017

Yeah, the dA ripper still has a lot of problems, but I think we can probably close this one at this point. I've never seen this issue either.

cyian-1756 added a commit to cyian-1756/ripme that referenced this issue Nov 4, 2017