Request: Add support for simply-hentai #89

ShyWest · 2018-05-26T18:34:52Z

Would it be possible to add support for https://www.simply-hentai.com/? It's a hentai site similar to nhentai, hbrowse, and the like. I could try to do it myself, but there's no documentation on how to do it, and I would rather not submit a half-baked patch that would have to be reviewed and rewritten.

Every doujin/manga/gallery has its own main page containing a cover, the title, and metadata such as tags, language, author(s), number of pages, etc. Language info is not always present, and I believe one work can have several authors, but I can't find an example right now.

Depending on several factors, the URLs for these main pages can differ.

An original work will use the subdomain original-work, the main domain and a slug, like this: https://original-work.simply-hentai.com/seductive-uniform-ch-1-21
A doujin will use the subdomain www, the main domain and two slugs, one for the series it parodies and another for the title, like this: https://www.simply-hentai.com/fresh-precure/eas-sama-no-sakusei-jigoku
However, some series have their own subdomain, followed by the main domain and one slug, like this: https://pokemon.simply-hentai.com/mao-friends-9bc39
Since it's a hentai site, some URLs contain non-ASCII characters, like this one: https://www.simply-hentai.com/dragon-ball/21号改造計画
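
The three URL shapes above could be told apart roughly as follows. This is only a sketch based on the examples in this comment: the function name `classify` and the returned tuple layout are made up for illustration, and it does not try to distinguish the GIF and video sections, whose URLs look similar.

```python
from urllib.parse import urlsplit, unquote

MAIN_DOMAIN = "simply-hentai.com"

def classify(url):
    """Return (kind, series, slug) for a gallery main-page URL, or None.

    Hypothetical helper; the rules only mirror the examples listed above.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host.endswith("." + MAIN_DOMAIN):
        return None
    # Everything in front of ".simply-hentai.com" is the subdomain.
    subdomain = host[: -len(MAIN_DOMAIN) - 1]
    # Percent-decoding keeps non-ASCII slugs readable.
    slugs = [unquote(s) for s in parts.path.split("/") if s]

    if subdomain == "original-work" and len(slugs) == 1:
        return ("original", None, slugs[0])
    if subdomain == "www" and len(slugs) == 2:
        # First slug is the parodied series, second is the title.
        return ("doujin", slugs[0], slugs[1])
    if subdomain not in ("www", "original-work") and len(slugs) == 1:
        # Series with their own subdomain, e.g. pokemon.
        return ("doujin", subdomain, slugs[0])
    return None
```

For example, `classify("https://pokemon.simply-hentai.com/mao-friends-9bc39")` gives `("doujin", "pokemon", "mao-friends-9bc39")`.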

Each work has a page showing thumbnails for every page, and follows the structure (url)/all-pages, like this:
https://pokemon.simply-hentai.com/mao-friends-9bc39/all-pages

Each page can be viewed separately and their links follow the structure (url)/page/(page_id), like this: https://pokemon.simply-hentai.com/mao-friends-9bc39/page/4052558
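
A small sketch of how these two derived URL forms could be built and taken apart. The helper names are hypothetical, and the pattern assumes page ids are purely numeric, which is only what the examples here show.

```python
import re

# Assumption: page ids consist of digits only, as in the example links.
PAGE_RE = re.compile(r"^(?P<gallery>.+)/page/(?P<page_id>\d+)$")

def all_pages_url(gallery_url):
    """Build the thumbnail overview URL, (url)/all-pages."""
    return gallery_url.rstrip("/") + "/all-pages"

def split_page_url(page_url):
    """Split (url)/page/(page_id) into (gallery_url, page_id), or None."""
    match = PAGE_RE.match(page_url)
    if match is None:
        return None
    return match.group("gallery"), match.group("page_id")
```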

There are also extra sections for GIF galleries and videos, which have URLs very similar to the previous ones, so some sort of detection would be needed to avoid trying to download a manga that isn't there.

Each work has an associated JSON file containing the URLs to the files themselves, following the structure (url)/all-pages.json, like this: https://pokemon.simply-hentai.com/mao-friends-9bc39/all-pages.json

The content of said file is like this:

{
    "4052555": {
        "giant": "https://cdn2.sh-cdn.com/images/v2/vertical/giant_thumb/2017-09/Album/58880/4052555.jpg",
        "full": "https://cdn2.sh-cdn.com/images/v2/vertical/full/2017-09/58880/4052555.jpg",
        "path": "https://pokemon.simply-hentai.com/mao-friends-9bc39/page/4052555",
        "bookmarked": false
    },
    "4052558": {
        "giant": "https://cdn2.sh-cdn.com/images/v2/vertical/giant_thumb/2017-09/Album/58880/4052558.jpg",
        "full": "https://cdn2.sh-cdn.com/images/v2/vertical/full/2017-09/58880/4052558.jpg",
        "path": "https://pokemon.simply-hentai.com/mao-friends-9bc39/page/4052558",
        "bookmarked": false
    },
    ...
}

Each page is defined by an id, a giant thumb, the full image, the link to view said page, and whether or not it was bookmarked by the user. The giant thumb, although big, is smaller than the full page, so the full page (property full) is the one that should be downloaded. The only way to know the actual page number is by its position in the list.
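
To illustrate, something like the following could turn that JSON into an ordered list of downloads. It is a sketch, not how gallery-dl does it: `full_image_urls` is a made-up name, and it relies on Python's `json` module keeping the key order of the file (dicts preserve insertion order since Python 3.7).

```python
import json

def full_image_urls(all_pages_json):
    """Return [(page_number, page_id, full_url), ...] in file order.

    The page number is simply the 1-based position of the entry,
    since the JSON itself carries no explicit page number.
    """
    pages = json.loads(all_pages_json)
    return [
        (number, page_id, entry["full"])
        for number, (page_id, entry) in enumerate(pages.items(), start=1)
    ]
```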

The main page for each work offers a download option, but it's just a list of filelockers to get an encrypted zip file.

As far as I know, the site doesn't offer an API.

Hope this info is useful.

mikf · 2018-05-26T19:30:36Z

> Hope this info is useful.

Why yes, this is very useful. That makes this a whole lot easier. Thanks.

Is it necessary (or would it be useful) to add login support or is everything available without being logged in?

ShyWest · 2018-05-26T23:20:46Z

Everything is available to anonymous users. I wrote a custom script a while back and didn't have any issues with limits or throttling after downloading no fewer than one thousand pages. And that was before discovering the JSON file containing the full index, so I was crawling the whole thing. I did put one second of sleep between requests, though. I like to be nice to servers just in case.
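
The one-second pause could look roughly like this. It is not the script mentioned above, just a sketch: `fetch` stands in for whatever function actually downloads a URL, and `sleep` is injectable only so the loop can be tried without waiting.

```python
import time

def fetch_all(urls, fetch, delay=1.0, sleep=time.sleep):
    """Call fetch(url) for every URL, pausing `delay` seconds in between.

    `fetch` is a hypothetical callable supplied by the caller.
    No pause is inserted before the first request.
    """
    results = []
    for index, url in enumerate(urls):
        if index:
            sleep(delay)
        results.append(fetch(url))
    return results
```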

You can bookmark and favorite works with an account, if you want to go the extra mile and add support for that. But the site is perfectly usable without one.

All videos hosted on their own servers seem to be dead, but myhentai.tv embeds, which are most of the videos, work fine.

mikf · 2018-05-30T09:55:34Z

H-manga/galleries, single images and gifs, and even videos should work now.

I've noticed that the download speed for anything not cached by their CDN is incredibly slow and may even result in a read-timeout, but downloads still finish, given enough time, so I guess it's fine.

Videos hosted on their own servers are also all gone, but most of the videos listed are hosted on another service and they work just fine.

Anyway, notify me if you find anything that doesn't work the way it should.

ShyWest · 2018-05-31T21:52:00Z

Thanks, I have been trying it with a bunch of links and it's working nicely for the most part. The last time I crawled the site it didn't time out that often; I suppose they have changed the infrastructure of their CDN.

Anyway, I said for the most part because I found out that the all-pages.json file isn't as reliable as I thought. I found one link where the JSON file doesn't exist and the site gets stuck in an endless loop of redirections: https://original-work.simply-hentai.com/dolls-anzai-rina-hen-dolls-rina-anzais-story. Maybe gallery-dl should use a crawler strategy as a fallback in such cases?

Thanks again for your work and releasing it as Free Software with Linux support and all. It's appreciated.

mikf · 2018-06-01T13:44:32Z

I've tried the link you posted and it works just fine ... well, at least now it does. Maybe the site had some sort of hiccup when you tried or it needs some time to generate the all-pages.json file on demand?

I also tested the 10 newest and 10 oldest galleries to try to reproduce this problem, but to no avail, i.e. everything worked as it should.

ShyWest · 2018-06-01T14:03:21Z

Nope, failed again for me, third day in a row. Removed my config file just in case, here's the log:

[gallery-dl][debug] Starting KeywordJob for 'https://original-work.simply-hentai.com/dolls-anzai-rina-hen-dolls-rina-anzais-story'
[simplyhentai][debug] Using SimplyhentaiGalleryExtractor for 'https://original-work.simply-hentai.com/dolls-anzai-rina-hen-dolls-rina-anzais-story'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): original-work.simply-hentai.com
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story HTTP/1.1" 200 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[urllib3.connectionpool][debug] https://original-work.simply-hentai.com:443 "GET /dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages.json HTTP/1.1" 301 None
[simplyhentai][error] HTTP request failed:  Exceeded 30 redirects.

I can't open the JSON file in my browser either; it gets redirected endlessly too. Same with wget. I tried other links and they work flawlessly. It's not gallery-dl's fault, but it's baffling.
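
In the log, the same URL answers 301 again and again until the limit of 30 redirects is reached. A client that follows redirects by hand could give up much earlier by remembering which URLs it has already seen. The sketch below assumes a hypothetical `fetch(url)` that returns `(status, location)` without following redirects itself. It also assumes that landing on an already visited URL always means a loop, which is not strictly true when cookies are involved.

```python
REDIRECT_CODES = (301, 302, 303, 307, 308)

def follow_redirects(url, fetch, max_redirects=30):
    """Follow redirects manually and stop as soon as a URL repeats.

    `fetch(url)` is a hypothetical callable returning (status, location).
    Returns (final_url, status); raises RuntimeError on a loop or when
    more than `max_redirects` redirects would be needed.
    """
    seen = set()
    for _ in range(max_redirects + 1):
        if url in seen:
            raise RuntimeError("redirect loop at " + url)
        seen.add(url)
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return url, status
        url = location
    raise RuntimeError("exceeded %d redirects" % max_redirects)
```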

mikf · 2018-06-01T18:47:23Z

Since I don't have this infinite redirect problem, I kind of need to know what works and what doesn't on your side to fix this:

Can you access https://original-work.simply-hentai.com/dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages in your browser?
What are the responses for the following wget calls (if browser access works)?

# should return the HTML version
$ wget --header='Accept: text/html' https://original-work.simply-hentai.com/dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages

# should get the same JSON data as all-pages.json would; or cause infinite redirects ...
$ wget --header='Accept: application/json' https://original-work.simply-hentai.com/dolls-anzai-rina-hen-dolls-rina-anzais-story/all-pages

ShyWest · 2018-06-01T19:34:27Z

Yes, I can access it. It works as intended, thumbnails and all.

Response from the first command: https://pastebin.com/A8FSZE5Z

Response from the second command: https://pastebin.com/Gmp92KzX

That trick with the header worked. The server still refuses to serve me the json file using the proper URL.

mikf · 2018-06-01T20:23:03Z

I've changed the HTTP request to .../all-pages. Hopefully it works for all galleries now.

The Accept header thing is something I found by accident, basically: I wanted to get the thumbnail links from the .../all-pages page and convert them to their original form, but it served me JSON data instead of HTML.

As it turns out, the webserver only sends the HTML version if you send an Accept: text/html header, like a browser would, or JSON for Accept: */* and Accept: application/json, or a 404 Not Found otherwise.
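
For anyone who wants to reproduce this outside of wget, here is a rough sketch using only Python's standard library. It is not the code from the commit; the function names are made up, and the behaviour of the server is only what was observed in this thread.

```python
import json
import urllib.request

def build_request(gallery_url, accept="application/json"):
    """Prepare a request for (url)/all-pages with an explicit Accept header."""
    return urllib.request.Request(
        gallery_url.rstrip("/") + "/all-pages",
        headers={"Accept": accept},
    )

def fetch_page_index(gallery_url, timeout=30):
    """Fetch and decode the JSON variant of (url)/all-pages.

    Performs a real network request; per the observation above, the
    server is expected to answer with JSON for this Accept value.
    """
    request = build_request(gallery_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Passing `accept="text/html"` to `build_request` would correspond to the first wget call above.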

ShyWest · 2018-06-03T16:20:21Z

Go figure. Can't decide whether that's clever or obscure web design. The last patch seems to work fine on my end. Thank you for your time, again.

mikf added the site:support label May 26, 2018

mikf added a commit that referenced this issue May 27, 2018

[simplyhentai] add gallery extractor (#89)

55b0913

mikf added a commit that referenced this issue May 30, 2018

[simplyhentai] add image extractor (#89)

f9a6a19

mikf added a commit that referenced this issue May 30, 2018

[simplyhentai] add video extractor (#89)

cdcc342

mikf closed this as completed May 30, 2018

mikf reopened this Jun 1, 2018

mikf added a commit that referenced this issue Jun 1, 2018

[simplyhentai] avoid redirects for all-pages.json (#89)

a47c613

mikf closed this as completed Jun 8, 2018

mikf added the nsfw label Jul 17, 2019
