[hanime] Add new extractor #24328

Closed

BrutuZ wants to merge 10 commits into ytdl-org:master from BrutuZ:hanime

BrutuZ commented Mar 12, 2020

Before submitting a pull request make sure you have:

At least skimmed through adding new extractor tutorial and youtube-dl coding conventions sections
Searched the bugtracker for similar pull requests
Checked the code with flake8

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Bug fix
Improvement
New extractor
New feature

Description of your pull request and other information

Extractor for hanime.tv (NSFW) using as many fields as they provide

BrutuZ added 2 commits

March 12, 2020 01:15


          [hanime] Add new extractor

10cab64


          Parse all resolutions

Collaborator

dstftw commented Mar 12, 2020

Read coding conventions.

dstftw added the pending-fixes label

BrutuZ added 2 commits

March 12, 2020 15:23


          Added m3u8 to format list with https protocol

91a186a

Calculate TBR from Filesize and Duration, if provided
Use parsing and conversion functions


          Add ZeroDivisionError to exception list

80dc340

on int_or_none and float_or_none
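The bitrate estimate described in these two commits can also be guarded at the call site, which avoids touching the shared helpers. A minimal sketch, assuming Python 3 division, a filesize in bytes, a duration in seconds, and `tbr` expressed in kbit/s; `approx_tbr` is a hypothetical name, not something from the PR.

```python
def approx_tbr(filesize_bytes, duration_s):
    """Estimate total bitrate in kbit/s, or None when it cannot be computed."""
    # Missing or zero inputs are rejected up front, so no division by zero occurs.
    if not filesize_bytes or not duration_s:
        return None
    return filesize_bytes * 8 / duration_s / 1000


print(approx_tbr(1000000, 8))
print(approx_tbr(1000000, 0))
```

With this shape the caller simply leaves `tbr` unset when either value is absent.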

Author

BrutuZ commented Mar 12, 2020

Did that cover all the convention recommendations, or did I still miss anything?

dstftw requested changes

youtube_dl/extractor/hanime.py Outdated

+                      video_slug = self._match_id(url)
+                      webpage = self._download_webpage(url, video_slug)
+                      page_json = self._html_search_regex(r'window.__NUXT__=(.+?);<\/script>', webpage, 'Inline JSON')

Collaborator

dstftw Mar 12, 2020

Extract dict if you expect dict.
Relax regex.
Escape dots.
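
For readers following along, here is one possible reading of these three points, sketched with only the standard library: escape the dots in `window.__NUXT__`, allow optional whitespace around `=`, and capture a brace-delimited object so the match can be parsed straight into a dict. The sample page string and the `extract_nuxt` helper are hypothetical; in the extractor itself the same pattern would presumably go through `self._search_regex` and `self._parse_json`.

```python
import json
import re

# Made-up page snippet; the real markup is not shown in this thread.
WEBPAGE = ('<script>window.__NUXT__ = {"state": {"data": {"video": '
           '{"hentai_video": {"name": "Example"}}}}};</script>')

# Escaped dot, optional whitespace, and a capture group limited to a JSON object.
NUXT_RE = r'window\.__NUXT__\s*=\s*(\{.+?\})\s*;\s*</script>'


def extract_nuxt(webpage):
    """Return the inline state as a dict, or None if it is not found."""
    mobj = re.search(NUXT_RE, webpage, re.DOTALL)
    if not mobj:
        return None
    return json.loads(mobj.group(1))


print(extract_nuxt(WEBPAGE)['state']['data']['video']['hentai_video']['name'])
```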

youtube_dl/extractor/hanime.py Outdated

+                      webpage = self._download_webpage(url, video_slug)
+                      page_json = self._html_search_regex(r'window.__NUXT__=(.+?);<\/script>', webpage, 'Inline JSON')
+                      page_json = self._parse_json(page_json, video_slug).get('state').get('data').get('video').get('hentai_video')

Collaborator

dstftw Mar 12, 2020

Read coding conventions on mandatory data.
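
A rough sketch of the distinction this seems to point at, assuming the usual reading of the conventions: data the extractor cannot work without is indexed directly so a missing key fails loudly, while optional data is looked up tolerantly. The `try_get` below is a simplified stand-in modeled on the youtube-dl helper of the same name, and the sample dict (including the `views` and `brand` keys) is made up.

```python
def try_get(src, getter, expected_type=None):
    """Simplified stand-in: return getter(src), or None if traversal fails."""
    try:
        value = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(value, expected_type):
        return value
    return None


# Made-up payload shaped like the path used in the diff.
nuxt = {'state': {'data': {'video': {'hentai_video': {'name': 'Example', 'views': 10}}}}}

# Mandatory: direct indexing raises KeyError if the structure changes.
video = nuxt['state']['data']['video']['hentai_video']
title = video['name']

# Optional: a missing or malformed branch yields None instead of a crash.
views = try_get(video, lambda x: x['views'], int)
brand = try_get(video, lambda x: x['brand']['title'], str)

print(title, views, brand)
```

By contrast, a chain such as `.get('state').get('data')` gives no extra safety: as soon as one level is missing, the next `.get` is called on `None` and raises `AttributeError`.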

youtube_dl/extractor/hanime.py Outdated

+                          'API Call', headers={'X-Directive': 'api'}).get('videos_manifest').get('servers')[0].get('streams')
+                      title = page_json.get('name') or api_json.get[0].get('video_stream_group_id')
+                      tags = [t.get('text') for t in page_json.get('hentai_tags')]

Collaborator

dstftw Mar 12, 2020

Breaks.
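
The comment does not say what breaks. Two plausible candidates in the quoted lines: `api_json.get[0]` fails either way (a list has no `get`, and subscripting a dict's `get` method raises `TypeError`), and the list comprehension fails if `hentai_tags` is missing or holds a non-dict entry. A defensive version of the tag loop might look like this, assuming Python 3; `extract_tags` and the sample data are illustrative only.

```python
def extract_tags(video):
    """Collect tag texts, skipping anything that is not a dict with a text string."""
    tags = []
    # "or []" covers both a missing key and an explicit null in the JSON.
    for tag in video.get('hentai_tags') or []:
        if not isinstance(tag, dict):
            continue
        text = tag.get('text')
        if isinstance(text, str) and text:
            tags.append(text)
    return tags


print(extract_tags({'hentai_tags': [{'text': 'a'}, None, {'id': 1}, {'text': 'b'}]}))
```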

youtube_dl/extractor/hanime.py Outdated

+                      formats = []
+                      for f in api_json:
+                          item_url = url_or_none(f.get('url')) or url_or_none('https://hanime.tv/api/v1/m3u8s/%s.m3u8' % f.get('id'))

Collaborator

dstftw Mar 12, 2020

Breaks.
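
Again the reason is not spelled out. One likely issue: when `f.get('id')` is `None`, the fallback formats to a URL ending in `None.m3u8`, which still passes a URL check. A sketch that only builds the fallback when an id exists, assuming Python 3; `looks_like_url` is a crude stand-in for `url_or_none`, and `stream_url` is a hypothetical helper.

```python
import re

M3U8_TEMPLATE = 'https://hanime.tv/api/v1/m3u8s/%s.m3u8'  # template taken from the diff


def looks_like_url(value):
    """Crude stand-in for url_or_none: accept http(s) or protocol-relative URLs."""
    if isinstance(value, str) and re.match(r'^(?:https?:)?//\S+$', value.strip()):
        return value.strip()
    return None


def stream_url(stream):
    """Prefer the stream's own URL; fall back to the m3u8 template only with an id."""
    if not isinstance(stream, dict):
        return None
    url = looks_like_url(stream.get('url'))
    if url:
        return url
    stream_id = stream.get('id')
    if stream_id is None:
        return None
    return M3U8_TEMPLATE % stream_id


print(stream_url({'id': 42}))
```

A stream for which this returns `None` would simply be skipped in the format loop.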

Six further outdated review threads (four on youtube_dl/extractor/hanime.py, two on youtube_dl/utils.py) are marked resolved.

BrutuZ added 2 commits

March 12, 2020 20:44


          Revert ZeroDivisionError exception

ef753bc


          Requested changes

9aaf20b

Author

BrutuZ left a comment

I think everything was addressed. Hopefully I got the meaning of the single-word comments right xD


          int_or_none

0f7e8dc

BrutuZ requested a review from dstftw

March 13, 2020 19:48

BrutuZ added 2 commits

March 13, 2020 21:53


          Fix tags list

79b0d33


          Ignore potential premium DDL links

3202fbc

Iterate over server list instead of always using first index
Add a couple fallbacks

dstftw requested changes

youtube_dl/extractor/hanime.py Outdated

+                          'https://members.hanime.tv/api/v3/videos_manifests/%s' % video_slug,
+                          video_slug,
+                          'API Call', headers={'X-Directive': 'api'}), lambda x: x['videos_manifest']['servers'], list) or []
+                      title = page_json.get('name')

Collaborator

dstftw Mar 14, 2020

Mandatory.

youtube_dl/extractor/hanime.py Outdated

+                          video_slug,
+                          'API Call', headers={'X-Directive': 'api'}), lambda x: x['videos_manifest']['servers'], list) or []
+                      title = page_json.get('name')
+                      duration = parse_duration('%sms' % page_json.get('duration_in_ms'))

Collaborator

dstftw Mar 14, 2020

Again: float_or_none, not parse_duration.
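
For context, `parse_duration` is meant for human-readable duration strings, while a millisecond count only needs a numeric conversion with a scale factor. The stand-in below is modeled on youtube-dl's `float_or_none` (value times `invscale` divided by `scale`, with a default on bad input); the sample payload is invented.

```python
def float_or_none(v, scale=1, invscale=1, default=None):
    """Stand-in modeled on the youtube-dl helper: convert to float or return default."""
    if v is None:
        return default
    try:
        return float(v) * invscale / scale
    except (ValueError, TypeError):
        return default


# Invented sample; only the key name comes from the diff.
page_json = {'duration_in_ms': 90500}

# Milliseconds to seconds without building and re-parsing a string.
duration = float_or_none(page_json.get('duration_in_ms'), scale=1000)
print(duration)
```

If the key is absent, `duration` is simply `None`, which is acceptable for an optional field.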

youtube_dl/extractor/hanime.py Outdated

+                      title = page_json.get('name')
+                      duration = parse_duration('%sms' % page_json.get('duration_in_ms'))
+                      tags = []
+                      for tag in page_json.get('hentai_tags'):

Collaborator

dstftw Mar 14, 2020

Breaks.

youtube_dl/extractor/hanime.py Outdated

+                  def _real_extract(self, url):
+                      video_slug = self._match_id(url)
+                      page_json = self._html_search_regex(r'<script>.+__NUXT__=(.+?);<\/script>', self._download_webpage(url, video_slug), 'Inline JSON')

Collaborator

dstftw Mar 14, 2020

Nothing changed.

youtube_dl/extractor/hanime.py Outdated

+                          for stream in server['streams']:
+                              if stream.get('compatibility') != 'all':
+                                  continue
+                              item_url = sanitize_url(stream.get('url')) or sanitize_url('https://hanime.tv/api/v1/m3u8s/%s.m3u8' % stream.get('id'))

Collaborator

dstftw Mar 14, 2020

Nothing changed.

youtube_dl/extractor/hanime.py

+                              format = {
+                                  'width': width,
+                                  'height': height,
+                                  'filesize_approx': float_or_none(parse_filesize('%sMb' % stream.get('filesize_mbs'))),

Collaborator

dstftw Mar 14, 2020

See above.

youtube_dl/extractor/hanime.py Outdated

Comment on lines 93 to 94

		{'preference': 0, 'id': 'Poster', 'url': page_json.get('poster_url')},
		{'preference': 1, 'id': 'Cover', 'url': page_json.get('cover_url')},

Collaborator

dstftw Mar 14, 2020

Nothing changed.

dstftw added the do-not-merge label


          Changed more parsing logic

81e1dda

Author

BrutuZ commented Mar 14, 2020

Since I'm not an actual programmer, could I kindly ask to get more than a couple words on the requested changes? Laconic answers can (and some have) become a time-consuming guessing game 😕

BrutuZ requested a review from dstftw

March 20, 2020 14:37

BrutuZ closed this

cypheron mentioned this pull request

Evaluation / overview of new proposed extractors / sites #28054

Open

Labels

do-not-merge pending-fixes