[AsianCrush] fix extractor, add support for yuyutv and Midnight Pulp #21290

Closed
ealgase wants to merge 2 commits from asiancrush-clones

Contributor

@ealgase ealgase commented Jun 3, 2019

Please follow the guide below

  • You will be asked some questions, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your pull request (like that [x])
  • Use Preview tab to see how your pull request will actually look like

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

The AsianCrush extractor was partially broken (didn't extract description properly). This pull request fixes that, and also adds support for sister sites yuyutv and Midnight Pulp (closing #21281).

@ealgase
Contributor Author

ealgase commented Jun 5, 2019

@remitamine sorry to bother you, but could you please take a look at this? (I think I've been understanding the code standards better now, so hopefully you won't have to leave as much feedback on this)

Collaborator

@dstftw dstftw left a comment

Merge video and playlist extractors into single video extractor and single playlist extractor.

@ealgase
Contributor Author

ealgase commented Jun 7, 2019

I can do that for the main AsianCrushIE, but for the Playlist IE, it currently requires an additional site specific variable (the _SITE_TITLE).

@dstftw
Collaborator

dstftw commented Jun 7, 2019

Nothing stops you from rewriting it to use re.sub.

@ealgase
Contributor Author

ealgase commented Jun 7, 2019

I don't understand? I'm referring to the last bit of this:

        title = remove_end(
            self._html_search_regex(
                r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage,
                'title', default=None) or self._og_search_title(
                webpage, default=None) or self._html_search_meta(
                'twitter:title', webpage, 'title',
                default=None) or self._search_regex(
                r'<title>([^<]+)</title>', webpage, 'title', fatal=False),
            ' | %s' % self._SITE_TITLE)

I don't see how re.sub would allow that last bit to work without knowing the title of the site.

@dstftw
Collaborator

dstftw commented Jun 7, 2019

re.sub(r'\s*\|\s*.+?$', '', title).
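
To illustrate the suggestion above, here is a minimal sketch of how that pattern strips a trailing " | Site Name" suffix without knowing the site's name. The helper name `strip_site_suffix` and the sample titles are made up for illustration and are not part of the PR.

```python
import re


def strip_site_suffix(title):
    """Remove everything from the first '|' separator to the end of the title."""
    # \s*\|\s* matches the separator with optional surrounding whitespace;
    # .+?$ then consumes the rest of the string (the site name).
    return re.sub(r'\s*\|\s*.+?$', '', title)


# Hypothetical titles, used only to show the behaviour.
print(strip_site_suffix('The Act of Killing | YuyuTV'))
print(strip_site_suffix('No separator here'))
```

Because the pattern removes everything from the first pipe onward, a title that itself contains a pipe would be cut short. Whether that matters for these sites is not established in the thread.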

@ealgase ealgase force-pushed the asiancrush-clones branch from ed1b7b0 to 8861b4b on June 8, 2019 03:58
@ealgase
Contributor Author

ealgase commented Jun 8, 2019

OK, I've made the requested changes.

youtube_dl/extractor/asiancrush.py (outdated)
'kaltura:%s:%s' % (partner_id, kaltura_id),
ie=KalturaIE.ie_key(), video_id=kaltura_id,
video_title=title)
description = self._html_search_regex(r'<div class="description">(.+?)</div>', webpage, 'description', fatal=False, flags=re.DOTALL)
Collaborator

Move flags into regex. Carry long lines.

Contributor Author

I'm confused about "Move flags into regex"; I can't find a difference between this and the way it is implemented in other extractors.

Collaborator

What difference are you even talking about?

Contributor Author

You said "Move flags into regex", I don't understand what you're asking me to do.

Collaborator

Are you aware of what flags are at all?
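
For readers following this thread, "move flags into regex" most likely refers to Python's inline flag syntax, where `(?s)` at the start of a pattern has the same effect as passing `flags=re.DOTALL`. The sketch below compares the two forms using plain `re.search`; the sample HTML is invented, and this is an interpretation of the review comment rather than something the reviewer spelled out.

```python
import re

# Invented sample markup with a newline inside the description.
html = '<div class="description">Line one.\nLine two.</div>'

# Flag passed as a separate argument, as in the line under review.
with_flag_arg = re.search(
    r'<div class="description">(.+?)</div>', html, flags=re.DOTALL)

# Same flag expressed inline at the start of the pattern.
with_inline_flag = re.search(
    r'(?s)<div class="description">(.+?)</div>', html)

# Without DOTALL in either form, '.' does not match the newline.
without_flag = re.search(r'<div class="description">(.+?)</div>', html)

print(with_flag_arg.group(1) == with_inline_flag.group(1))
print(without_flag)
```

The title-extraction snippet quoted earlier in the conversation already uses the inline form, `r'(?s)<h1\b...'`.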

youtube_dl/extractor/asiancrush.py (outdated)


class AsianCrushIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?asiancrush\.com/video/(?:[^/]+/)?0+(?P<id>\d+)v\b'
IE_NAME = 'asiancrush'
_VALID_URL = r'https?://(?:www\.)?(?P<host>(?:asiancrush\.com|yuyutv\.com|midnightpulp\.com))/video/(?:[^/]+/)?0+(?P<id>\d+)v\b'
Collaborator

Move .com part outside the inner group.
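
One way to read this comment is that the repeated `\.com` can be factored out of the alternation while staying inside the named `host` group. The sketch below compares the pattern from the diff with such a rewrite; `FACTORED_URL_RE` and the helper `parse` are hypothetical names, and the exact form the reviewer had in mind is an assumption.

```python
import re

# Pattern as it appears in the diff under review.
ORIGINAL_URL_RE = (
    r'https?://(?:www\.)?'
    r'(?P<host>(?:asiancrush\.com|yuyutv\.com|midnightpulp\.com))'
    r'/video/(?:[^/]+/)?0+(?P<id>\d+)v\b')

# Assumed rewrite: only the site names alternate, '.com' follows the group.
FACTORED_URL_RE = (
    r'https?://(?:www\.)?'
    r'(?P<host>(?:asiancrush|yuyutv|midnightpulp)\.com)'
    r'/video/(?:[^/]+/)?0+(?P<id>\d+)v\b')


def parse(pattern, url):
    """Return (host, id) if the URL matches the pattern, else None."""
    mobj = re.match(pattern, url)
    return (mobj.group('host'), mobj.group('id')) if mobj else None


# URL taken from the debug log later in this conversation.
url = 'https://www.yuyutv.com/video/013886v/the-act-of-killing/'
print(parse(ORIGINAL_URL_RE, url))
print(parse(FACTORED_URL_RE, url))
```

Both patterns should capture the same `host` and `id` values; the factored form simply avoids repeating the suffix for each site.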

@@ -96,15 +148,16 @@ def _real_extract(self, url):
entries.append(self.url_result(
mobj.group('url'), ie=AsianCrushIE.ie_key()))

- title = remove_end(
+ title = re.sub(
Collaborator

Breaks on None title.
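
The concern here is that `re.sub` raises `TypeError` when its string argument is `None`, which can happen if every title fallback is non-fatal and none of them matches. A minimal sketch of a guard follows; the helper name `clean_title` and the sample title are hypothetical, and the fix eventually committed upstream may look different.

```python
import re

SITE_SUFFIX_RE = r'\s*\|\s*.+?$'


def clean_title(title):
    """Strip a trailing ' | Site' suffix, passing None through unchanged."""
    if title is None:
        # Nothing was extracted, so there is nothing to clean up.
        return None
    return re.sub(SITE_SUFFIX_RE, '', title)


# Calling re.sub directly on None fails with TypeError.
try:
    re.sub(SITE_SUFFIX_RE, '', None)
    raised = False
except TypeError:
    raised = True

print(raised)
print(clean_title(None))
print(clean_title('Some Playlist | Some Site'))
```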

@dstftw
Collaborator

dstftw commented Jul 15, 2019

Does not work:

> py -3.7 .\youtube_dl\__main__.py https://www.yuyutv.com/video/013886v/the-act-of-killing/ -v --proxy 127.0.0.1:8118
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['https://www.yuyutv.com/video/013886v/the-act-of-killing/', '-v']
[debug] Encodings: locale cp1251, fs utf-8, out utf-8, pref cp1251
[debug] youtube-dl version 2019.07.14
[debug] Git HEAD: 1ef4607
[debug] Python version 3.7.0 (CPython) - Windows-10-10.0.10240-SP0
[debug] exe versions: ffmpeg N-85653-gb4330a0, ffprobe N-85653-gb4330a0, phantomjs 2.1.1, rtmpdump 2.4
[asiancrush] 13886: Downloading webpage
[asiancrush] 13886: Downloading webpage
[Kaltura] 1_66x4rg7o: Downloading video info JSON
[Kaltura] 1_66x4rg7o: Downloading m3u8 information
[debug] Default format spec: bestvideo+bestaudio/best
[debug] Invoking downloader on 'http://cdnapi.kaltura.com/p/513551/sp/51355100/playManifest/entryId/1_66x4rg7o/format/url/protocol/http/flavorId/1_hgns56wd'
ERROR: unable to download video data: HTTP Error 400: Bad Request
Traceback (most recent call last):
  File "C:\Dev\youtube-dl\master\youtube_dl\YoutubeDL.py", line 1915, in process_info
    success = dl(filename, info_dict)
  File "C:\Dev\youtube-dl\master\youtube_dl\YoutubeDL.py", line 1854, in dl
    return fd.download(name, info)
  File "C:\Dev\youtube-dl\master\youtube_dl\downloader\common.py", line 366, in download
    return self.real_download(filename, info_dict)
  File "C:\Dev\youtube-dl\master\youtube_dl\downloader\http.py", line 341, in real_download
    establish_connection()
  File "C:\Dev\youtube-dl\master\youtube_dl\downloader\http.py", line 109, in establish_connection
    ctx.data = self.ydl.urlopen(request)
  File "C:\Dev\youtube-dl\master\youtube_dl\YoutubeDL.py", line 2227, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "C:\Python\Python37\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Python\Python37\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python\Python37\lib\urllib\request.py", line 563, in error
    result = self._call_chain(*args)
  File "C:\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\Python\Python37\lib\urllib\request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python\Python37\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Python\Python37\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python\Python37\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\Python\Python37\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request

pull bot referenced this pull request in Vikash-Kothary/youtube-dl Jul 15, 2019
@dstftw dstftw closed this in f614968 Jul 15, 2019
Lamieur referenced this pull request in Lamieur/youtube-dl Aug 3, 2019
meunierd referenced this pull request in meunierd/youtube-dl Feb 13, 2020
Lamieur referenced this pull request in Lamieur/youtube-dl Apr 20, 2020
… cocoro.tv (closes #21281, closes #21290)"

This reverts commit a136b6e.
pareronia referenced this pull request in pareronia/youtube-dl Jun 22, 2020
2 participants