Add new extractor for Dagelijkse kost #28119
youtube_dl/extractor/canvas.py
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id')
Use the _match_id method.
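In youtube-dl the base class already provides this helper, so inside _real_extract the two lines above collapse to `display_id = self._match_id(url)`. The sketch below re-implements the idea outside the library so it runs on its own; the class name and the example URL are hypothetical, and the real helper may differ in details (for example, caching the compiled pattern).

```python
import re


class HypotheticalDagelijkseKostIE:
    """Standalone stand-in for an InfoExtractor subclass (illustration only)."""

    # Pattern copied from the snippet under review.
    _VALID_URL = r'https?://dagelijksekost\.een\.be/(?:[^/]+/)*(?P<id>[^/?#&]+)'

    @classmethod
    def _match_id(cls, url):
        # Match the URL against _VALID_URL and return the named 'id' group,
        # replacing the two-step re.match(...) / mobj.group('id') pattern.
        mobj = re.match(cls._VALID_URL, url)
        if mobj is None:
            raise ValueError('URL not supported: %s' % url)
        return mobj.group('id')


# Hypothetical URL, used only to exercise the pattern.
print(HypotheticalDagelijkseKostIE._match_id(
    'https://dagelijksekost.een.be/gerechten/example-recipe'))  # example-recipe
```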
youtube_dl/extractor/canvas.py
webpage = self._download_webpage(url, display_id)

title = strip_or_none(self._search_regex(
    r'<h1[^>]+class="dish-metadata__title headline-1"[^>]*>(.+?)</h1>',
Use the get_element_by_class function.
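get_element_by_class returns the inner HTML of the first element carrying the given class, so the extractor no longer depends on the exact class attribute string (the regex above stops matching if headline-1 is renamed or the classes are reordered). Below is a simplified, standalone approximation for illustration only: the function name is hypothetical, it assumes the element has no nested child with the same tag name, and it is not the youtube_dl.utils implementation.

```python
import re


def get_element_by_class_sketch(class_name, html):
    """Return the inner HTML of the first element whose class attribute
    contains `class_name` as a whole token, or None if there is none.

    Simplified on purpose: it assumes the element has no nested child with
    the same tag name, and it does not decode HTML entities.
    """
    pattern = (
        r'<(?P<tag>[a-zA-Z0-9]+)\b[^>]*?\sclass=(["\'])'  # opening tag, class attr
        r'(?:[^"\']*\s)?' + re.escape(class_name) +       # class token as its own word
        r'(?:\s[^"\']*)?\2'                              # rest of the attribute value
        r'[^>]*>(?P<content>.*?)</(?P=tag)\s*>'           # content up to the closing tag
    )
    match = re.search(pattern, html, re.DOTALL)
    return match.group('content') if match else None


sample = '<div><h1 class="dish-metadata__title headline-1">Example title</h1></div>'
print(get_element_by_class_sketch('dish-metadata__title', sample))  # Example title
```

Because the class is matched as a whole token, a lookup for `dish-metadata` does not accidentally match `dish-metadata__title`.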
youtube_dl/extractor/canvas.py
'id': video_id,
'display_id': display_id,
'title': title,
'description': self._og_search_description(webpage),
Fall back to other available values.
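Falling back usually means chaining several sources so that the first non-empty one wins. A minimal sketch of that idea, using hypothetical zero-argument callables so a later lookup only runs when every earlier one came up empty; the `scraped` dictionary and its keys are invented stand-ins for the page and its meta tags.

```python
def first_present(*getters):
    """Call each getter in order and return the first truthy result.

    Getters are zero-argument callables, so a later (possibly noisier or
    more expensive) lookup only runs if every earlier one came up empty.
    """
    for getter in getters:
        value = getter()
        if value:  # skips None and empty strings
            return value
    return None


# Hypothetical scraped values standing in for the page, meta tags, etc.
scraped = {'dish_description': None, 'og_description': 'Fallback text'}
description = first_present(
    lambda: scraped.get('dish_description'),
    lambda: scraped.get('meta_description'),
    lambda: scraped.get('og_description'),
)
print(description)  # Fallback text
```

In the extractor itself the same effect is usually written inline as `a or b or c`, which is what the later revisions of this snippet do.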
youtube_dl/extractor/canvas.py
class DagelijkseKostIE(InfoExtractor):
    IE_DESC = 'dagelijksekost.een.be'
    _VALID_URL = r'https?://dagelijksekost\.een\.be/(?:[^/]+/)*(?P<id>[^/?#&]+)'
Match only supported URLs.
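The pattern above accepts any path on dagelijksekost.een.be, including pages that carry no video. A sketch of a narrower pattern, assuming (this thread does not state it) that recipe pages live under a /gerechten/ path segment; the other-page URL is hypothetical.

```python
import re

# Pattern from the snippet under review: any path on the host matches.
LOOSE_VALID_URL = r'https?://dagelijksekost\.een\.be/(?:[^/]+/)*(?P<id>[^/?#&]+)'

# Hypothetical narrower pattern: only /gerechten/<slug> pages match.
# The '/gerechten/' segment is an assumption about the site's URL layout.
STRICT_VALID_URL = r'https?://dagelijksekost\.een\.be/gerechten/(?P<id>[^/?#&]+)'


def is_supported(pattern, url):
    """Return True if `url` matches `pattern` from the start (like re.match)."""
    return re.match(pattern, url) is not None


other_page = 'https://dagelijksekost.een.be/some-other-page'  # hypothetical
print(is_supported(LOOSE_VALID_URL, other_page))   # True
print(is_supported(STRICT_VALID_URL, other_page))  # False
```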
youtube_dl/extractor/canvas.py
return {
    '_type': 'url_transparent',
    'url': 'https://mediazone.vrt.be/api/v1/dako/assets/%s' % (video_id),
The parentheses are not needed.
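In `'...%s' % (video_id)` the parentheses only group an expression; they do not create a tuple (that would need a trailing comma, `(video_id,)`), so `% video_id` behaves identically. A small sketch of the url_transparent result with that change; `build_result` is a hypothetical helper, the id values are invented, and the fields are the ones shown in the snippets in this review.

```python
MEDIAZONE_ASSET_URL = 'https://mediazone.vrt.be/api/v1/dako/assets/%s'


def build_result(video_id, display_id, title, description):
    """Build a url_transparent result pointing at the mediazone asset."""
    return {
        '_type': 'url_transparent',
        # `% video_id` is enough: `(video_id)` is just a parenthesised
        # expression, not a one-element tuple.
        'url': MEDIAZONE_ASSET_URL % video_id,
        'id': video_id,
        'display_id': display_id,
        'title': title,
        'description': description,
    }


result = build_result('example-id', 'example-recipe', 'Title', 'Description')
print(result['url'])  # https://mediazone.vrt.be/api/v1/dako/assets/example-id
```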
youtube_dl/extractor/canvas.py
webpage = self._download_webpage(url, display_id)

title = strip_or_none(get_element_by_class(
    "dish-metadata__title",
Use single quotes consistently.
youtube_dl/extractor/canvas.py
@@ -15,11 +15,12 @@
    merge_dicts,
    str_or_none,
    url_or_none,
    get_element_by_class,
Keep the imports in alphabetical order.
youtube_dl/extractor/canvas.py
description = strip_or_none(get_element_by_class(
    "dish-description",
    webpage) or self._og_search_description(
    webpage, default=None))
The twitter:description and description meta tags are also available.
youtube_dl/extractor/canvas.py
title = strip_or_none(get_element_by_class(
    'dish-metadata__title',
    webpage) or self._og_search_title(
    webpage, default=None))
Check the result of the fallback (in comparison with the primary source).
Should I drop the fallback, or apply a regex to make up for the differences?
The twitter:title meta tag does not contain " | Dagelijkse kost" at the end. If you want to keep og:title, you can use the remove_end function to clean the title.
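remove_end trims a known suffix only when it is actually present, so titles that lack it pass through unchanged. A standalone sketch of that behaviour, assuming the og:title suffix is exactly " | Dagelijkse kost" with the surrounding spaces; the sketch reuses the helper's name for readability but is not the youtube_dl.utils source, and the title value is invented.

```python
def remove_end(s, end):
    """Return `s` without the suffix `end` if it ends with it; None passes through."""
    if s is not None and end and s.endswith(end):
        return s[:-len(end)]
    return s


# Hypothetical og:title value; the ' | Dagelijkse kost' suffix is assumed.
og_title = 'Example recipe | Dagelijkse kost'
print(remove_end(og_title, ' | Dagelijkse kost'))  # Example recipe
```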
youtube_dl/extractor/canvas.py
or self._html_search_meta(
    ('description', 'twitter:description'), webpage)
or self._og_search_description(webpage, default=None))
Combine these into a single call to _html_search_meta.
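_html_search_meta accepts a tuple of names and tries them in order, so the separate meta and og: lookups can be merged into one call such as `self._html_search_meta(('description', 'twitter:description', 'og:description'), webpage)`. The standalone sketch below mimics that lookup order; the function name and sample HTML are hypothetical, and it assumes the name/property attribute comes before content, a simplification the real helper may not share.

```python
import re


def search_meta(names, html):
    """Return the content of the first <meta> tag whose name or property
    equals one of `names`, trying the names in the given order."""
    if isinstance(names, str):
        names = (names,)
    for name in names:
        # Simplified: expects name/property before content, quotes matched.
        pattern = (
            r'<meta\s+(?:name|property)=(["\'])%s\1\s+'
            r'content=(["\'])(?P<content>.*?)\2' % re.escape(name))
        match = re.search(pattern, html, re.DOTALL)
        if match:
            return match.group('content')
    return None


page = ('<meta property="og:description" content="OG text">'
        '<meta name="description" content="Short text">')
# 'description' is tried first, so it wins even though og:description
# appears earlier in the page.
print(search_meta(('description', 'twitter:description', 'og:description'), page))
```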
youtube_dl/extractor/canvas.py
get_element_by_class(
    'dish-description', webpage)
Check that the extracted description is what is expected (it should not contain HTML tags).
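clean_html turns an HTML fragment into plain text, which addresses the concern about tags leaking into the description. A rough standalone approximation for illustration; the function name and sample strings are hypothetical, and the real helper may handle more cases.

```python
import html
import re


def clean_html_sketch(fragment):
    """Roughly convert an HTML fragment to plain text (None passes through)."""
    if fragment is None:
        return None
    text = fragment.replace('\n', ' ')                             # source newlines are layout only
    text = re.sub(r'(?i)<\s*br\s*/?\s*>', '\n', text)               # <br> becomes a line break
    text = re.sub(r'(?i)<\s*/\s*p\s*>\s*<\s*p[^>]*>', '\n', text)   # paragraph boundary
    text = re.sub(r'<.*?>', '', text)                               # drop all remaining tags
    text = html.unescape(text)                                      # &amp; -> & and so on
    return text.strip()


print(clean_html_sketch('<p>Some <b>bold</b> &amp; text</p>'))  # Some bold & text
```

Because the sketch already trims whitespace at the end, wrapping its result in a separate strip call adds nothing.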
youtube_dl/extractor/canvas.py
description = strip_or_none(
    clean_html(get_element_by_class(
strip_or_none is no longer needed.
youtube_dl/extractor/canvas.py
or self._html_search_meta(
    ('description', 'twitter:description', 'og:description'),
    webpage,
    default=None))
I think it would be better not to silence the warning when the description is not found. Failing to extract the description is not expected, so a failure would likely indicate that the website has changed.
Should I do the same for title?
Yes.
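The reviewer's point is that `default=None` turns a missing field into a silent None, whereas leaving the default out lets the helper report a warning. A minimal sketch of that contract using a sentinel default; `search_regex`, the sentinel and the warning text are hypothetical, and the sketch uses Python's warnings module rather than youtube-dl's own reporting.

```python
import re
import warnings

_NO_DEFAULT = object()  # sentinel meaning "the caller passed no default"


def search_regex(pattern, text, name, default=_NO_DEFAULT):
    """Return group 1 of the first match of `pattern` in `text`.

    If nothing matches: return `default` silently when one was passed,
    otherwise emit a warning (the field was expected) and return None.
    """
    match = re.search(pattern, text)
    if match:
        return match.group(1)
    if default is not _NO_DEFAULT:
        return default  # caller declared the field optional: stay quiet
    warnings.warn('unable to extract %s' % name)  # likely a site change
    return None


print(search_regex(r'<title>(.*?)</title>', '<title>T</title>', 'title'))  # T
```

Dropping `default=None` for both title and description therefore keeps the extractor working while still surfacing layout changes in the output.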
The branch was updated from 6853771 to e22ee2b.
Description of your pull request and other information
This PR adds a new extractor for "Dagelijkse kost" to canvas.py. The extractor is based on the existing extractors for een and canvas.