Add new extractor for Dagelijkse kost #28119

paretje · 2021-02-08T17:13:48Z

Please follow the guide below

You will be asked some questions, please read them carefully and answer honestly
Put an x into all the boxes [ ] relevant to your pull request (like that [x])
Use the Preview tab to see how your pull request will actually look

Before submitting a pull request make sure you have:

Searched the bugtracker for similar pull requests
Read adding new extractor tutorial
Read youtube-dl coding conventions and adjusted the code to meet them
Covered the code with tests (note that PRs without tests will be REJECTED)
Checked the code with flake8

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Bug fix
Improvement
New extractor
New feature

Description of your pull request and other information

This PR adds a new extractor for "Dagelijkse kost" to canvas.py. The extractor is based on the existing extractor for een and canvas.

remitamine · 2021-02-08T17:56:45Z

youtube_dl/extractor/canvas.py

+        mobj = re.match(self._VALID_URL, url)
+        display_id = mobj.group('id')


use _match_id method.
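
A minimal sketch of what this suggestion amounts to, written so it runs without youtube-dl installed. `SketchExtractor` is a hypothetical stand-in, not the real `InfoExtractor`, and its `_match_id` only imitates the idea of matching the URL against `_VALID_URL` and returning the `id` group. The example URL is also made up.

```python
import re


class SketchExtractor:
    """Hypothetical stand-in for InfoExtractor; not youtube-dl's real class."""

    _VALID_URL = r'https?://dagelijksekost\.een\.be/(?:[^/]+/)*(?P<id>[^/?#&]+)'

    @classmethod
    def _match_id(cls, url):
        # Match the URL once and return the named 'id' group.
        mobj = re.match(cls._VALID_URL, url)
        assert mobj, 'URL does not match _VALID_URL'
        return mobj.group('id')

    def _real_extract(self, url):
        # One call replaces the manual re.match / mobj.group('id') pair.
        display_id = self._match_id(url)
        return {'display_id': display_id}


# Hypothetical URL, used only to exercise the sketch.
info = SketchExtractor()._real_extract(
    'https://dagelijksekost.een.be/gerechten/example-dish')
```

With the helper in place, the extractor body no longer needs the intermediate `mobj` variable.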

remitamine · 2021-02-08T17:57:31Z

youtube_dl/extractor/canvas.py

+        webpage = self._download_webpage(url, display_id)
+
+        title = strip_or_none(self._search_regex(
+            r'<h1[^>]+class="dish-metadata__title headline-1"[^>]*>(.+?)</h1>',


use get_element_by_class function.

remitamine · 2021-02-08T18:01:28Z

youtube_dl/extractor/canvas.py

+            'id': video_id,
+            'display_id': display_id,
+            'title': title,
+            'description': self._og_search_description(webpage),


fallback to other available values.

remitamine · 2021-02-08T18:20:48Z

youtube_dl/extractor/canvas.py

+
+class DagelijkseKostIE(InfoExtractor):
+    IE_DESC = 'dagelijksekost.een.be'
+    _VALID_URL = r'https?://dagelijksekost\.een\.be/(?:[^/]+/)*(?P<id>[^/?#&]+)'


match only supported URLs.
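
One way to read this comment, as a hedged sketch: the pattern in the diff accepts any path on the host, so pages without a video would also be routed to the extractor. The stricter pattern below assumes recipe pages live under a `/gerechten/` path; that path and both example URLs are assumptions for illustration, not something stated in this thread.

```python
import re

# Pattern from the diff: accepts any path on the host.
BROAD = r'https?://dagelijksekost\.een\.be/(?:[^/]+/)*(?P<id>[^/?#&]+)'
# Hypothetical tighter pattern: assumes recipes live under /gerechten/.
STRICT = r'https?://dagelijksekost\.een\.be/gerechten/(?P<id>[^/?#&]+)'

# Made-up URLs: one recipe page, one unrelated page.
recipe_url = 'https://dagelijksekost.een.be/gerechten/example-dish'
other_url = 'https://dagelijksekost.een.be/over-ons'


def matches(pattern, url):
    """Return True when the URL matches the pattern from its start."""
    return re.match(pattern, url) is not None


broad_results = (matches(BROAD, recipe_url), matches(BROAD, other_url))
strict_results = (matches(STRICT, recipe_url), matches(STRICT, other_url))
```

The broad pattern matches both URLs, while the strict one rejects the unrelated page.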

remitamine · 2021-02-08T18:21:56Z

youtube_dl/extractor/canvas.py

+
+        return {
+            '_type': 'url_transparent',
+            'url': 'https://mediazone.vrt.be/api/v1/dako/assets/%s' % (video_id),


parentheses not needed.
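
A short illustration of why the parentheses are redundant: in Python, `(video_id)` is the same expression as `video_id`, and only a trailing comma creates a tuple. The asset id below is a made-up placeholder.

```python
video_id = 'md-ast-1234'  # hypothetical asset id

with_parens = 'https://mediazone.vrt.be/api/v1/dako/assets/%s' % (video_id)
without_parens = 'https://mediazone.vrt.be/api/v1/dako/assets/%s' % video_id

# Both forms build the same string.
same_result = with_parens == without_parens

# (video_id) is just video_id; (video_id,) is a one-element tuple.
plain_is_tuple = isinstance((video_id), tuple)
comma_is_tuple = isinstance((video_id,), tuple)
```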

remitamine · 2021-02-08T20:33:38Z

youtube_dl/extractor/canvas.py

+        webpage = self._download_webpage(url, display_id)
+
+        title = strip_or_none(get_element_by_class(
+            "dish-metadata__title",


use single quotes consistently.

remitamine · 2021-02-08T20:33:53Z

youtube_dl/extractor/canvas.py

@@ -15,11 +15,12 @@
    merge_dicts,
    str_or_none,
    url_or_none,
+    get_element_by_class,


Alphabetic order.

remitamine · 2021-02-08T20:37:08Z

youtube_dl/extractor/canvas.py

+        description = strip_or_none(get_element_by_class(
+            "dish-description",
+            webpage) or self._og_search_description(
+            webpage, default=None))


twitter:description and description meta tags are also available.

remitamine · 2021-02-09T08:32:07Z

youtube_dl/extractor/canvas.py

+        title = strip_or_none(get_element_by_class(
+            'dish-metadata__title',
+            webpage) or self._og_search_title(
+            webpage, default=None))


check the result of the fallback (in comparison with the primary source).

Should I drop the fallback, or apply a regex to make up for the differences?

the twitter:title meta tag does not contain " | Dagelijkse kost" at the end.
if you want to keep og:title, you can use the remove_end function to clean the title.
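
A sketch of the suffix cleanup described here. The `remove_end` below is a local stand-in written for this example and only assumes the helper strips a trailing string when present; the title value and the exact suffix spacing are also assumptions.

```python
def remove_end(s, end):
    """Local stand-in: drop `end` from the tail of `s` if it is there."""
    if s is not None and end and s.endswith(end):
        return s[:-len(end)]
    return s


og_title = 'Example dish | Dagelijkse kost'  # hypothetical og:title value
title = remove_end(og_title, ' | Dagelijkse kost')

# A value without the suffix (as with twitter:title) is left untouched.
unchanged = remove_end('Example dish', ' | Dagelijkse kost')
```

Applied to the fallback only, this keeps the fallback title consistent with the one taken from the page heading.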

remitamine · 2021-02-09T08:32:34Z

youtube_dl/extractor/canvas.py

+            or self._html_search_meta(
+                ('description', 'twitter:description'), webpage)
+            or self._og_search_description(webpage, default=None))


combine into a single call to _html_search_meta.

remitamine · 2021-02-09T08:35:07Z

youtube_dl/extractor/canvas.py

+            get_element_by_class(
+                'dish-description', webpage)


check that the extracted description is what is expected (it should not contain HTML tags).
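
To illustrate the concern: the inner HTML of an element can itself contain markup, so the raw value may need cleaning before it is used as a description. `strip_tags` below is a simplified stand-in written for this example, not youtube-dl's `clean_html`, and the HTML fragment is made up.

```python
import html
import re


def strip_tags(fragment):
    """Simplified stand-in for a clean_html-style helper (an assumption)."""
    if fragment is None:
        return None
    text = re.sub(r'<[^>]+>', '', fragment)  # drop anything that looks like a tag
    text = html.unescape(text)               # decode entities such as &amp;
    return text.strip()


# Hypothetical inner HTML of the description element.
raw = '<p>Stoofvlees &amp; <strong>frietjes</strong></p>'
cleaned = strip_tags(raw)
```

A test that compares the extracted description against the expected plain text would catch leftover markup like this.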

remitamine · 2021-02-09T20:53:56Z

youtube_dl/extractor/canvas.py

+        title = strip_or_none(get_element_by_class(
+            'dish-metadata__title',
+            webpage) or self._og_search_title(
+            webpage, default=None))


the twitter:title meta tag does not contain " | Dagelijkse kost" at the end.
if you want to keep og:title, you can use the remove_end function to clean the title.

remitamine · 2021-02-09T20:56:26Z

youtube_dl/extractor/canvas.py

+        description = strip_or_none(
+
+            clean_html(get_element_by_class(


strip_or_none no longer needed.

remitamine · 2021-02-09T20:59:18Z

youtube_dl/extractor/canvas.py

+            or self._html_search_meta(
+                ('description', 'twitter:description', 'og:description'),
+                webpage,
+                default=None))


I think it would be better not to silence the warning when the description is not found (failing to extract the description is not expected, so a failure would likely indicate that the website has changed).

Should I do the same for title?
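
The trade-off discussed in this thread can be sketched as follows. `search_meta` is a hypothetical helper written for this example, not youtube-dl's `_html_search_meta`: it tries several meta names in one call and warns on a miss unless the caller passes an explicit `default`, which is the "silencing" the reviewer advises against.

```python
import warnings

NO_DEFAULT = object()  # sentinel meaning "no default supplied"


def search_meta(metas, names, default=NO_DEFAULT):
    """Hypothetical lookup over several meta names in order of preference."""
    for name in names:
        value = metas.get(name)
        if value:
            return value
    if default is not NO_DEFAULT:
        return default  # caller asked for a silent fallback
    # No default given: surface the miss so a site change gets noticed.
    warnings.warn('unable to extract %s' % names[0])
    return None


NAMES = ('description', 'twitter:description', 'og:description')

# Made-up page metadata where only the last name is present.
found = search_meta({'og:description': 'A hearty stew'}, NAMES)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    silent = search_meta({}, NAMES, default=None)
    warnings_after_silent = len(caught)
    loud = search_meta({}, NAMES)
    warnings_after_loud = len(caught)
```

Dropping `default=None` changes nothing on the success path; it only makes a missing description visible.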

remitamine requested changes Feb 8, 2021


remitamine added the pending-fixes label Feb 8, 2021

paretje force-pushed the feature/dako branch from 2f56a38 to 94bd854 Compare February 8, 2021 19:32

paretje requested a review from remitamine February 8, 2021 20:01

remitamine requested changes Feb 8, 2021


paretje force-pushed the feature/dako branch from 94bd854 to 07dee7c Compare February 8, 2021 21:10

paretje requested a review from remitamine February 9, 2021 07:32

remitamine requested changes Feb 9, 2021


paretje force-pushed the feature/dako branch from 07dee7c to 560e828 Compare February 9, 2021 19:55

remitamine requested changes Feb 9, 2021


paretje force-pushed the feature/dako branch from 560e828 to 6853771 Compare February 9, 2021 21:40

[DagelijkseKost] Add new extractor

e22ee2b

paretje force-pushed the feature/dako branch from 6853771 to e22ee2b Compare February 10, 2021 18:37

remitamine merged commit f28f1b4 into ytdl-org:master Feb 11, 2021
