[imagechest] Add new extractor for ImageChest #750

bbepis · 2020-05-13T01:12:16Z

I would hold off adding additional metadata properties for this extractor since there doesn't seem to be a consistent way of grabbing stuff like view counts and descriptions right now.

I would also suggest a documentation page (or a giant section of comments, like youtube-dl has) explaining what should be done when creating a new extractor. I essentially had to reverse engineer other extractors and common.py just to figure out where I was going wrong when making this.

iamleot

Can text.extract() and text.extract_iter() be used instead of re.search() and re.findall()? If so, can you please adjust them as suggested and then remove the no longer needed import re?

Thanks!

iamleot · 2020-05-18T17:19:08Z

gallery_dl/extractor/imagechest.py

+        if "Sorry, but the page you requested could not be found." in page:
+            raise exception.NotFoundError("gallery")
+
+        title = re.search(r'<meta property="og:title" content="([^"]+)"/>', page).group(1)


I think this can be converted to:

title, pos = text.extract(page, '<meta property="og:title" content="', '"')

(Internally it avoids regular expressions, which is probably faster.)
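To illustrate the idea behind this suggestion, here is a minimal standalone sketch of delimiter-based extraction using plain string search. The helper `extract_between` and the sample `page` string are hypothetical stand-ins written for this example; they are not gallery-dl's actual `text.extract()` code, only an approximation of the behavior described above (return the text between two markers plus the position after the end marker).

```python
import re


def extract_between(txt, begin, end, pos=0):
    """Hypothetical stand-in: return (text between begin and end, position
    just past end), or (None, pos) if either marker is missing."""
    first = txt.find(begin, pos)
    if first < 0:
        return None, pos
    first += len(begin)
    last = txt.find(end, first)
    if last < 0:
        return None, pos
    return txt[first:last], last + len(end)


# Made-up sample input for illustration only
page = '<head><meta property="og:title" content="Example Gallery"/></head>'

# Regex version, as in the original diff
title_re = re.search(
    r'<meta property="og:title" content="([^"]+)"/>', page).group(1)

# Plain string-search version, no regex engine involved
title, pos = extract_between(page, '<meta property="og:title" content="', '"')
```

Both approaches yield the same title for this input. The string-search version also hands back a position, so a following extraction could continue from there instead of rescanning the page from the start.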

iamleot · 2020-05-18T17:21:48Z

gallery_dl/extractor/imagechest.py

+    def images(self, page):
+        """Return a list of all (image-url, metadata)-tuples"""
+
+        image_keys = re.findall(r'<meta property="og:image" content="([^"]+)"/>', page)


I think this can be converted to:

image_keys = list(text.extract_iter(page, '<meta property="og:image" content="', '"/>'))

(Same rationale as the previous comment.)
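As a rough sketch of how an iterating variant can replace `re.findall()`, the generator below repeatedly searches for a begin/end marker pair and yields whatever lies between them. The name `iter_between` and the sample `page` are assumptions made for this example; this is not gallery-dl's actual `text.extract_iter()` implementation.

```python
import re


def iter_between(txt, begin, end, pos=0):
    """Hypothetical stand-in: yield every substring enclosed by begin and end,
    scanning left to right and stopping when a marker is no longer found."""
    while True:
        first = txt.find(begin, pos)
        if first < 0:
            return
        first += len(begin)
        last = txt.find(end, first)
        if last < 0:
            return
        yield txt[first:last]
        pos = last + len(end)


# Made-up sample input for illustration only
page = (
    '<meta property="og:image" content="https://example.org/a.png"/>'
    '<meta property="og:image" content="https://example.org/b.png"/>'
)

# Regex version, as in the original diff
image_keys_re = re.findall(
    r'<meta property="og:image" content="([^"]+)"/>', page)

# Plain string-search version
image_keys = list(
    iter_between(page, '<meta property="og:image" content="', '"/>'))
```

For this input both lists contain the same two URLs in the same order. Wrapping the generator in `list()` mirrors the suggested change, since the surrounding method is documented as returning a list.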

mikf · 2020-05-18T18:40:44Z

@bbepis Thanks for the PR and sorry for the (sometimes?) rather poor documentation. Hope you didn't have too much trouble.

@iamleot Good catch. I've already taken the liberty of making those improvements and simplifications myself: ab11b1c

bbepis · 2020-05-20T03:00:01Z

Yeah, Python is not a language I'm very experienced in, sorry. Thanks for making the edits.

bbepis added 2 commits May 13, 2020 11:07

[imagechest] Add new extractor for ImageChest

e4035c7

[imagechest] Fix flake8 compliance issues

86239c4

mikf merged commit 7b5711e into mikf:master May 18, 2020

iamleot reviewed May 18, 2020

mikf added a commit that referenced this pull request May 18, 2020

[imagechest] simplify code (#750)

ab11b1c
