[imagechest] Add new extractor for ImageChest #750
Can text.extract() and text.extract_iter() be used instead of re.search() and re.findall()? If so, can you please adjust them as suggested and then remove the no-longer-needed import re?
Thanks!
gallery_dl/extractor/imagechest.py
Outdated
    if "Sorry, but the page you requested could not be found." in page:
        raise exception.NotFoundError("gallery")

    title = re.search(r'<meta property="og:title" content="([^"]+)"/>', page).group(1)
I think this can be converted to:
title, pos = text.extract(page, '<meta property="og:title" content="', '"')
(Internally it avoids regular expressions, which is probably faster.)
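For readers unfamiliar with the idea, here is a minimal sketch of how a plain substring search can replace the regular expression. `extract_between` is a hypothetical stand-in written for this illustration, not gallery-dl's actual `text.extract()` implementation, and the sample `page` string is made up.

```python
import re


def extract_between(txt, begin, end, pos=0):
    """Hypothetical stand-in for a substring-based extractor.

    Returns (text between `begin` and `end`, index just past `end`),
    or (None, pos) when either delimiter is not found.
    """
    first = txt.find(begin, pos)
    if first < 0:
        return None, pos
    first += len(begin)
    last = txt.find(end, first)
    if last < 0:
        return None, pos
    return txt[first:last], last + len(end)


# Made-up sample input for illustration only.
page = '<head><meta property="og:title" content="Sample Gallery"/></head>'

# Regex version, as in the original diff.
title_re = re.search(
    r'<meta property="og:title" content="([^"]+)"/>', page).group(1)

# Substring version, mirroring the suggested call shape.
title, pos = extract_between(page, '<meta property="og:title" content="', '"')
```

The returned position can be passed back in as `pos` so that a following search continues from where the previous one stopped instead of rescanning the page.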
gallery_dl/extractor/imagechest.py
Outdated
    def images(self, page):
        """Return a list of all (image-url, metadata)-tuples"""

        image_keys = re.findall(r'<meta property="og:image" content="([^"]+)"/>', page)
I think this can be converted to:
image_keys = list(text.extract_iter(page, '<meta property="og:image" content="', '"/>'))
(Same rationale as the previous comment.)
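The same substring approach extends to collecting every match. The sketch below assumes a hypothetical generator `iter_between`, written only for this illustration and not taken from gallery-dl's `text.extract_iter()`; the sample `page` and its URLs are made up.

```python
import re


def iter_between(txt, begin, end, pos=0):
    """Hypothetical stand-in: yield each substring found between
    `begin` and `end`, scanning `txt` from left to right."""
    while True:
        first = txt.find(begin, pos)
        if first < 0:
            return
        first += len(begin)
        last = txt.find(end, first)
        if last < 0:
            return
        yield txt[first:last]
        # Continue searching after the closing delimiter.
        pos = last + len(end)


# Made-up sample input for illustration only.
page = (
    '<meta property="og:image" content="https://example.org/a.jpg"/>'
    '<meta property="og:image" content="https://example.org/b.png"/>'
)

# Regex version, as in the original diff.
keys_re = re.findall(
    r'<meta property="og:image" content="([^"]+)"/>', page)

# Substring version, mirroring the suggested call shape.
keys = list(iter_between(page, '<meta property="og:image" content="', '"/>'))
```

Because the function is a generator, wrapping it in `list()` is only needed when all matches are required at once, as in the suggested replacement for `re.findall()`.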
Yeah, Python is not a language I'm very experienced in, sorry. Thanks for making the edits.
I would hold off on adding additional metadata properties for this extractor, since there doesn't seem to be a consistent way of grabbing things like view counts and descriptions right now.
I would also suggest a documentation page (or a giant section of comments, like youtube-dl has) explaining what should be done when creating a new extractor. I essentially had to reverse engineer other extractors and common.py just to figure out where I was going wrong when making this.