[imagechest] Add new extractor for ImageChest #750

Merged 2 commits on May 18, 2020

@bbepis bbepis commented May 13, 2020

I would hold off adding additional metadata properties for this extractor since there doesn't seem to be a consistent way of grabbing stuff like view counts and descriptions right now.

I would also suggest a documentation page (or a large block of comments, as youtube-dl has) explaining what needs to be done when creating a new extractor. I essentially had to reverse-engineer other extractors and common.py just to figure out where I was going wrong when making this.

@mikf mikf merged commit 7b5711e into mikf:master May 18, 2020

@iamleot iamleot left a comment


Can text.extract() and text.extract_iter() be used instead of re.search() and re.findall()? If so, can you please adjust them as suggested and then remove the no-longer-needed import re?

Thanks!

if "Sorry, but the page you requested could not be found." in page:
    raise exception.NotFoundError("gallery")

title = re.search(r'<meta property="og:title" content="([^"]+)"/>', page).group(1)

I think this can be converted to:

title, pos = text.extract(page, '<meta property="og:title" content="', '"')

(Internally it avoids regular expressions, so it is probably faster.)
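To make the suggestion concrete, here is a minimal sketch of what a substring-based extraction looks like next to the regex version. `extract_between` is a hypothetical stand-in written for this example, not gallery-dl's actual `text.extract()` implementation; it only assumes the behaviour visible in the suggestion above (it returns the text between two markers plus a position). The sample `page` string is made up.

```python
import re


def extract_between(txt, begin, end, pos=0):
    """Hypothetical stand-in for a text.extract()-style helper.

    Returns (text between begin and end, index just past end),
    or (None, pos) if either marker is not found.
    """
    first = txt.find(begin, pos)
    if first < 0:
        return None, pos
    first += len(begin)
    last = txt.find(end, first)
    if last < 0:
        return None, pos
    return txt[first:last], last + len(end)


# Made-up sample markup, for illustration only.
page = '<head><meta property="og:title" content="Example Gallery"/></head>'

# Regex version, as in the original diff.
title_re = re.search(
    r'<meta property="og:title" content="([^"]+)"/>', page).group(1)

# Substring version, as in the review suggestion.
title, pos = extract_between(page, '<meta property="og:title" content="', '"')

print(title_re)
print(title)
```

One behavioural difference worth keeping in mind: when the tag is missing, `re.search(...)` returns `None` and the chained `.group(1)` raises `AttributeError`, whereas this sketch returns `None` as the value, so the caller has to check for it.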

def images(self, page):
    """Return a list of all (image-url, metadata)-tuples"""

    image_keys = re.findall(r'<meta property="og:image" content="([^"]+)"/>', page)

I think this can be converted to:

image_keys = list(text.extract_iter(page, '<meta property="og:image" content="', '"/>'))

(Same rationale as the previous comment.)
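The iterating variant can be sketched the same way. `extract_iter_between` below is a hypothetical generator written for this example, not gallery-dl's actual `text.extract_iter()`; it assumes only that such a helper yields every substring enclosed by the two markers, in order. The sample `page` and its URLs are made up.

```python
import re


def extract_iter_between(txt, begin, end, pos=0):
    """Hypothetical stand-in for a text.extract_iter()-style helper.

    Yields each substring found between begin and end, scanning
    left to right, and stops at the first missing marker.
    """
    while True:
        first = txt.find(begin, pos)
        if first < 0:
            return
        first += len(begin)
        last = txt.find(end, first)
        if last < 0:
            return
        yield txt[first:last]
        pos = last + len(end)


# Made-up sample markup, for illustration only.
page = (
    '<meta property="og:image" content="https://example.org/a.jpg"/>'
    '<meta property="og:image" content="https://example.org/b.jpg"/>'
)

# Regex version, as in the original diff.
keys_re = re.findall(r'<meta property="og:image" content="([^"]+)"/>', page)

# Substring version, as in the review suggestion.
image_keys = list(
    extract_iter_between(page, '<meta property="og:image" content="', '"/>'))

print(keys_re)
print(image_keys)
```

On this input both approaches produce the same list; a page without any matching tag gives an empty list in both cases.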

mikf added a commit that referenced this pull request May 18, 2020

mikf commented May 18, 2020

@bbepis Thanks for the PR and sorry for the (sometimes?) rather poor documentation. Hope you didn't have too much trouble.

@iamleot Good catch. I've already taken the liberty of making those improvements and simplifications myself: ab11b1c


bbepis commented May 20, 2020

Yeah, Python is not a language I'm very experienced in, sorry. Thanks for making the edits.
