I just want to help with date extraction #561

Closed
AndyTheFactory opened this issue Oct 24, 2023 · 2 comments

@AndyTheFactory
Owner

Issue by aleksandar-devedzic
Thu Aug 25 21:36:08 2022
Originally opened as codelucas/newspaper#948


These are names found in SCRIPT or META tags that represent dates. Maybe you will find this helpful:

publishdate
publish-date
prism.publicationDate
coverageEndTime
uploadDate
date
published_date
published_time
pubdate
publish_date
Date
published_at
PublishDate
dcterms.created
rnews:datePublished
article:published_time
czhdev.publicationDate
OriginalPublicationDate
og:published_time
datePublished
article_date_original
czhdev.publicationDate
article.published
published_time_telegram
sailthru.date
DC.date.issued
date
parsely-pub-date
publishtime
publication_date
coverageEndTime
publishdate
publish-date
publishedAtDate
creationDateTime
pub_date
updated_time
dateModified
og:updated_time
last-modified
Last-Modified
DC.date.modified
krn:published_time
article:modified_time
modified_time
modifiedDateTime
dc.modified
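
To illustrate how a list like this could be used, here is a minimal sketch that scans META tags for a handful of the names above using only the standard library. The names `CANDIDATE_DATE_KEYS`, `MetaDateFinder` and `find_meta_dates` are hypothetical, and it is an assumption that the key sits in a `name`, `property` or `itemprop` attribute with the value in `content`. This is not how newspaper itself does it.

```python
from html.parser import HTMLParser

# A small subset of the names from the list above, for illustration only.
CANDIDATE_DATE_KEYS = {
    "article:published_time",
    "og:published_time",
    "datePublished",
    "parsely-pub-date",
    "sailthru.date",
    "DC.date.issued",
    "pubdate",
}


class MetaDateFinder(HTMLParser):
    """Collects (key, value) pairs from META tags whose key is a known date name."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        # Assumption: the key is carried by name, property, or itemprop,
        # and the date string by content. Matching is case-sensitive.
        for key_attr in ("name", "property", "itemprop"):
            key = attr_map.get(key_attr)
            if key in CANDIDATE_DATE_KEYS and attr_map.get("content"):
                self.found.append((key, attr_map["content"]))


def find_meta_dates(html):
    """Return all candidate date values found in the META tags of an HTML string."""
    finder = MetaDateFinder()
    finder.feed(html)
    return finder.found
```

The returned strings still have to be parsed into dates, and SCRIPT tags (for example JSON-LD) would need separate handling.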

@AndyTheFactory
Owner Author

Comment by Cornatul
Fri Sep 30 07:49:30 2022


This is the source code that takes care of the publish date tags:
PUBLISH_DATE_TAGS = [
    {'attribute': 'property', 'value': 'rnews:datePublished', 'content': 'content'},
    {'attribute': 'property', 'value': 'article:published_time', 'content': 'content'},
    {'attribute': 'name', 'value': 'OriginalPublicationDate', 'content': 'content'},
    {'attribute': 'itemprop', 'value': 'datePublished', 'content': 'datetime'},
    {'attribute': 'property', 'value': 'og:published_time', 'content': 'content'},
    {'attribute': 'name', 'value': 'article_date_original', 'content': 'content'},
    {'attribute': 'name', 'value': 'publication_date', 'content': 'content'},
    {'attribute': 'name', 'value': 'sailthru.date', 'content': 'content'},
    {'attribute': 'name', 'value': 'PublishDate', 'content': 'content'},
    {'attribute': 'pubdate', 'value': 'pubdate', 'content': 'datetime'},
    {'attribute': 'name', 'value': 'publish_date', 'content': 'content'},
]

See https://github.com/codelucas/newspaper/blob/master/newspaper/extractors.py, lines 198 to 235. You could add your list to that array of dicts and open a pull request.
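
A rough sketch of what adding entries in that shape could look like, skipping ones that are already present. `EXTRA_DATE_TAGS` and `merge_date_tags` are hypothetical names, the existing list is abbreviated, and the choice of `name` versus `property` for each new entry is an assumption that would need checking against real pages.

```python
# Abbreviated copy of the existing list, in the shape quoted above.
PUBLISH_DATE_TAGS = [
    {'attribute': 'property', 'value': 'rnews:datePublished', 'content': 'content'},
    {'attribute': 'property', 'value': 'article:published_time', 'content': 'content'},
    {'attribute': 'name', 'value': 'publish_date', 'content': 'content'},
]

# Hypothetical additions taken from the list in the issue.
# Which attribute carries each name is an assumption, not verified.
EXTRA_DATE_TAGS = [
    {'attribute': 'name', 'value': 'parsely-pub-date', 'content': 'content'},
    {'attribute': 'name', 'value': 'DC.date.issued', 'content': 'content'},
    # Already present above, so it should be skipped by the merge.
    {'attribute': 'property', 'value': 'article:published_time', 'content': 'content'},
]


def merge_date_tags(existing, extra):
    """Return existing + extra, skipping entries whose (attribute, value) is already known."""
    seen = {(tag['attribute'], tag['value']) for tag in existing}
    merged = list(existing)  # copy, so the original list is left untouched
    for tag in extra:
        key = (tag['attribute'], tag['value'])
        if key not in seen:
            seen.add(key)
            merged.append(tag)
    return merged
```

Keeping the original entries first preserves their lookup order, which matters if the extractor stops at the first tag that yields a date.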

@AndyTheFactory AndyTheFactory added the enhancement New feature or request label Oct 25, 2023
@AndyTheFactory AndyTheFactory added this to the Release 0.9.2 milestone Nov 12, 2023
@AndyTheFactory
Owner Author

added in v0.9.2
