I just want to help with date extraction #948

Open
aleksandar-devedzic opened this issue Aug 25, 2022 · 1 comment

Comments

@aleksandar-devedzic

These are the tag names found in SCRIPT or META tags that represent dates (both publication and modification dates). Maybe you will find them helpful:

publishdate
publish-date
prism.publicationDate
coverageEndTime
uploadDate
date
published_date
published_time
pubdate
publish_date
Date
published_at
PublishDate
dcterms.created
rnews:datePublished
article:published_time
czhdev.publicationDate
OriginalPublicationDate
og:published_time
datePublished
article_date_original
article.published
published_time_telegram
sailthru.date
DC.date.issued
parsely-pub-date
publishtime
publication_date
publishedAtDate
creationDateTime
pub_date
updated_time
dateModified
og:updated_time
last-modified
Last-Modified
DC.date.modified
krn:published_time
article:modified_time
modified_time
modifiedDateTime
dc.modified
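To make the list concrete, here is a minimal sketch (Python standard library only, not newspaper's code) that scans a page for META tags whose `name`, `property` or `itemprop` is one of these names, and for matching top-level keys in JSON-LD `<script type="application/ld+json">` blocks. `DateTagCollector` and `DATE_TAG_NAMES` are hypothetical names, the set holds only a subset of the list above, and matching is exact and case-sensitive, so `Date` and `date` count as different names.

```python
import json
from html.parser import HTMLParser

# Hypothetical subset of the date tag names listed above (exact, case-sensitive match).
DATE_TAG_NAMES = {
    "article:published_time", "og:published_time", "datePublished",
    "publish_date", "pubdate", "dateModified", "article:modified_time",
}


class DateTagCollector(HTMLParser):
    """Collects (tag name, date string) pairs from <meta> and JSON-LD <script> tags."""

    def __init__(self, names):
        super().__init__()
        self.names = names
        self.found = []            # (name, value) pairs in document order
        self._in_json_ld = False
        self._json_buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # META tags identify themselves via name=, property= or itemprop=.
            for key in ("name", "property", "itemprop"):
                if attrs.get(key) in self.names and attrs.get("content"):
                    self.found.append((attrs[key], attrs["content"]))
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._json_buf = []

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so buffer them.
        if self._in_json_ld:
            self._json_buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                obj = json.loads("".join(self._json_buf))
            except json.JSONDecodeError:
                return  # ignore malformed JSON-LD
            items = obj if isinstance(obj, list) else [obj]
            for item in items:
                if isinstance(item, dict):
                    # Only top-level keys are checked in this sketch.
                    for key, value in item.items():
                        if key in self.names and isinstance(value, str):
                            self.found.append((key, value))


html = """<html><head>
<meta property="article:published_time" content="2022-08-25T10:00:00Z">
<meta name="description" content="not a date">
<script type="application/ld+json">{"@type": "NewsArticle", "datePublished": "2022-08-25", "dateModified": "2022-09-30"}</script>
</head></html>"""

collector = DateTagCollector(DATE_TAG_NAMES)
collector.feed(html)
print(collector.found)
# → [('article:published_time', '2022-08-25T10:00:00Z'), ('datePublished', '2022-08-25'), ('dateModified', '2022-09-30')]
```

Real JSON-LD often nests these keys inside an `@graph` array or a nested object, which this sketch does not walk into.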

@izdrail

izdrail commented Sep 30, 2022

This is the source code that handles the publish-date tags:
```python
PUBLISH_DATE_TAGS = [
    {'attribute': 'property', 'value': 'rnews:datePublished', 'content': 'content'},
    {'attribute': 'property', 'value': 'article:published_time', 'content': 'content'},
    {'attribute': 'name', 'value': 'OriginalPublicationDate', 'content': 'content'},
    {'attribute': 'itemprop', 'value': 'datePublished', 'content': 'datetime'},
    {'attribute': 'property', 'value': 'og:published_time', 'content': 'content'},
    {'attribute': 'name', 'value': 'article_date_original', 'content': 'content'},
    {'attribute': 'name', 'value': 'publication_date', 'content': 'content'},
    {'attribute': 'name', 'value': 'sailthru.date', 'content': 'content'},
    {'attribute': 'name', 'value': 'PublishDate', 'content': 'content'},
    {'attribute': 'pubdate', 'value': 'pubdate', 'content': 'datetime'},
    {'attribute': 'name', 'value': 'publish_date', 'content': 'content'},
]
```
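Each dict reads as: find an element whose `attribute` equals `value`, then take its `content` attribute as the date. Below is a minimal sketch of that lookup using the standard library's `html.parser`. It is not newspaper's own implementation; `ElementCollector` and `find_publish_dates` are hypothetical names, and only three of the entries are copied.

```python
from html.parser import HTMLParser

# Three entries copied from PUBLISH_DATE_TAGS above.
PUBLISH_DATE_TAGS = [
    {'attribute': 'property', 'value': 'article:published_time', 'content': 'content'},
    {'attribute': 'itemprop', 'value': 'datePublished', 'content': 'datetime'},
    {'attribute': 'pubdate', 'value': 'pubdate', 'content': 'datetime'},
]


class ElementCollector(HTMLParser):
    """Hypothetical helper: records the attribute dict of every start tag."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        self.elements.append(dict(attrs))


def find_publish_dates(html, known_tags):
    """Return candidate dates, ordered by the priority of known_tags."""
    collector = ElementCollector()
    collector.feed(html)
    dates = []
    for known in known_tags:
        for attrs in collector.elements:
            # Match e.g. property="article:published_time" ...
            if attrs.get(known['attribute']) == known['value']:
                # ... then read the attribute named by 'content' (content= or datetime=).
                value = attrs.get(known['content'])
                if value:
                    dates.append(value)
    return dates


html = (
    '<meta property="article:published_time" content="2022-08-25T10:00:00Z">'
    '<time itemprop="datePublished" datetime="2022-08-24">Aug 24</time>'
    '<time pubdate="pubdate" datetime="2022-08-23">Aug 23</time>'
)
dates = find_publish_dates(html, PUBLISH_DATE_TAGS)
print(dates)  # → ['2022-08-25T10:00:00Z', '2022-08-24', '2022-08-23']
```

The last entry matches markup such as `<time pubdate="pubdate" datetime="...">`. With this parser a bare `<time pubdate datetime="...">` reports the attribute's value as `None`, so it would not match.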

It lives in https://github.com/codelucas/newspaper/blob/master/newspaper/extractors.py, lines 198 to 235. You could add your list to that array of dicts and open a pull request.
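Turning the names from the first comment into entries of the same shape could look like the sketch below. The rule "a name containing `:` becomes `property`, anything else becomes `name`" is an assumption that happens to agree with the existing entries (`og:`, `article:` and `rnews:` use `property`); names such as `datePublished` or `uploadDate` usually appear as `itemprop` attributes or JSON-LD keys and would need hand-written entries. `to_tag_entry`, `merge_tags` and `NEW_NAMES` are hypothetical.

```python
# Abbreviated copy of the existing PUBLISH_DATE_TAGS entries.
PUBLISH_DATE_TAGS = [
    {'attribute': 'property', 'value': 'article:published_time', 'content': 'content'},
    {'attribute': 'name', 'value': 'publish_date', 'content': 'content'},
]

# A few publication-date names from the first comment's list.
NEW_NAMES = ["article:published_time", "og:published_time", "parsely-pub-date",
             "dcterms.created", "publish_date", "pub_date"]


def to_tag_entry(name):
    """Hypothetical heuristic: prefixed names ('og:...', 'article:...') are assumed
    to be <meta property=...>; everything else is assumed to be <meta name=...>."""
    attribute = 'property' if ':' in name else 'name'
    return {'attribute': attribute, 'value': name, 'content': 'content'}


def merge_tags(existing, names):
    """Append entries for names that are not already present, keeping order."""
    known = {entry['value'] for entry in existing}
    merged = list(existing)
    for name in names:
        if name not in known:  # skip names the list already covers
            merged.append(to_tag_entry(name))
            known.add(name)
    return merged


merged = merge_tags(PUBLISH_DATE_TAGS, NEW_NAMES)
for entry in merged:
    print(entry)
```

Modification-date names from the list (`dateModified`, `article:modified_time`, `last-modified` and so on) probably belong in a separate list rather than in `PUBLISH_DATE_TAGS`.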
