-
-
Notifications
You must be signed in to change notification settings - Fork 312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Doesn't extract PublishedTime from datePublished field #385
Comments
Mmm, seems like the detector is only looking for Could you test if including the other two keys fixes the problem? And create a pull request? |
I've tried changing the code in Published time but getting some varied results. Running with test in PR: #386
I attempted to delve a bit deeper, but got a bit lost in what it's trying to do within linkeddata get. |
Ok, I see. The problem is that linkedData does not work exactly like other containers like meta. In meta, you can pass multiple arguments and it returns the first value found. In linkedData, the arguments are to retrieve properties inside other properties. For example: $ld->time('pagePublished', 'datePublished'); //Returns the value pagePublished->datePublished In this case, you should to this: $ld->time('pagePublished') ?: $ld->time('datePublished'); Maybe the ld container should be changed at some point for a similar behavior as others. For example, using a dot notation if we want to get inner values $ld->time('article.datePublished'); |
Dot notation would make more sense IMO, having methods with the same names and arguments working in such different ways isn't very clear. I have now added the 'double call' solution to pr #386 - which fixes the test included. Thanks for your guidance on this - working on a project where I'm using this to get info on various submited links, so hope to be able to add further PRs should I find other sites where the current extractors miss something! |
Great. Thanks for your contribution! |
I've taken a quick run at this as it seems pretty fixable, but can't work out where it's going wrong.
You can see an example on the online demo here: https://oscarotero.com/embed/demo/index.php?url=https%3A%2F%2Fwww.bbc.com%2Fnews%2Fuk-england-london-54204929
The BBC article has a 'datePublished' & 'dateModifed' set in the "All data collected" array, but nothing is extracted to the 'publishedTime' field.
Looking at the PublishedTime detector it is looking for a 'datepublished' option, but I haven't followed the logic to work out why it's failing.
The text was updated successfully, but these errors were encountered: