Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Doesn't extract PublishedTime from datePublished field #385

Closed
paulhennell opened this issue Sep 18, 2020 · 5 comments
Closed

Doesn't extract PublishedTime from datePublished field #385

paulhennell opened this issue Sep 18, 2020 · 5 comments

Comments

@paulhennell
Copy link
Contributor

I've taken a quick run at this as it seems pretty fixable, but can't work out where it's going wrong.
You can see an example on the online demo here: https://oscarotero.com/embed/demo/index.php?url=https%3A%2F%2Fwww.bbc.com%2Fnews%2Fuk-england-london-54204929

The BBC article has a 'datePublished' & 'dateModifed' set in the "All data collected" array, but nothing is extracted to the 'publishedTime' field.

Looking at the PublishedTime detector it is looking for a 'datepublished' option, but I haven't followed the logic to work out why it's failing.

@oscarotero
Copy link
Owner

Mmm, seems like the detector is only looking for pagePublished (not datePublished or dataModified) in the linkedData store: https://github.com/oscarotero/Embed/blob/master/src/Detectors/PublishedTime.php#L26

Could you test if including the other two keys fixes the problem? And create a pull request?
Many thanks!

@paulhennell
Copy link
Contributor Author

I've tried changing the code in Published time but getting some varied results.

Running with test in PR: #386

//Returns date as expected & passes test
?: $ld->time('datePublished')

//Seems like it should work based on exisiting code, but always returns null
?: $ld->time('pagePublished', 'datePublished')

//Throws and error!
//Error: Call to undefined method ML\JsonLD\TypedValue::getProperty()
// From LinkedData.php L71
?: $ld->time('datePublished', 'pagePublished')

I attempted to delve a bit deeper, but got a bit lost in what it's trying to do within linkeddata get.

@oscarotero
Copy link
Owner

Ok, I see. The problem is that linkedData does not work exactly like other containers like meta. In meta, you can pass multiple arguments and it returns the first value found. In linkedData, the arguments are to retrieve properties inside other properties. For example:

$ld->time('pagePublished', 'datePublished'); //Returns the value pagePublished->datePublished

In this case, you should to this:

$ld->time('pagePublished') ?: $ld->time('datePublished');

Maybe the ld container should be changed at some point for a similar behavior as others. For example, using a dot notation if we want to get inner values $ld->time('article.datePublished');

@paulhennell
Copy link
Contributor Author

Dot notation would make more sense IMO, having methods with the same names and arguments working in such different ways isn't very clear.

I have now added the 'double call' solution to pr #386 - which fixes the test included.

Thanks for your guidance on this - working on a project where I'm using this to get info on various submited links, so hope to be able to add further PRs should I find other sites where the current extractors miss something!

@oscarotero
Copy link
Owner

Great. Thanks for your contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants