Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embedded Metadata: Add basic LD-JSON support and some missing meta tags #2765

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

owcz
Copy link
Contributor

@owcz owcz commented Dec 31, 2021

  • Handled standard LD-JSON authors, date, and tags, which should cover most use cases and make EM much better with news sites and blogs (baby steps towards JSON-LD export/import #917)
    • Might need to handle more dates but this is a sufficient start
  • Handed a few missing yet standard meta tags to remove need for more one-off translators
  • Repaired translator's ability to import meta keywords as tags; this was built-in but had broken logic
    • Keywords tend to be okay but I could see it potentially importing garbage tags in the event it's better to remove it altogether

@@ -688,7 +724,7 @@ function addLowQualityMetadata(doc, newItem) {
}
}
// fall back to "keywords"
if (!newItem.tags.length) {
if (newItem.tags.length == 0) { // this needs to be tested; previously was !...length, which evaluated to true when zero
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might need to be tested; previously was !newItem.tags.length, which evaluated to true when zero

appeared to work fine on https://www.bloomberg.com/news/articles/2019-09-12/peloton-founder-goes-from-kickstarter-to-a-450-million-fortune

newItem.tags = [newItem.tags];
}
if (newItem.tags && newItem.tags.length && Zotero.parentTranslator) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what Zotero.parentTranslator was checking for here—I'd think this should run whenever EM is finding a tag string to parse, not only when it's the parent? Unless the point is to only run when it has a child translator so the child translator can choose what to do with the sometimes garbage tags?

@@ -913,11 +954,14 @@ function finalDataCleanup(doc, newItem) {
newItem.tags = tags;
}
}
if (newItem.tags.length > 0) newItem.tags = scrubLowercaseTags(newItem.tags);

/* what else is automatically adding tags? if it's bad data, don't import it, but resetting all tags (per below) kills good LD-JSON tags
else {
// Unless called from another translator, don't include automatic tags,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What else is automatically adding tags? Resetting all tags here—the default—kills (now) good LD-JSON tags. Better to just avoid anything that adds garbage tags if we want to be vigilant, rather than adding and immediately wiping it out?

@owcz owcz mentioned this pull request Dec 31, 2021
@dstillman
Copy link
Member

Thanks for working on this. We could consider merging this for now, but this isn't really how we want to support JSON-LD. Zotero already supports RDF import — we ideally would just use that. And we actually wrote a JSON-LD translator already in #1897 that does so, and which uses more accurate JSON-LD parsing, but it hasn't yet been merged. There, it gets called from a new Generic translator, which evaluates the data from various sources, including the EM translator. See this comment for discussion of some of the problems with JSON-LD, including the fact that there can be multiple JSON-LD blocks on a page.

This will need to be reviewed in the context of #1092. Since those changes will require a change to core code to be rolled out — zotero/zotero#1656 will need to be moved to zotero/utilities — we could consider adding this in the meantime, but we need to confirm that it wouldn't result in bad data.

@dstillman
Copy link
Member

In particular, I'd want to hear @mrtcode's thoughts on this.

This was referenced Dec 31, 2021
… add tests for deprecated single-site translators
This was referenced Dec 31, 2021
This was referenced Dec 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants