Make EM the generic translator. Addresses #1092 #1434

adomasven · 2017-10-06T09:19:55Z

If EM only contains webpage data, run DOI translator's detectWeb within EM and if there are DOIs present, return undefined, otherwise return webpage(Gray?).

Change special-case code within the Connector to always allow running the EM translator. If EM returns undefined then list EM options with and without snapshot as the final two options in the context menu, otherwise list the EM translator according to its priority.

If there are DOIs present, EM will not overshadow the DOI translator and otherwise will overtake. If both rich EM data and DOI are present then both can co-exist. This way we can avoid any changes or special translation handling within zotero and translation server and have a translator for every page. It sacrifices code clarity in the intermediate term, but is a workable solution for the short-term until someone has the time and spirit to commit to the bigger change.

The changes in the connector are coming soon, but we should get this EM translator in circulation early. The consequence of this update is that connector will detect EM translator for all previous pages that only displayed webpage saving options.

Alternatively, we should at least merge the part in importRDF() that will allow the EM translator to save even when init() doesn't detect a viable item type, such that when push out the updated Connector and it tries to save with EM translation doesn't break.

Awaiting comments from @dstillman regarding connector related stuff.

dstillman · 2017-10-09T06:56:05Z

My main concern here is duplicating the page scan for DOI. It's probably quite fast, but we obviously want to be very careful with translators that run on every page load, so we should at least do some profiling to see how many milliseconds it would be adding.

But I agree that we can merge the importRDF part first (and that will let us start using it in translation-server too, which is a priority). Have you tested the saving here? Any unexpected differences between the data from EM and the existing webpage option for a generic page?

adomasven · 2017-10-09T11:39:07Z

Have you tested the saving here? Any unexpected differences between the data from EM and the existing webpage option for a generic page?

I have tested it for functionality and it works! Otherwise, yes, there are somewhat significant, but expected differences between EM and the current code we have.

adomasven · 2017-10-09T11:42:56Z

Okay, so I've pushed just the change that allows EM to save even when it doesn't detect any significant tags. Good to review and merge.

adomasven · 2017-10-09T11:54:22Z

My main concern here is duplicating the page scan for DOI. It's probably quite fast, but we obviously want to be very careful with translators that run on every page load, so we should at least do some profiling to see how many milliseconds it would be adding.

See the scan code. It is an XPath query and then running a regexp for each returned result. The load time we're adding for each page is proportional to the length of the page, at O(n log n). On this webpage (which is one of the testcases for DOI detectWeb() takes 0.9ms in Chrome. I think we're okay.

adomasven · 2017-10-10T09:37:47Z

@adam3smith please take a look at this when you have time

adam3smith · 2017-10-11T04:18:00Z

Should I be seeing this?

The changes in the connector are coming soon, but we should get this EM translator in circulation early. The consequence of this update is that connector will detect EM translator for all previous pages that only displayed webpage saving options.

Because that doesn't appear to be the case. e.g. still getting the grey Webpage icon on
http://www.example.com/

dstillman · 2017-10-11T06:00:21Z

No, we took that out for the moment — the only thing necessary for this version is make sure we're OK applying the EM low-quality save to all webpages, in place of the current special-case webpage code (which just saves document.title, URL, and access date), noting that these would include pages where no effort was made to embed metadata. I haven't looked at the whole function, but I would say, for example, that we don't want to assign the hostname as Library Catalog for all webpages (or at all for EM saves). And I'm not sure if including the keywords field as tags is a good idea.

zuphilip · 2017-10-11T07:19:19Z

And I'm not sure if including the keywords field as tags is a good idea.

The tags will be emptied if EM is not called from another translator, see https://github.com/adomasven/translators/blob/21dbb16f0f9e6a54ef5109a8fa58f75a3ba8a0c6/Embedded%20Metadata.js#L821-L823

adomasven · 2017-10-11T07:28:11Z

The tags will be emptied if EM is not called from another translator, see https://github.com/adomasven/translators/blob/21dbb16f0f9e6a54ef5109a8fa58f75a3ba8a0c6/Embedded%20Metadata.js#L821-L823

We should probably do the same thing for libraryCatalog.

Otherwise the function adds:

Title
Creators (new for generic saves)
Abstract note from <meta name="description"/> (new)
Url
Access date

Creators and abstract note quality will vary wildly. I'm not sure if even the worst possible abstract is worse than no abstract. A bad creator, on the other hand is worse than no creator when citing. So maybe add creators only for saves from another translator too? (Although if creators come from highwire metadata, then they're okay to include. This is specifically applicable to low quality fields.)

dstillman · 2017-10-11T07:53:16Z

We can look around, but I think we want to keep the author meta tag — I’m sure it’s often wrong, but it’s a legit field that, unlike keywords, doesn’t have a long history of (worthless) SEO abuse. Seems like people should be able to use it to specify the author of a generic webpage.

adomasven · 2017-10-11T09:35:07Z

I've updated this to exclude libraryCatalog for non-child-EM-translator saves. There will be a follow up after we merge some changes into the connector.

Awaiting approval.

adam3smith · 2017-10-11T12:46:58Z

I'm sorry I'm dense here (and by all means feel free to merge this if it's blocking other things). But if you want me to look at it more closely, could someone give me a page where to see the changed behavior at work?

adomasven · 2017-10-11T13:00:08Z

Ugh, sorry. The main change was supposed to be this, although it seems that new Zotero.Item(undefined) actually defaults to 'webpage' either way, so aside from removing "libraryCatalog" (which, FWIW, doesn't even apply to 'webpage' items, i think?), the PR actually doesn't change anything. And it means that it doesn't block the connector changes either.

I'll push the initial changes to this PR and we can start over, although we need to merge the connector code first, so it will have to wait. Once again, sorry to bother you.

adam3smith · 2017-10-11T13:21:55Z

no worries, thanks for clarifying.

adomasven force-pushed the feature/generic-EM-translator branch from 21dbb16 to 72f5ceb Compare October 9, 2017 11:41

adomasven mentioned this pull request Oct 10, 2017

Generic EM translation in Zotero zotero/zotero-connectors#193

Open

Allow EM to save a webpage even when no embedded tags are detected

e7ee6c3

adomasven force-pushed the feature/generic-EM-translator branch from 72f5ceb to e7ee6c3 Compare October 11, 2017 09:32

zuphilip added the on hold label Jan 4, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make EM the generic translator. Addresses #1092 #1434

Make EM the generic translator. Addresses #1092 #1434

adomasven commented Oct 6, 2017

dstillman commented Oct 9, 2017

adomasven commented Oct 9, 2017

adomasven commented Oct 9, 2017 •

edited

Loading

adomasven commented Oct 9, 2017

adomasven commented Oct 10, 2017

adam3smith commented Oct 11, 2017

dstillman commented Oct 11, 2017

zuphilip commented Oct 11, 2017

adomasven commented Oct 11, 2017 •

edited

Loading

dstillman commented Oct 11, 2017

adomasven commented Oct 11, 2017

adam3smith commented Oct 11, 2017

adomasven commented Oct 11, 2017

adam3smith commented Oct 11, 2017

Make EM the generic translator. Addresses #1092 #1434

Are you sure you want to change the base?

Make EM the generic translator. Addresses #1092 #1434

Conversation

adomasven commented Oct 6, 2017

dstillman commented Oct 9, 2017

adomasven commented Oct 9, 2017

adomasven commented Oct 9, 2017 • edited Loading

adomasven commented Oct 9, 2017

adomasven commented Oct 10, 2017

adam3smith commented Oct 11, 2017

dstillman commented Oct 11, 2017

zuphilip commented Oct 11, 2017

adomasven commented Oct 11, 2017 • edited Loading

dstillman commented Oct 11, 2017

adomasven commented Oct 11, 2017

adam3smith commented Oct 11, 2017

adomasven commented Oct 11, 2017

adam3smith commented Oct 11, 2017

adomasven commented Oct 9, 2017 •

edited

Loading

adomasven commented Oct 11, 2017 •

edited

Loading