Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Frieze.js #1259

Merged
merged 3 commits into from
Aug 4, 2018
Merged

Add Frieze.js #1259

merged 3 commits into from
Aug 4, 2018

Conversation

owcz
Copy link
Contributor

@owcz owcz commented Mar 12, 2017

Art magazine https://frieze.com/

@zuphilip zuphilip added the New Translator Pull requests for new translators label Nov 12, 2017
@owcz
Copy link
Contributor Author

owcz commented Jul 8, 2018

ready for review

@owcz -- much preferred to get item type right in `detectWeb`. Among other things, otherwise tests fail.
Copy link
Collaborator

@adam3smith adam3smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general -- see if my question about not excluding /event and /media items from multiples makes sense.



function detectWeb(doc, url) {
if (url.includes("/article/") { // does not handle /event/ or /media/ pages, which EM alone can handle
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For those /event and media pages, you should be able to exclude them from the EM fixes you make in scrape, right? That way they'd work correctly when imported as part of search results (currently that'd e.g. all turn them into blogposts)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The event/media pages are basically listings, so I didn't think anyone would want to cite them like we wouldn't handle classified ads in a newspaper (hence why I excluded them from the multis)
Let me know if you'd still want them

let href = rows[i].href;
let title = ZU.trimInternal(rows[i].textContent);
if (!href || !title) continue;
if (/\/(event|media)\//.test(href)) continue; // scrap items that link to /event/ or /media/ pages
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if what I say above is right, you could then remove this restriction.

@adam3smith adam3smith removed the on hold label Aug 4, 2018
@adam3smith adam3smith merged commit c2b6d8f into zotero:master Aug 4, 2018
@adam3smith
Copy link
Collaborator

Thanks!
(I'm not sure what the deal is with the failing tests: it complains about a deleted file that's no in deleted.txt -- but we're not actually deleting anything here)

@owcz owcz deleted the frieze branch August 4, 2018 19:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
New Translator Pull requests for new translators
Development

Successfully merging this pull request may close these issues.

3 participants