Add Sacramento Bee #1363

owcz · 2017-07-08T23:00:35Z

tests text()/attr() polyfill from #1277

dstillman · 2017-07-08T23:33:17Z

Sacramento Bee.js

+
+	// Authors
+	var authorMetadata = doc.querySelectorAll('.ng_byline_name');
+	if (authorMetadata) {


querySelectorAll() always returns a NodeList, even if it's empty, so you would want to use authorMetadata.length here.
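
A minimal sketch of the length check suggested above. The `collectAuthors` helper is hypothetical (it is not in the PR); only the `.ng_byline_name` selector comes from the diff.

```javascript
// Hypothetical helper: gather author names from the byline nodes.
function collectAuthors(doc) {
	var authorMetadata = doc.querySelectorAll('.ng_byline_name');
	var authors = [];
	// An empty NodeList is still truthy, so test its length instead.
	if (authorMetadata.length) {
		for (var i = 0; i < authorMetadata.length; i++) {
			authors.push(authorMetadata[i].textContent.trim());
		}
	}
	return authors;
}
```

With `if (authorMetadata)` the branch would run even on pages that have no byline at all.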

dstillman · 2017-07-08T23:36:32Z

Sacramento Bee.js

+	item.title = attr(doc,'[property="og:title"]','content');
+	item.date = text(doc,'.published-date');
+	item.abstractNote = text(doc,'#content-body- p');
+	item.tags = attr(doc,'meta[name="keywords"]','content').split(", ");


attr() will return null if the selector isn't found, which would produce an error, so it'd be safer to assign it to a keywords variable and then do if (keywords) { item.tags = keywords.split(", "); }.
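
The guard described above might look like the following sketch. `tagsFromKeywords` is a hypothetical wrapper used only for illustration; in the translator the check would more likely sit inline in `scrape()`.

```javascript
// Hypothetical helper: turn the keywords meta content into a tag list.
// `keywords` is whatever attr() returned, which may be null.
function tagsFromKeywords(keywords) {
	// Calling .split() on null would throw a TypeError, so guard first.
	if (keywords) {
		return keywords.split(", ");
	}
	return [];
}
```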

adam3smith

Looks good. Some small things.

adam3smith · 2017-07-09T03:37:20Z

Sacramento Bee.js

+
+function scrape(doc, url) {
+	var item = new Zotero.Item("newspaperArticle");
+	item.websiteTitle = "The Sacramento Bee";


Use item.publicationTitle here; cf. https://aurimasv.github.io/z2csl/typeMap.xml#map-newspaperArticle

adam3smith · 2017-07-09T04:17:44Z

Sacramento Bee.js

+function detectWeb(doc, url) {
+	if (url.match(/article\d+/)) {
+		return "newspaperArticle";
+	} else if (url.match(/\/(news|sports|entertainment)\//)) {


I'd be more comfortable if you implement a version of a getSearchResults function (using querySelectorAll instead of ZU.xpath, of course). These matches cover a broad set of pages and we really want to avoid false positives. There's a reason @zuphilip puts them in all translators.

Also, use .search(/regex/) != -1 for efficient tests of strings against regexes.

.search is more efficient than .match?

Yes, because it only has to report whether there is a match (it returns an index, or -1) rather than build an actual match. See https://jsperf.com/exec-vs-match-vs-test-vs-search/5 (indexOf is dramatically more efficient, so you could also use that for this one, since you don't really need a regex, just three separate strings).

Mozilla actually makes this distinction explicit in the docs:

When you want to know whether a pattern is found and also its index in a string use search() (if you only want to know it exists, use the similar test() method on the RegExp prototype, which returns a boolean); for more information (but slower execution) use match() (similar to the regular expression exec() method).

I poked around and it looks like test() becomes (15%) more efficient than multiple indexOf()s when checking for multiple strings.
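
For reference, the approaches discussed in this thread give the same answer for a simple existence check; they differ in what they return. The URL below is made up for illustration, and no timings are claimed here.

```javascript
// Made-up URL, for illustration only.
var url = "http://www.sacbee.com/news/local/article12345.html";
var sectionRe = /\/(news|sports|entertainment)\//;

// match() builds an array describing the match (or returns null).
var viaMatch = url.match(sectionRe) !== null;

// search() returns the index of the first match, or -1.
var viaSearch = url.search(sectionRe) != -1;

// test() returns a plain boolean.
var viaTest = sectionRe.test(url);

// indexOf() needs no regex, but takes one call per string.
var viaIndexOf = ["/news/", "/sports/", "/entertainment/"].some(function (s) {
	return url.indexOf(s) != -1;
});
```

All four variables end up `true` for this URL; only `test()` hands back a boolean directly, which is why it reads most naturally inside an `if`.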

adam3smith · 2017-07-09T04:18:54Z

Sacramento Bee.js

+		return "multiple";
+	} else if (url.indexOf("/search/?q=") != -1) {
+		return "multiple";
+	} else return null;


(no need for an explicit return null or false here)
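
A sketch of the point above: a function that falls off the end returns `undefined`, which is falsy just like `null`. `detectSketch` is a hypothetical stand-in for `detectWeb`, reduced to two of the URL checks visible in the diff.

```javascript
// Hypothetical, trimmed-down stand-in for detectWeb.
function detectSketch(url) {
	if (url.match(/article\d+/)) {
		return "newspaperArticle";
	} else if (url.indexOf("/search/?q=") != -1) {
		return "multiple";
	}
	// No final `return null`: falling through yields undefined.
}
```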

adam3smith · 2017-07-09T04:20:02Z

Sacramento Bee.js

+		item.tags = keywords.split(", ");
+	}
+	item.attachments.push({
+		title: "The Sacramento Bee snapshot",


capitalize "snapshot" per convention.

adam3smith · 2017-07-09T04:22:47Z

Sacramento Bee.js

+						"creatorType": "author"
+					}
+				],
+				"date": "January 08, 2015 1:09 PM",


use ZU.strToISO to avoid importing English dates into non-English locales.
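
A sketch of how the conversion might be wired in. `isoDateOrNull` is a hypothetical helper, and the converter is passed in as a parameter so the snippet stands alone; inside a translator one would pass `ZU.strToISO`. It assumes the converter returns an ISO-formatted string, or a falsy value when it cannot parse the input.

```javascript
// Hypothetical helper: normalize a scraped date string.
// `toISO` is the converter to use (ZU.strToISO inside a translator).
function isoDateOrNull(rawDate, toISO) {
	// text() can return null when the selector is missing.
	if (!rawDate) return null;
	// Treat any falsy converter result as "no usable date".
	return toISO(rawDate) || null;
}
```

In `scrape()` this would be called along the lines of `item.date = isoDateOrNull(text(doc, '.published-date'), ZU.strToISO);`.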

adam3smith · 2017-07-09T16:35:50Z

Sacramento Bee.js

+function detectWeb(doc, url) {
+	if (url.search(/article\d+/) != -1) {
+		return "newspaperArticle";
+	} else if (url.search(/(\/((news|sports|entertainment)\/)|(search\/\?q=))|sacbee\.com\/?$/) != -1  && getSearchResults(doc, true)) {


@owcz -- this here is half the reason we want the getSearchResults function (or something like it): by including it in detectWeb, you make sure you don't get false positives. Even if you're quite confident that you have the site covered as it is now, imagine they change their CMS -- with things as you had them, you could get false positives in all sorts of places, including on article pages where the generic metadata translator might perform OK otherwise.
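
A rough sketch of the pattern being asked for, using `querySelectorAll` rather than `ZU.xpath`. The `a.title` selector is a placeholder and not the site's real markup, the URL regex is simplified from the one in the diff, and the whitespace cleanup uses plain string methods where a translator would more likely use a Zotero utility.

```javascript
function getSearchResults(doc, checkOnly) {
	var items = {};
	var found = false;
	// Placeholder selector: replace with the links that mark real results.
	var rows = doc.querySelectorAll('a.title');
	for (var i = 0; i < rows.length; i++) {
		var href = rows[i].href;
		var title = (rows[i].textContent || "").replace(/\s+/g, " ").trim();
		if (!href || !title) continue;
		// In detectWeb we only need to know that one result exists.
		if (checkOnly) return true;
		found = true;
		items[href] = title;
	}
	return found ? items : false;
}

function detectWeb(doc, url) {
	if (/article\d+/.test(url)) {
		return "newspaperArticle";
	} else if (/\/(news|sports|entertainment)\//.test(url) && getSearchResults(doc, true)) {
		// The URL looks like a section page AND the page really lists results.
		return "multiple";
	}
}
```

Because `detectWeb` only reports "multiple" when the page actually contains result links, a URL that happens to match the pattern but has no listing does not trigger the translator.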

adam3smith · 2017-07-09T16:44:11Z

This is good to merge, but I'll hold off for a little to see if the discussion in #1277 changes anything.

test() is most efficient for testing regex, more efficient than multiple indexOf()s: https://jsperf.com/zotero/1

adam3smith · 2017-07-09T18:27:21Z

Sacramento Bee.js

@@ -40,9 +40,9 @@
 function attr(doc,selector,attr,index){if(index>0){var elem=doc.querySelectorAll(selector).item(index);return elem?elem.getAttribute(attr):null}var elem=doc.querySelector(selector);return elem?elem.getAttribute(attr):null}function text(doc,selector,index){if(index>0){var elem=doc.querySelectorAll(selector).item(index);return elem?elem.textContent:null}var elem=doc.querySelector(selector);return elem?elem.textContent:null}

 function detectWeb(doc, url) {
-	if (url.search(/article\d+/) != -1) {
+	if (/article\d+/.test(url) != false) {


no need for != false -- that's what if does already.

adam3smith · 2017-07-15T21:20:39Z

Cool, thanks!

* use test() in detectWeb test() is most efficient for testing regex, more efficient than multiple indexOf()s: https://jsperf.com/zotero/1

Add Sacramento Bee

9d2fe33

dstillman reviewed Jul 8, 2017

fixes from review, handle more multis

3de67c9

adam3smith requested changes Jul 9, 2017

owcz and others added 2 commits July 9, 2017 01:06

misc fixes & fixes per review

727cf9f

use getSearchResults in detectWeb

8bf1c9a

adam3smith reviewed Jul 9, 2017

use test() in detectWeb

2078956

test() is most efficient for testing regex, more efficient than multiple indexOf()s: https://jsperf.com/zotero/1

adam3smith reviewed Jul 9, 2017

rmv redundant check against false

6fc4601

adam3smith merged commit ae41654 into zotero:master Jul 15, 2017

zuphilip pushed a commit to zuphilip/translators that referenced this pull request Mar 28, 2018

Add Sacramento Bee (zotero#1363)

f61e1fe

* use test() in detectWeb test() is most efficient for testing regex, more efficient than multiple indexOf()s: https://jsperf.com/zotero/1

zuphilip pushed a commit to zuphilip/translators that referenced this pull request Mar 28, 2018

Add Sacramento Bee (zotero#1363)

6fdfe8a

* use test() in detectWeb test() is most efficient for testing regex, more efficient than multiple indexOf()s: https://jsperf.com/zotero/1

owcz deleted the sacbee branch July 8, 2018 13:50
