Kstudy translator #1308

yunusong · 2017-05-26T15:33:35Z

Translator for the Kstudy database.

Help appreciated for attachments, changing authors to a single field, and opening a javascript url at "journal/list_name.asp" detect. But any suggestions to improve the translator is welcome.

Thank you.

fbennett · 2017-05-27T02:55:07Z

I've raised my hand, so ... a question first for the translator gurus listening in. The translator is based on the Framework, but it's going to need some out-of-the-ordinary post-processing of name field content, at least one intermediate xmlhttprequest call to acquire a ToC for one form of "multiple" content, and another to acquire attachments.

Is it possible to shoehorn arbitrary JS code into a Framework translator, or should things be recoded from scratch?

avram · 2017-05-27T02:57:32Z

So FW does have some capacity to hook into the processing and run arbitrary JS, but I suspect it's better to rewrite than to do much complex work in those hooks

zuphilip · 2017-05-27T12:12:14Z

Is it possible to shoehorn arbitrary JS code into a Framework translator, or should things be recoded from scratch?

One can add more JS to the functions detectWeb, doWeb for example

translators/Foreign Policy.js

Lines 17 to 23 in 346e164

    
           function detectWeb(doc, url) { 
        
           	if(!ZU.xpath(doc, '//article/h1').length && !ZU.xpath(doc, '//div[@class="inner"]//div[contains(@class, "text_wrapper")]/h2/a')) { 
        
           		return; 
        
           	} 
        
           	return FW.detectWeb(doc, url); 
        
           } 
        
           function doWeb(doc, url) { return FW.doWeb(doc, url); }

; or define a new function and call it somewhere in the FW chain.

Personally, I prefer the non-FW translators and here it seems that anything else would be pretty complex as well. @yunusong I can rewrite what you have as a normal translator without the FW, if this helps.

fbennett · 2017-05-27T12:40:30Z

@zuphilip if you have time, and if @yunusong agrees, that would be great (Philipp is much more skilled with translators than I am).

With a non-FW translator as a base, I could then splice in conditional operations that would wake up in Juris-M, for transliterated names and whatnot.

zuphilip · 2017-05-27T14:02:32Z

@fbennett Here is a non-FW version of this code: https://github.com/zuphilip/translators/blob/yunusong-patch-2/KStudy.js

yunusong · 2017-05-27T14:14:38Z

Thanks @zuphilip @fbennett !

Author field is not that big of a deal, actually. I tried adding ".remove(/(/).remove()/)" after ".split(/,/)" so that it won't take ")" as the last name. Then it took the romanized last name as the last name, and that was good enough for me. Separating last name and first name is almost never necessary in Korean academia.

On the other hand, I don't mind you guys rewriting the code at all. If it helps improving the code, I am all for it. What I would really like is the translator to automatically fetch the pdf files. The xpath for link to pdf files is "//div[@Class="choice"]/span[1]/a/@href" and an example link is
"javascript:SiteSelect1('http://210.101.116.18/kiss10/inFTP_Journal.asp', '87300332.pdf', 2201, 25169, 1);"
I tried rewriting the code without the framework myself, but I didn't have any better luck with the attachments. If you can help me with the attachments, that would be wonderful. Thank you.

yunusong · 2017-05-27T15:50:58Z

This looks great. You should replace the original file with this one on the pull request and also add your name to the creator. Thank you for this. 2017년5월27일 (토요일) 오전 10:02에 Philipp Zumstein <[email protected]>님이 쓴 메시지: @fbennett Here is a non-FW version of this code: https://github.com/zuphilip/translators/blob/yunusong-patch-2/KStudy.js— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

zuphilip · 2017-05-27T16:26:17Z

@yunusong I created a PR towards your fork and branch yunusong#1 . After accepting this, we should see it here as well.

Rewrite KStudy.js as Non-FW translator

I figured out how to unravel the javascripted links in Journal Issue type of "multiple". I tested on my zotero and confirmed that it works.

zuphilip · 2017-05-27T21:35:29Z

KStudy.js

+{
+	"translatorID": "b298ca93-0010-48f5-97fb-e9923519a380",
+	"label": "KStudy",
+	"creator": "Yunwoo Song","Philipp Zumstein"


Okay, thank you, but this needs to be one string containing both names, i.e.

"Yunwoo Song, Philipp Zumstein"

yunusong · 2017-05-27T22:18:13Z

I located the JS file which contains the function for downloading pdf.
The function goes as
function SiteSelect1(ftproot, a_imag, inst_key, publ_key,isDownLoad) {
var url = "../search/download.asp?ftproot=" + ftproot + "&inst_key=" + inst_key + "&a_imag=" + a_imag + "&isDownLoad=" + isDownLoad + "&publ_key=" + publ_key
window.open(url, "", "toolbar=0,location=0,directories=0,status=0,menubar=0,scrollbars=1,resizable=1,copyhistory=1,top=170,left=250,width=540,height=500");
}

So I coded item.attachments as follows.

var pdfJSurl = ZU.xpathText(doc, '//div[@class="search_box"]/div[@class="choice"]/span[1]/a/@href');
var pdfurlKeys = pdfJSurl.match(/Select1\(\'(\S+)\'\,\s?\'(\S+)\'\,\s?(\S+)\,\s?(\S+)\,\s?(\S+)\)\;/);
var pdfurl = "../search/download.asp?ftproot=" + pdfurlKeys[1] + "&inst_key=" + pdfurlKeys[3] + "&a_imag=" + pdfurlKeys[2] + "&isDownLoad=" + pdfurlKeys[5] + "&publ_key=" + pdfurlKeys[4];
item.attachments.push({
	title : item.title,
	url : pdfurl,
	mimeType : "application/pdf",
	snapshot : false
})

But Zotero still won't fetch the pdf.

I tried fetching the resulting url by assigning to an unused slot. eg. issn. as
item.ISSN = pdfurl;
An example of a resulting url is as follows.
../search/download.asp?ftproot=http://210.101.116.17/kiss8/inFTP_Journal.asp&inst_key=7074&a_imag=40323420.pdf&isDownLoad=1&publ_key=25408

I tried pasting this address directly on a browser and it will start downloading the pdf.
So I am wondering what is still blocking Zotero from storing the pdf.

The odd thing I noticed about the original function was that it does not end the "var url =" statement with a semicolon. The semicolon only appears at the end of the second line. Would this have something to do with the problem? Thank you guys for the comments.

zuphilip · 2017-05-27T22:37:02Z

KStudy.js

@@ -1,7 +1,7 @@
 {
 	"translatorID": "b298ca93-0010-48f5-97fb-e9923519a380",
 	"label": "KStudy",
-	"creator": "Yunwoo Song","Philipp Zumstein"
+	"creator": "Yunwoo Song, Philipp Zumstein"


This is not valid JSON. You need to add a comma at the end of this line.

zuphilip · 2017-05-27T22:37:53Z

Can you try snapshot : true in your code above?

fbennett · 2017-05-27T22:41:25Z

I'll take a look on Monday (I need to be in the University net to get access). As an offhand thought in the meantime, assuming that the translator is loading properly, there may be a redirect that the translator is not able to follow without some further jiggery-pokery. (You can follow the transactions by opening Developer Tools in the browser and then refreshing on the download URL.)

yunusong · 2017-05-27T22:48:01Z

Thank you @fbennett !
@zuphilip I just tried "snapshot : true" but nothing changed. Zotero still shows a big red X for attachments, and the test page at scaffold will not show any value for "url" under attachments. I also tried "snapshot : false," but no pdf either.

adam3smith · 2017-05-28T03:50:46Z

Thanks everyone -- please let me know once this is ready for final review. As for FW, given #1277, I'd prefer us avoiding new FW translators.

yunusong · 2017-05-28T15:12:29Z

@fbennett I have one more question.
When I erased all other attributes of the attachment and just left the url as "url : pdfurl" (which gives a long url that would have been the result of the function SiteSelect1), Zotero saves the snapshot of the page that leads to downloading, and the browser downloads the pdf file to my download folder in my computer.

I think I need the translator to access this final url directly. The gist of the html code that calls the url is as follows.

<body onLoad="document.down.submit();">
<form name="down" method="post" action="http://210.101.116.17/kiss8/inftp_journal.asp">
<input type="hidden" name="a_code" value="40323420.pdf">
<input type="hidden" name="code" value="7624112825580073101">
<input type="hidden" name="isDownLoad" value="1">
</form>
</body>

If you paste this html code into a new text and open it in a browser, you will see that it downloads a pdf file even without requiring proxy. Do you think you can work around this code and have the zotero to fetch the final url? I really appreciate your help.

fbennett · 2017-05-28T23:01:22Z

Yes, that's the plan. Likely it can be done by nesting another call that runs inside the pop-up.

…

On May 29, 2017 00:12, "yunusong" ***@***.***> wrote: @fbennett <https://github.com/fbennett> I have one more question. When I erased all other attributes of the attachment and just left the url as "url : pdfurl" (which gives a long url that would have been the result of the function SiteSelect1), Zotero saves the snapshot of the page that leads to downloading, and the browser downloads the pdf file to my download folder in my computer. I think I need the translator to access this final url directly. The gist of the html code that calls the url is as follows. If you paste this html code into a new text and open it in a browser, you will see that it downloads a pdf file even without requiring proxy. Do you think you can work around this code and have the zotero to fetch the final url? I really appreciate your help. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#1308 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAEmSj241_OO6f-rfnGR3xNFLU5gZB5tks5r-Y7egaJpZM4Nnvpq> .

yunusong · 2017-05-29T13:41:34Z

Can you guys tell me what is wrong with my code? I don't think I quite understand how to use ZU.doPost. But I am quite certain that I have to use it because the actual page that leads to pdf has method="post".

So in the scrape function I added the code for attachments as follows.

`var` pdfJSurl = ZU.xpathText(doc, '//div[@class="search_box"]/div[@class="choice"]/span[1]/a/@href');
`var` pdfurlKeys = pdfJSurl.match(/Select1\(\'(\S+)\'\,\s?\'(\S+)\'\,\s?(\S+)\,\s?(\S+)\,\s?(\S+)\)\;/);
`var` pdflinkurl = "../search/download.asp?ftproot=" + pdfurlKeys[1] + "&inst_key=" + pdfurlKeys[3] + "&a_imag=" + pdfurlKeys[2] + "&isDownLoad=" + pdfurlKeys[5] + "&publ_key=" + pdfurlKeys[4];

`item.attachments.push({`
	url : ZU.processDocuments(pdflinkurl, getpdf),
	snapshot : true,
})

I was hoping that this would process the getpdf function on the page that opens. And for getpdf function, I coded it as follows.

function getpdf(doc, url) {
var theurl = ZU.xpathText(doc, '//form[@name="down"]/@action');
var a_code = ZU.xpathText(doc, '//input[@name="a_code"]/@value');
var code = ZU.xpathText(doc, '//input[@name="code"]/@value');
var isDownload = ZU.xpathText(doc, '//input[@name="isDownload"]/@value');
var postData = [a_code, code, isDownload];
var headerData = {"a_code": a_code, "code": code, "isDownload": isDownload};
ZU.doPost(theurl, a_code + code + isDownload, function(pdfurl2) {});
}

I checked how other translators use the doPost, and they were very different from how I wrote it, but I don't quite understand how to fix it.

zuphilip · 2017-05-29T18:59:32Z

That is a random guess, but I think you have to switch the two things, i.e. something like

ZU.doPost(pdflinkurl, postdata, function(pdfdoc) {
   item.attachments.push({
	doc: pdfdoc,
	title: "Fulltext",
        mimetype: "pplication/pdf"
   });
   item.complete();
}

But I don't really know if this can work, because usually the function has to work with text...

fbennett · 2017-05-29T19:02:05Z

Picking up now (fell asleep last night, but I turned on the office proxy yesterday, so I'm "at work" this morning). Will fiddle around a bit and see what I can come up with.

yunusong · 2017-05-29T21:40:26Z

Ok, I think I figured out how to use ZU.doPost. But I still couldn't get Zotero to save pdf. This is the code I added at the end of the scrape function. The weird thing is, though, when I run test in Scaffold, the result will show "Translation Successful" AND my browser will download the pdf to my computer, even though I only ran test. And when I actually test it with Zotero, it will not fetch pdf. Any thoughts on what I should change?

var pdfJSurl = ZU.xpathText(doc, '//div[@class="search_box"]/div[@class="choice"]/span[1]/a/@href');
var pdfurlKeys = pdfJSurl.match(/Select1\(\'(\S+)\'\,\s?\'(\S+)\'\,\s?(\S+)\,\s?(\S+)\,\s?(\S+)\)\;/);
var pdflinkurl = "../search/download.asp?ftproot=" + pdfurlKeys[1] + "&inst_key=" + pdfurlKeys[3] + "&a_imag=" + pdfurlKeys[2] + "&isDownLoad=" + pdfurlKeys[5] + "&publ_key=" + pdfurlKeys[4];
ZU.processDocuments(pdflinkurl, function(doc, url) {
	var actionurl = ZU.xpathText(doc, '//form[@name="down"]/@action');
	var inputData = "a_code=" + ZU.xpathText(doc, '//input[@name="a_code"]/@value') + "&code=" + ZU.xpathText(doc, '//input[@type="code"]/@value') + "&isDownload=" + ZU.xpathText(doc, '//input[@name="isDownload"]/@value');
	ZU.doPost(actionurl, inputData, function(pdfdoc) {
		item.attachments.push({
			url : pdfdoc,
		});
	});
});

item.complete();

fbennett · 2017-05-29T23:02:28Z

Cracked it. I have code that attaches a readable PDF. As it turned out, no further call (to doGet, doPost or processDocuments) was necessary. Tried various things, but what did the trick was to use the viewer.asp URL that is the ultimate target of a view request. It's a POST call on the site, but setting the post data as a query string on the URL (which I presume the translator calls with GET) worked anyway. The only post data needed is code_num, which is just the server-side filename of the PDF. So it turns out to be super simple, after all our guesswork.

I'll file a pull request in a bit.

zuphilip · 2018-01-02T08:08:15Z

Scaffold has its own browser now and you can simply copy and paste whichever URL you wanted to see into the Current URL field. After copy and paste you have to hit enter and wait for the page to load. Does this work?

zuphilip · 2018-01-02T12:40:57Z

🌥 Okay, I am continuing to work here now. I will let you know when I have a new version...

zuphilip · 2018-01-02T15:51:08Z

I generalized the translator for the example(s) found by @adam3smith but choosed in the end journalArticle for that documented in new test cases. However, it seems for me, that the PDF download we see in the public part is maybe tailored for that and could be fragile at all:

Thus, I decided to skip the PDF download for the moment. Maybe, we can start with the translator as it is now. Any improvement for PDF download could be added later by someone with access to the database.

Deleted also the comments for downloading PDFs, which we do not yet handle.

zuphilip · 2018-01-02T15:58:04Z

Please have a look at the new version.

socheres · 2018-01-02T19:02:06Z

@zuphilip below some more test cases and remarks.
author in Korean should import in a "single field" in Zotero., because the first letters(Silben) of Korean names are usually surname and the other two letters are first name. Also, there is no middle name in korean.
Authors in latinized form with comma is separated correctly.
one more suggestions for issn.
//e.g. issn = ISSN(Print) var issn = ZU.xpathText(doc, '//li[label[text()="ISSN(Print)"]]'); if(issn){ item.ISSN = ZU.cleanISSN(issn); }

test cases:

http://kiss.kstudy.com/public/public2-article.asp?key=50789039

18:30:10 Returned item:
           {
             "itemType": "journalArticle"
             "creators": [
               {
                 "firstName": "Yoon Tae"
                 "lastName": "Kim"
                 "creatorType": "author"
               }
               {
                 "firstName": "Hyun Suk"
                 "lastName": "Park"
                 "creatorType": "author"
               }
             ]
             "notes": []
             "tags": []
             "seeAlso": []
             "attachments": []
             "title": "KOLMOGOROV DISTANCE FOR MULTIVARIATE NORMAL APPROXIMATION"
             "language": "ko-KR"
             "publicationTitle": "Korean Journal of mathematics (강원경기수학회)"
             "volume": "23"
             "issue": "1"
             "date": "2015"
             "pages": "1–10"
             "abstractNote": "초록 보기"
             "libraryCatalog": "kiss"
           }
18:30:10 Translation successful

http://kiss.kstudy.com/thesis/thesis-view.asp?key=3550330

18:31:51 Returned item:
           {
             "itemType": "journalArticle"
             "creators": [
               {
                 "lastName": "( Yoon Tae Kim ) "
                 "fieldMode": true
                 "creatorType": "author"
               }
               {
                 "lastName": " ( Hyun Suk Park )"
                 "fieldMode": true
                 "creatorType": "author"
               }
             ]
             "notes": []
             "tags": []
             "seeAlso": []
             "attachments": []
             "title": "Convergence rate of a test statistics observed by the longitudinal data with long memory"
             "language": "ko-KR"
             "publicationTitle": "CSAM(Communications for Statistical Applications and Methods)"
             "volume": "24"
             "issue": "5"
             "date": "2017-09"
             "pages": "481–492"
             "abstractNote": "초록 보기 This paper investigates a convergence rate of a test statistics given by two scale sampling method based on Ait-Sahalia and Jacod (Annals of Statistics, 37, 184-222, 2009). This statistics tests for longitudinal data having the existence of long memory dependence driven by fractional Brownian motion with Hurst parameter H ∈ (1/2, 1). We obtain an upper bound in the Kolmogorov distance for normal approximation of this test statistic. As a main tool for our works, the recent results in Nourdin and Peccati (Probability Theory and Related Fields, 145, 75-118, 2009; Annals of Probability, 37, 2231-2261, 2009) will be used. These results are obtained by employing techniques based on the combination between Malliavin calculus and Stein`s method for normal approximation."
             "libraryCatalog": "kiss"
           }
18:31:51 Translation successful

http://kiss.kstudy.com/public/public2-article.asp?key=50375984

18:33:10 Returned item:
           {
             "itemType": "journalArticle"
             "creators": [
               {
                 "firstName": "Ja-Son Y."
                 "lastName": "Kim"
                 "creatorType": "author"
               }
               {
                 "firstName": "Tae-Joon"
                 "lastName": "Park"
                 "creatorType": "author"
               }
               {
                 "firstName": "Jin-Sol"
                 "lastName": "Lee"
                 "creatorType": "author"
               }
               {
                 "firstName": "Ji-Yong"
                 "lastName": "Chun"
                 "creatorType": "author"
               }
               {
                 "firstName": "Joon-Seol"
                 "lastName": "Bae"
                 "creatorType": "author"
               }
               {
                 "firstName": "Byung-Lae"
                 "lastName": "Park"
                 "creatorType": "author"
               }
               {
                 "firstName": "Hyun-Sub"
                 "lastName": "Cheong"
                 "creatorType": "author"
               }
               {
                 "firstName": "Hyo-Suk"
                 "lastName": "Lee"
                 "creatorType": "author"
               }
               {
                 "firstName": "Yoon-Jun"
                 "lastName": "Kim"
                 "creatorType": "author"
               }
               {
                 "firstName": "Hyoung-Doo"
                 "lastName": "Shin"
                 "creatorType": "author"
               }
             ]
             "notes": []
             "tags": []
             "seeAlso": []
             "attachments": []
             "title": "Association Analysis of SERPINB5 Polymorphisms with HBV Clearance and HCC Occurrence in a Korean Population"
             "language": "ko-KR"
             "publicationTitle": "Genomics & informatics (한국유전체학회)"
             "volume": "8"
             "issue": "1"
             "date": "2010"
             "pages": "1–8"
             "abstractNote": "초록 보기"
             "libraryCatalog": "kiss"
           }
18:33:10 Translation successful

I added an itemType "report" for one type of new formats and added some minor details. I think we need to add "url.indexOf('public2-jrd-sub.asp')>-1" as one of the options for "multiple", but for some reason I keep getting errors. Can someone take a look at this? The example address is: http://kiss.kstudy.com/public/public2-jrd-sub.asp?key1=100171&key2=10017&selYEAR=2012&selVOL1=1&selNUM1=2&selMon=

zuphilip · 2018-01-02T19:38:39Z

KStudy.js

@@ -131,11 +133,23 @@ function scrape(doc, url) {
 		var containerParts = containerValue.match(/(.*)\s+(\d+)\D\s*(\d+)/);
 		if (containerParts) {
 			item.publicationTitle = containerParts[1];
-			item.volume = containerParts[2];
+			if (/\d{4}/.test(containerParts[2])) {
+				containerParts[2] = null;


Add a comment why you are doing this. Moreover, I guess you don't need to delete the value in the first case, but rather simply want to concentrate on the second case. Is this correct? Then you could instead try something like

if (containerParts[2].length<4) { // or the regexp if really needed item.volume = containerParts[2]; }

Oh, okay thank you for pointing that out.
The reason I added that line was that I found that sometimes the pages have published year in place of the volume number. So sometimes it would appear as Vol. 2013, and I didn't want that to happen, but your correction makes much more sense. Thank you. I'll fix that.

zuphilip · 2018-01-02T19:45:59Z

@yunusong For your question about the multiple in the public part: You need to tweak also the CSS selector for the different entries, i.e. this line

var rows = ZU.xpath(doc, '//div[contains(@class, "thesis-info")]/h5/a[contains(@href, "/thesis")]');

(The last check is not true contains(@href, "/thesis"); but change it wisely/carefully in order not to influence the other multiples negatively.)

zuphilip · 2018-01-02T19:55:08Z

@socheres Thank you for the review! The ISSN is a good idea to add as well. I am not sure that I understand the naming correctly. Did you see currently an error in the splitting of author names? Example?

socheres · 2018-01-02T20:21:53Z

@zuphilip authors name splitting works only correct in case 1, not in 2 or 3.
1.Kim, Moon Tae , Park, Hyun Suk
2.( Yoon Tae Kim ) , ( Hyun Suk Park )
3.김윤태 ( Yuntae Kim )

(Kim|Park|김) = last name

In case 2,3 they should be - if possible - in a full name field, like in Zotero where i can switch author field from double in single field.
3. 김 = last name
윤태 = first name
Kim = last name
Yuntae = first name

socheres · 2018-01-02T20:35:14Z

i think the website structure - authors - is too fragile and i think the effort is not worth it, because all the metadata on http://kiss.kstudy.com/ can be downloaded - third button below 인용하기 - in RIS.

zuphilip · 2018-01-02T22:58:42Z

Uh... RIS, yes we should use that in the translator (it is the blue button with the arrow to the right). I will look into this now...

zuphilip · 2018-01-03T00:23:06Z

Please have a look at the new version, which is pretty much a rewrite, but much simpler by using the RIS data. 🚀

adam3smith

A couple of small questions & requests.

adam3smith · 2018-01-03T03:51:38Z

KStudy.js

+
+function scrape(doc, url) {
+	var key = url.match(/\bkey=(\d+)\b/)[1];
+	var risURL = "http://kiss.kstudy.com/p-common/export_endnote.asp";


lets make this relative in case the site starts offering https

adam3smith · 2018-01-03T03:51:58Z

KStudy.js

+
+
+function scrape(doc, url) {
+	var key = url.match(/\bkey=(\d+)\b/)[1];


we should check for this in detect, then, to make sure we have a key

adam3smith · 2018-01-03T03:52:57Z

KStudy.js

+			}
+			item.language = "ko-KR";
+			if (item.pages) {
+				item.pages = item.pages.replace('-', '–');


we don't usually do this -- do we actually prefer en-dashes over hyphens in the database?

I don't think there's any harm in doing so, though we should probably just be automatically converting hyphens in page ranges to en dashes on save.

Hm... my mistake then. How is the page range then shown in citation styles? Maybe the replacement could also happens at some later step... I will delete it on the translator code, beause this would then be a larger issue.

adam3smith · 2018-01-03T03:53:32Z

KStudy.js

+		"items": [
+			{
+				"itemType": "journalArticle",
+				"title": "KOLMOGOROV DISTANCE FOR MULTIVARIATE NORMAL APPROXIMATION",


let's title case all caps titles

I tried to think about this before. I don't know how to check easily whether the title is all-uppercase but not a Korean title. Any ideas?

You should be able to just use ZU.capitalizeTitle(title, true).

Thank you. This also changes the title 치과용 콘빔 CT를 이용한 상악 정중과잉치의 3차원 분석, which I wanted to avoid. I guess some test like [\u0000-\u00FF]* should work for Latin alphabet only titles...

How does it change it?

it will be great if the English title can be normalized, taking capitalization e.g. proper noun into account, and other words such as USA, EU in capitalize form. There are some similar projects (https://github.com/nlp-compromise/compromise), but I haven't found a "killer Javascript function" yet.

If a site gives a title in all-caps, as seems to be the case here, we generally just title case it according to some basic rules and leave the fixing of proper nouns and acronyms up to users. (Along other things, we don’t even necessarily know the language in order to reliably exclude words.)

The issue here just seems to be (I haven’t looked at this closely) that some Korean titles include English characters, and we want to avoid changing the case of those, either in the translator or in a more generalized way in capitalizeTitle.

@dstillman Okay, my mistake. But the code pattern can still make sense then, to prevent making title case, when sentence case is correctly given.

I test that the majority of the title is from Latin alphabet. But your suggestion for capitalizeTitle on the words

not change words if they contain any characters outside [\u0000-\u00FF]

sounds reasonable for me.

@socheres The function ZU.capitalizeTitle can be used to "normalize" English titles into title case in translators, see https://github.com/zotero/zotero/blob/master/chrome/content/zotero/xpcom/utilities.js#L921 . However, it will add a capital letter to more or less every word with only a small number of exceptions cf. https://github.com/zotero/zotero/blob/2baa537542fcd5bc6e2826eb0eef6b34c571dcfc/chrome/content/zotero/xpcom/utilities.js#L922-L924 . To do anything similar e.g. for German and thereby differentiating between nouns and verbs etc. might be out of scope for Zotero.

Here is another example http://kiss.kstudy.com/thesis/thesis-view.asp?key=3480580 with the title

Analysis of Enzymes related to Lignin Modification of Phanerochaete chrysosporium ATCC20696 through the Transcriptomic and Proteomic Approaches

which is fine, but not title cased.

thank you @avram @zuphilip for explaining. i totally understand that fixing crappy data records is out of scope for zotero. it was just feedback - nice to have feature - from the users/scientists. publisher who create metadata have to deal it.

socheres · 2018-01-03T10:05:50Z

KStudy.js

+						"creatorType": "author",
+						"fieldMode": 1
+					}
+				],


romanized korean names with comma create two last names:

저자 : Kim, Yoon Tae, Park, Hyun Suk

Korean names are correct imported in "filedMode 1", but it will be nice to have, if korean names can be separated in last and first name like the function "ZU.cleanAuthor" for korean names.

Okay, you are right. The RIS data contains wrongly four authors in the AU tags. But I still can reuse some of the previous code for scraping it directly in these cases. Stay tuned..

socheres · 2018-01-03T10:08:46Z

KStudy.js

+						"lastName": "Hyun Suk",
+						"creatorType": "author",
+						"fieldMode": 1
+					}


line403-423

zuphilip · 2018-01-03T13:15:43Z

I did changes according to the reviews of @adam3smith and @socheres . Sometimes it is needed to scrape the information for the authors directly from the page. I hope that we have the author information now correctly.

Please have a look at the new version.

socheres · 2018-01-03T13:33:39Z

it looks 👍
source: > ( Chang-young Hong ) , ( Su-yeon Lee ) , ( Myungkil Kim ) , ( In-gyu Choi )
{ "itemType": "journalArticle" "creators": [ { "firstName": "Chang-young" "lastName": "Hong" "creatorType": "author" } { "firstName": "Su-yeon" "lastName": "Lee" "creatorType": "author" } { "firstName": "Myungkil" "lastName": "Kim" "creatorType": "author" } { "firstName": "In-gyu" "lastName": "Choi" "creatorType": "author" }

source > 저자 : Kim, Yoon Tae, Park, Hyun Suk
{ "itemType": "journalArticle" "creators": [ { "firstName": "Yoon Tae" "lastName": "Kim" "creatorType": "author" } { "firstName": "Hyun Suk" "lastName": "Park" "creatorType": "author" } ]
source > 김윤태 ( Yuntae Kim )
{ "itemType": "journalArticle" "creators": [ { "lastName": "김윤태" "creatorType": "author" "fieldMode": 1 }👏

yunusong · 2018-01-03T14:54:10Z

I checked the update, and things look really great. Thanks for the work.

adam3smith · 2018-01-04T13:00:04Z

Great, thanks everyone!

zuphilip · 2018-01-04T14:01:59Z

✨ Thank you all @yunusong, @fbennett, @avram, @dstillman, @socheres, @adam3smith for the work and the review! That was a massive collaboration on a Zotero translator!

Using RIS data and author tweaks

Create KStudy.js

df0a51a

Rewrite KStudy.js as Non-FW translator

c33a7d5

yunusong added 4 commits May 27, 2017 14:05

Merge pull request #1 from zuphilip/yunusong-patch-2

bd15aa7

Rewrite KStudy.js as Non-FW translator

Update href links for Journal Issues

f22c3b0

I figured out how to unravel the javascripted links in Journal Issue type of "multiple". I tested on my zotero and confirmed that it works.

added zuphilipp as a creator

df47881

creator change again

8cd64e6

zuphilip reviewed May 27, 2017

View reviewed changes

Update KStudy.js

72cf089

zuphilip reviewed May 27, 2017

View reviewed changes

comma added

d4fa029

Update KStudy.js to work also in the public part

50bbe5d

Deleted also the comments for downloading PDFs, which we do not yet handle.

zuphilip reviewed Jan 2, 2018

View reviewed changes

Update KStudy.js

2fe5b39

[KStudy] Rewrite by using RIS data + tweak multiples

add5b92

change indexOf to includes

c554a9d

adam3smith requested changes Jan 3, 2018

View reviewed changes

socheres reviewed Jan 3, 2018

View reviewed changes

KStudy.js Outdated

"lastName": "Hyun Suk",

"creatorType": "author",

"fieldMode": 1

}

Copy link

Contributor

socheres Jan 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line403-423

[KStudy.js] Update according to the reviews

5f07940

Remove space before colon

0e8d3be

adam3smith merged commit f4a0d8e into zotero:master Jan 4, 2018

GuyAglionby pushed a commit to GuyAglionby/translators that referenced this pull request Jan 28, 2018

Kstudy translator (zotero#1308)

193ce62

Using RIS data and author tweaks

jonschz pushed a commit to jonschz/translators that referenced this pull request Feb 8, 2018

Kstudy translator (zotero#1308)

d51cd53

Using RIS data and author tweaks



		function scrape(doc, url) {
		var key = url.match(/\bkey=(\d+)\b/)[1];

Kstudy translator #1308

Kstudy translator #1308

Conversation

yunusong commented May 26, 2017

fbennett commented May 27, 2017

avram commented May 27, 2017

zuphilip commented May 27, 2017

fbennett commented May 27, 2017 • edited by zuphilip Loading

zuphilip commented May 27, 2017

yunusong commented May 27, 2017 • edited Loading

yunusong commented May 27, 2017 via email

zuphilip commented May 27, 2017

zuphilip May 27, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yunusong commented May 27, 2017

Choose a reason for hiding this comment

zuphilip commented May 27, 2017

fbennett commented May 27, 2017

yunusong commented May 27, 2017

adam3smith commented May 28, 2017

yunusong commented May 28, 2017 • edited Loading

fbennett commented May 28, 2017 via email

yunusong commented May 29, 2017

zuphilip commented May 29, 2017

fbennett commented May 29, 2017

yunusong commented May 29, 2017

fbennett commented May 29, 2017 • edited Loading

zuphilip commented Jan 2, 2018

zuphilip commented Jan 2, 2018 • edited Loading

zuphilip commented Jan 2, 2018

zuphilip commented Jan 2, 2018

socheres commented Jan 2, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zuphilip commented Jan 2, 2018

zuphilip commented Jan 2, 2018

socheres commented Jan 2, 2018

socheres commented Jan 2, 2018

zuphilip commented Jan 2, 2018

zuphilip commented Jan 3, 2018

adam3smith left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zuphilip commented Jan 3, 2018

socheres commented Jan 3, 2018

yunusong commented Jan 3, 2018

adam3smith commented Jan 4, 2018

zuphilip commented Jan 4, 2018

fbennett commented May 27, 2017 •

edited by zuphilip

Loading

yunusong commented May 27, 2017 •

edited

Loading

zuphilip May 27, 2017 •

edited

Loading

yunusong commented May 28, 2017 •

edited

Loading

fbennett commented May 29, 2017 •

edited

Loading

zuphilip commented Jan 2, 2018 •

edited

Loading